GPTQv2 support.

1. Adds dependency on `triton`.
2. Refactors `autograd_4bit` to include both GPTQv1 and GPTQv2.
3. Introduces new environment variable `GPTQ_VERSION` to select the `autograd_4bit` version.
4. Fixes triton kernels.
5. Matrix multiplications are in fp16.

| Name |
|---|
| `__init__.py` |
| `autograd_4bit_v1.py` |
| `autograd_4bit_v2.py` |
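
The commit message says `GPTQ_VERSION` selects which `autograd_4bit` version is used, but it does not show how the selection works. The following is a minimal sketch of one way such a switch could be written, not the repository's actual code. The helper name `resolve_backend`, the accepted values `"1"` and `"2"`, and the default of `"1"` are all assumptions made for illustration. Only the module names come from the file listing.

```python
import os

# Module names are taken from the directory listing. The mapping from
# GPTQ_VERSION values to modules is an assumption for illustration.
_BACKENDS = {
    "1": "autograd_4bit_v1",
    "2": "autograd_4bit_v2",
}


def resolve_backend(env=None, default="1"):
    """Return the module name selected by GPTQ_VERSION (hypothetical helper).

    `env` is any mapping of environment variables; it defaults to os.environ.
    `default` is the assumed fallback when GPTQ_VERSION is unset.
    """
    env = os.environ if env is None else env
    version = str(env.get("GPTQ_VERSION", default)).strip()
    if version not in _BACKENDS:
        raise ValueError(
            f"Unsupported GPTQ_VERSION={version!r}; expected one of {sorted(_BACKENDS)}"
        )
    return _BACKENDS[version]


if __name__ == "__main__":
    # Prints the module name for the current environment.
    print(resolve_backend())
```

Under these assumptions, a package `__init__.py` could pass the returned name to `importlib.import_module` to load the matching implementation. Failing loudly on an unknown value avoids silently falling back to the wrong kernel version.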