Update README.md
This commit is contained in: parent 33a76b00ca, commit 51bf103269
@@ -43,6 +43,12 @@ It's fast on a 3070 Ti mobile. Uses 5-6 GB of GPU RAM.
* Removed bitsandbytes from requirements
* Added pip-installable branch based on winglian's PR
* Added CUDA backend quant attention and fused MLP from GPTQ_For_Llama
* Added LoRA patch for the GPTQ_For_Llama Triton backend
```
from monkeypatch.gptq_for_llala_lora_monkey_patch import inject_lora_layers
inject_lora_layers(model, lora_path, device, dtype)
```
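To illustrate what a LoRA monkeypatch like this does conceptually, here is a toy, dependency-free sketch: an existing linear layer is replaced in place with a wrapper that adds a low-rank update (B·A·x) on top of the base output, leaving the original weights untouched. The `Linear`, `LoraLinear`, and `Model` classes and the adapter format are illustrative stand-ins, not the actual implementation, which operates on quantized GPTQ layers.

```python
class Linear:
    """Minimal stand-in for a model's linear layer: y = W x."""
    def __init__(self, weight):
        self.weight = weight  # list of rows

    def forward(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weight]


class LoraLinear:
    """Wraps a base layer, adding a low-rank delta: y = W x + B (A x)."""
    def __init__(self, base, A, B):
        self.base, self.A, self.B = base, A, B

    def forward(self, x):
        y = self.base.forward(x)
        # rank-r down-projection A x, then up-projection B (A x)
        ax = [sum(a * xi for a, xi in zip(row, x)) for row in self.A]
        delta = [sum(b * ai for b, ai in zip(row, ax)) for row in self.B]
        return [yi + di for yi, di in zip(y, delta)]


def inject_lora_layers(model, adapters):
    """Monkeypatch: swap named layers for LoRA-wrapped versions in place."""
    for name, (A, B) in adapters.items():
        setattr(model, name, LoraLinear(getattr(model, name), A, B))


class Model:
    def __init__(self):
        self.proj = Linear([[1.0, 0.0], [0.0, 1.0]])  # 2x2 identity


model = Model()
# rank-1 adapter: A maps 2 -> 1, B maps 1 -> 2
inject_lora_layers(model, {"proj": ([[1.0, 1.0]], [[0.5], [0.5]])})
print(model.proj.forward([2.0, 4.0]))  # base [2, 4] + delta [3, 3] -> [5.0, 7.0]
```

The key point is that injection happens by attribute replacement after the model is built, which is why it works without modifying the upstream model code.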
# Requirements

gptq-for-llama <br>