From 51bf10326912b670a7ab486df9e69e56cfc796f7 Mon Sep 17 00:00:00 2001
From: John Smith
Date: Sat, 22 Apr 2023 16:09:38 +0800
Subject: [PATCH] Update README.md

---
 README.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/README.md b/README.md
index e830cd0..dbbb6ea 100644
--- a/README.md
+++ b/README.md
@@ -43,6 +43,12 @@ It's fast on a 3070 Ti mobile. Uses 5-6 GB of GPU RAM.
 * Removed bitsandbytes from requirements
 * Added pip installable branch based on winglian's PR
 * Added cuda backend quant attention and fused mlp from GPTQ_For_Llama.
+* Added lora patch for GPTQ_For_Llama triton backend.
+
+```
+from monkeypatch.gptq_for_llala_lora_monkey_patch import inject_lora_layers
+inject_lora_layers(model, lora_path, device, dtype)
+```
 
 # Requirements
 gptq-for-llama
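
The snippet this patch adds to the README injects LoRA adapters by monkey-patching the model's quantized layers in place. The sketch below illustrates that mechanism in a minimal, self-contained form; the `Linear` class, `inject_lora` function, and matrix shapes here are illustrative stand-ins, not the repo's actual API, which operates on GPTQ quantized layers and real tensors.

```python
# Illustrative sketch of a LoRA monkey patch: replace a layer's forward
# method on the instance so it adds a low-rank delta on top of the
# frozen base output. Pure Python, no torch, for clarity only.

def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

class Linear:
    """Toy dense layer standing in for a frozen (e.g. quantized) Linear."""
    def __init__(self, weight):
        self.weight = weight  # shape: out_features x in_features

    def forward(self, x):
        return matvec(self.weight, x)

def inject_lora(layer, lora_a, lora_b, scale=1.0):
    """Monkey-patch layer.forward so it computes
    base(x) + scale * B @ (A @ x), leaving the base weights untouched."""
    base_forward = layer.forward  # capture the original bound method

    def forward_with_lora(x):
        base = base_forward(x)
        delta = matvec(lora_b, matvec(lora_a, x))  # low-rank update
        return [b + scale * d for b, d in zip(base, delta)]

    layer.forward = forward_with_lora  # swap the method on the instance
    return layer
```

After patching, callers use the layer exactly as before; the LoRA contribution is folded into every forward call, which is the same idea the real `inject_lora_layers(model, lora_path, device, dtype)` call applies across a model's layers.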