diff --git a/README.md b/README.md
index 898939a..213fa97 100644
--- a/README.md
+++ b/README.md
@@ -9,6 +9,10 @@ pip install git+https://github.com/johnsmith0031/alpaca_lora_4bit@winglian-setup
 Better inference performance with text_generation_webui, about 40% faster
+Simple experiment results:
+7B model with groupsize=128, no act-order:
+throughput improved from 13 tokens/sec to 20 tokens/sec
+
 Step:
 1. run model server process
 2. run webui process with monkey patch
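Step 2 above relies on monkey patching: rebinding a function attribute on an already-loaded module at runtime, so that text_generation_webui's inference calls are redirected without editing the webui's source on disk. A minimal, generic sketch of the technique follows; the module name `webui` and the `generate` function here are invented for illustration and are not taken from the actual repositories:

```python
import types

# Stand-in for a third-party module we cannot (or do not want to) edit on disk.
webui = types.ModuleType("webui")

def original_generate(prompt: str) -> str:
    # Placeholder for the stock, slower inference path.
    return f"slow: {prompt}"

webui.generate = original_generate

def patched_generate(prompt: str) -> str:
    # Placeholder for the faster path, e.g. forwarding to a model server.
    return f"fast: {prompt}"

# The monkey patch: rebind the attribute; all later callers that look up
# webui.generate now get the patched implementation.
webui.generate = patched_generate

print(webui.generate("hello"))
```

The key property is that the patch happens at attribute-lookup time, so it must be applied before the webui process starts dispatching requests, which is why the steps launch the patched webui as a separate, second process.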