Update README.md

John Smith 2023-04-07 16:10:36 +08:00 committed by GitHub
parent b01b10eb4d
commit f91d4cbb59
1 changed file with 3 additions and 3 deletions

@@ -15,6 +15,8 @@ Made some adjustments to the code in peft and gptq for llama, and make it possible f
* Added some options for finetuning: use eos_token by default instead of padding; added resume_checkpoint to continue training
* Added offload support; the load_llama_model_4bit_low_ram_and_offload_to_cpu function can be used (see the sketch after this list).
* Added a monkey patch for the text generation webui that fixes the initial eos token issue.
* Added Flash attention support. (Use --flash-attention)
* Added a Triton backend to support models using groupsize and act-order. (Use --backend=triton)
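
As a rough illustration of the offload loader mentioned above, here is a minimal sketch of how it might be called. Only the function name comes from this changelog; the module path `autograd_4bit` and the argument values are assumptions for illustration, not the repo's confirmed signature.
```
# Minimal sketch, not the confirmed API: the module path and arguments
# below are assumptions; only the function name comes from this README.
from autograd_4bit import load_llama_model_4bit_low_ram_and_offload_to_cpu

model, tokenizer = load_llama_model_4bit_low_ram_and_offload_to_cpu(
    "llama-7b-4bit/",    # hypothetical path to the model config directory
    "llama-7b-4bit.pt",  # hypothetical path to the 4-bit checkpoint
)
```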
# Requirements
gptq-for-llama <br>
@@ -82,6 +84,4 @@ python server.py
# Flash Attention
It seems that we can apply a monkey patch to the llama model. To use it, simply download the file from [MonkeyPatch](https://github.com/lm-sys/FastChat/blob/daa9c11080ceced2bd52c3e0027e4f64b1512683/fastchat/train/llama_flash_attn_monkey_patch.py). flash-attention is also required, and it currently does not support pytorch 2.0.
```
pip install flash-attn
```
Then just add --flash-attention to enable it for finetuning.
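
For scripts outside this repo, the patch can also be applied by hand. A minimal sketch follows, assuming the linked file is saved locally as `llama_flash_attn_monkey_patch.py` and that `replace_llama_attn_with_flash_attn` is its entry point (verify the name against the downloaded file); the patch must run before the model is constructed.
```
# Minimal sketch under the assumptions above: patch llama attention first,
# then build the model so the patched forward pass is picked up.
from llama_flash_attn_monkey_patch import replace_llama_attn_with_flash_attn

replace_llama_attn_with_flash_attn()

from transformers import LlamaForCausalLM
model = LlamaForCausalLM.from_pretrained("path/to/llama-7b")  # hypothetical path
```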