Update README.md (commit 1a0c63edaf, parent a0a0962de7)

# Alpaca Lora 4bit
Made some adjustments to the code in peft and GPTQ-for-LLaMa so that LoRA finetuning is possible with a 4-bit base model. The same adjustments can be made for 2, 3 and 8 bits.

* For those who want to use the pip installable version:

```
pip install git+https://github.com/johnsmith0031/alpaca_lora_4bit@winglian-setup_pip
```

## Quick start for running the chat UI

...

It's fast on a 3070 Ti mobile. Uses 5-6 GB of GPU RAM.

# Development

* Install manual by s4rduk4r: https://github.com/s4rduk4r/alpaca_lora_4bit_readme/blob/main/README.md
* Also remember to create a venv if you do not want your existing packages to be overwritten; a minimal sketch is shown after this list.
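
A minimal sketch of that setup, assuming a Linux shell and Python 3 on the PATH:

```
# create and activate an isolated environment so existing packages are not overwritten
python -m venv venv
source venv/bin/activate
# install the pinned dependencies inside the venv
pip install -r requirements.txt
```
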
# Update Logs
* Resolved a numerical instability issue

...

peft<br>
The specific versions are listed in requirements.txt.<br>

# Install

~copy files from GPTQ-for-LLaMa into GPTQ-for-LLaMa path and re-compile cuda extension~<br>
~copy files from peft/tuners/lora.py to peft path, replace it~<br>

**NOTE:** Install scripts are no longer needed! requirements.txt now pulls from forks with the necessary patches.

```
pip install -r requirements.txt
```
# Finetune

~The same finetune script from https://github.com/tloen/alpaca-lora can be used.~<br>

After installation, this script can be used. Use the --v1 flag for v1 models:

```
python finetune.py ./data.txt \
    --ds_type=txt \
    --lora_out_dir=./test/ \
    --llama_q4_config_dir=./llama-7b-4bit/ \
    --llama_q4_model=./llama-7b-4bit.pt \
    --mbatch_size=1 \
    --batch_size=2 \
    --epochs=3 \
    --lr=3e-4 \
    --cutoff_len=256 \
    --lora_r=8 \
    --lora_alpha=16 \
    --lora_dropout=0.05 \
    --warmup_steps=5 \
    --save_steps=50 \
    --save_total_limit=3 \
    --logging_steps=5 \
    --groupsize=-1 \
    --v1 \
    --xformers \
    --backend=cuda
```

# Inference

...

git clone https://github.com/oobabooga/text-generation-webui.git

Open server.py and insert a line at the beginning:

```
import custom_monkey_patch # apply monkey patch
...
```

...

Use the command to run:

```
python server.py
```

## Monkey patch inside webui

Currently the webui supports using this repo through the monkey patch included in it.<br>
You can simply clone this repo into ./repositories/ inside the text-generation-webui directory.
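
A minimal sketch of that step, assuming text-generation-webui has already been cloned into the current directory (the path is illustrative):

```
# from the text-generation-webui checkout
cd text-generation-webui
mkdir -p repositories && cd repositories
# clone this repo so the webui's monkey patch can find it
git clone https://github.com/johnsmith0031/alpaca_lora_4bit.git
```
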
# Flash Attention

It seems that we can apply a monkey patch for the llama model. To use it, simply download the file from [MonkeyPatch](https://github.com/lm-sys/FastChat/blob/daa9c11080ceced2bd52c3e0027e4f64b1512683/fastchat/train/llama_flash_attn_monkey_patch.py). Note that flash-attention is also required, and it currently does not support PyTorch 2.0.
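
A minimal sketch of applying it, assuming the downloaded file is saved locally as llama_flash_attn_monkey_patch.py and exposes replace_llama_attn_with_flash_attn() as in the linked FastChat version; the patch has to run before the llama model is created:

```
# hedged sketch: swap in flash-attention before any llama model is instantiated
from llama_flash_attn_monkey_patch import replace_llama_attn_with_flash_attn

replace_llama_attn_with_flash_attn()

# ... then load the 4-bit model and finetune / run inference as usual ...
```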