Alpaca Lora 4bit

Made some adjustments to the code in peft and GPTQ-for-LLaMa to make LoRA finetuning possible with a 4-bit base model. The same adjustments can be made for 2, 3, and 8 bits.

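To illustrate the idea, here is a minimal sketch (not this repo's exact API; the class, buffer, and parameter names are illustrative): the 4-bit base weights stay frozen, and only the small fp16 LoRA factors receive gradients.

import torch
import torch.nn as nn

class LoraOver4bitLinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        # Stand-ins for the quantized base weight: 4-bit codes plus per-row scales.
        # Buffers carry no gradient, so the base weight stays frozen.
        self.register_buffer("qweight", torch.randint(0, 16, (out_features, in_features), dtype=torch.uint8))
        self.register_buffer("scales", torch.rand(out_features, 1) * 0.01)
        # Trainable LoRA factors; lora_B starts at zero so training begins
        # from the unmodified base model.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        w = (self.qweight.float() - 8) * self.scales  # dequantized base weight
        return x @ w.t() + (x @ self.lora_A.t()) @ self.lora_B.t() * self.scaling
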
Update Logs

  • Resolved a numerical instability issue.

  • Reconstructing the fp16 weight matrix from the 4-bit data and calling torch.matmul greatly increased inference speed (see the sketch after this list).

  • Added install scripts for Windows and Linux.

  • Added gradient checkpointing. With it enabled, a 30B model can now be finetuned in 4-bit on a single GPU with 24 GB of VRAM (finetune.py updated). Checkpointing reduces training speed, so skip it if you have enough VRAM; a sketch of the technique follows this list.

  • Added an install manual by s4rduk4r.
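
The speedup from the second item comes from unpacking the 4-bit weights into one fp16 matrix and doing a single large GEMM, rather than dequantizing element by element inside a custom kernel. A minimal sketch, assuming eight 4-bit values packed per int32 and per-row scales and zero points (the repo's actual packed layout may differ):

import torch

def dequant_4bit(qweight, scales, zeros):
    # qweight: (out, in // 8) int32, eight 4-bit codes per int32 (assumed layout)
    shifts = torch.arange(0, 32, 4, device=qweight.device)
    nibbles = (qweight.unsqueeze(-1) >> shifts) & 0xF   # (out, in // 8, 8)
    w = nibbles.reshape(qweight.shape[0], -1).half()    # (out, in)
    return (w - zeros) * scales                         # per-row scale / zero point

def linear_4bit(x, qweight, scales, zeros):
    w = dequant_4bit(qweight, scales, zeros)
    return torch.matmul(x, w.t())                       # one large fp16 GEMM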

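Gradient checkpointing itself is standard PyTorch; the repo's gradient_checkpointing.py presumably applies the same idea to the LLaMA decoder layers. A toy illustration with torch.utils.checkpoint, trading compute for VRAM by recomputing activations during the backward pass:

import torch
from torch.utils.checkpoint import checkpoint

layers = torch.nn.ModuleList([torch.nn.Linear(512, 512) for _ in range(4)])

def forward_with_checkpointing(x):
    for layer in layers:
        # Activations inside `layer` are recomputed in backward instead of stored.
        x = checkpoint(layer, x, use_reentrant=False)
    return x

x = torch.randn(2, 512, requires_grad=True)
forward_with_checkpointing(x).sum().backward()
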
Requirements

gptq-for-llama: https://github.com/qwopqwop200/GPTQ-for-LLaMa
peft: https://github.com/huggingface/peft.git

Install

~Copy the modified files into your GPTQ-for-LLaMa path and re-compile the CUDA extension.~
~Copy peft/tuners/lora.py into your peft install path, replacing the original file.~

Linux:

./install.sh

Windows:

./install.bat

Finetune

~The same finetune script from https://github.com/tloen/alpaca-lora can be used.~

After installation, this script can be used:

python finetune.py
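
The repo also supports distributed data parallel training via torchrun. Assuming two GPUs, a launch would look like the line below (--nproc_per_node is a standard torchrun flag; finetune.py's own arguments are defined in arg_parser.py):

torchrun --nproc_per_node=2 finetune.py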

Inference

After installation, this script can be used:

python inference.py
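
For reference, a hypothetical sketch of what such an inference script does; the loader name below is an assumption, so check autograd_4bit.py for the actual function, while the generation calls are standard transformers API:

import torch
from autograd_4bit import load_llama_model_4bit_low_ram  # assumed loader name

model, tokenizer = load_llama_model_4bit_low_ram("./llama-13b-4bit/", "./llama-13b-4bit.pt")
prompt = tokenizer("I think the meaning of life is", return_tensors="pt")
with torch.no_grad():
    out = model.generate(prompt["input_ids"].cuda(), max_new_tokens=64)
print(tokenizer.decode(out[0]))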

Text Generation Webui Monkey Patch

Clone the latest version of text-generation-webui, then copy all of the monkey patch files from this repo into ./text-generation-webui/:

git clone https://github.com/oobabooga/text-generation-webui.git

Open server.py and insert one line at the top:

import custom_monkey_patch # apply monkey patch
import gc
import io
...

Then run the server with:

python server.py