Fork of https://github.com/johnsmith0031/alpaca_lora_4bit

Andy Barry 417eba372a Add dockerfile and change some numbers to use 7bn model.		2023-04-05 23:13:35 -04:00
text-generation-webui	Add dockerfile and change some numbers to use 7bn model.	2023-04-05 23:13:35 -04:00
.gitignore	Fix repos.	2023-03-25 20:16:48 -07:00
Dockerfile	Add dockerfile and change some numbers to use 7bn model.	2023-04-05 23:13:35 -04:00
Finetune4bConfig.py	better multi-gpu support, support gpt4all training data	2023-03-29 11:21:47 -04:00
LICENSE	Create LICENSE	2023-03-25 10:17:44 +08:00
README.md	Add dockerfile and change some numbers to use 7bn model.	2023-04-05 23:13:35 -04:00
amp_wrapper.py	add amp_wrapper for autocast support.	2023-03-30 19:57:19 +08:00
arg_parser.py	add g_idx buffer.\nadd triton matmul utils for future support.	2023-04-02 21:29:06 +08:00
autograd_4bit.py	add g_idx buffer.\nadd triton matmul utils for future support.	2023-04-02 21:29:06 +08:00
data.txt	add data	2023-03-22 12:13:34 +08:00
finetune.py	update multi gpu support in finetune.py	2023-04-03 23:55:58 +08:00
gradient_checkpointing.py	Fix repos.	2023-03-25 20:16:48 -07:00
inference.py	add amp_wrapper for autocast support.	2023-03-30 19:57:19 +08:00
matmul_utils_4bit.py	fix gpt4all training to more closely match the released logic, other small fixes and optimizations	2023-03-30 22:40:40 -04:00
requirements.txt	Add dockerfile and change some numbers to use 7bn model.	2023-04-05 23:13:35 -04:00
requirements2.txt	Add dockerfile and change some numbers to use 7bn model.	2023-04-05 23:13:35 -04:00
train_data.py	fix gpt4all training to more closely match the released logic, other small fixes and optimizations	2023-03-30 22:40:40 -04:00
triton_utils.py	add g_idx buffer.\nadd triton matmul utils for future support.	2023-04-02 21:29:06 +08:00

README.md

Run LLM chat in realtime on an 8GB NVIDIA GPU

Dockerfile for alpaca_lora_4bit

Based on https://github.com/johnsmith0031/alpaca_lora_4bit

Use

Runs real-time LLM chat using Alpaca on an 8GB NVIDIA/CUDA GPU (e.g., a 3070 Ti mobile).

Requirements

Linux with Docker
NVIDIA GPU

Installation

docker build -t alpaca_lora_4bit .
docker run -p 7086:7086 alpaca_lora_4bit

Point your browser to http://localhost:7086
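
The container may take a while to load the model before the web UI responds. As a minimal sketch (not part of this repo), the hypothetical helper `wait_for_http` below polls the URL until something answers over HTTP. The function name, timeout, and polling interval are illustrative assumptions; only the port 7086 comes from the commands above.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll `url` until it returns any HTTP response; return False on timeout.

    Hypothetical helper for illustration only; it is not part of alpaca_lora_4bit.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, just with an error status, so it is up.
            return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Connection refused, reset, or timed out: not ready yet, retry.
            time.sleep(interval)
    return False
```

Calling `wait_for_http("http://localhost:7086")` after `docker run` returns `True` once the page is reachable. If it returns `False`, checking `docker ps` and `docker logs` for the container is a reasonable next step.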

Results

It's fast on a 3070 Ti.

Discussion

The model isn't all that good; sometimes it goes crazy. But hey, "when 4-bits you reach look this good you will not."

But it is fast (on my 3070 Ti mobile, at least).