mirror of
https://github.com/ggerganov/llama.cpp.git
synced 2025-01-25 02:48:44 +01:00
1c641e6aac
* `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew
* server: update refs -> llama-server
gitignore llama-server
* server: simplify nix package
* main: update refs -> llama
fix examples/main ref
* main/server: fix targets
* update more names
* Update build.yml
* rm accidentally checked in bins
* update straggling refs
* Update .gitignore
* Update server-llm.sh
* main: target name -> llama-cli
* Prefix all example bins w/ llama-
* fix main refs
* rename {main->llama}-cmake-pkg binary
* prefix more cmake targets w/ llama-
* add/fix gbnf-validator subfolder to cmake
* sort cmake example subdirs
* rm bin files
* fix llama-lookup-* Makefile rules
* gitignore /llama-*
* rename Dockerfiles
* rename llama|main -> llama-cli; consistent RPM bin prefixes
* fix some missing -cli suffixes
* rename dockerfile w/ llama-cli
* rename(make): llama-baby-llama
* update dockerfile refs
* more llama-cli(.exe)
* fix test-eval-callback
* rename: llama-cli-cmake-pkg(.exe)
* address gbnf-validator unused fread warning (switched to C++ / ifstream)
* add two missing llama- prefixes
* Updating docs for eval-callback binary to use new `llama-` prefix.
* Updating a few lingering doc references for rename of main to llama-cli
* Updating `run-with-preset.py` to use new binary names.
Updating docs around `perplexity` binary rename.
* Updating documentation references for lookup-merge and export-lora
* Updating two small `main` references missed earlier in the finetune docs.
* Update apps.nix
* update grammar/README.md w/ new llama-* names
* update llama-rpc-server bin name + doc
* Revert "update llama-rpc-server bin name + doc"
This reverts commit e474ef1df4
.
* add hot topic notice to README.md
* Update README.md
* Update README.md
* rename gguf-split & quantize bins refs in **/tests.sh
---------
Co-authored-by: HanClinto <hanclinto@gmail.com>
91 lines
4.5 KiB
Markdown
91 lines
4.5 KiB
Markdown
# finetune
|
|
|
|
Basic usage instructions:
|
|
|
|
```bash
|
|
# get training data
|
|
wget https://raw.githubusercontent.com/brunoklein99/deep-learning-notes/master/shakespeare.txt
|
|
|
|
# finetune LORA adapter
|
|
./bin/llama-finetune \
|
|
--model-base open-llama-3b-v2-q8_0.gguf \
|
|
--checkpoint-in chk-lora-open-llama-3b-v2-q8_0-shakespeare-LATEST.gguf \
|
|
--checkpoint-out chk-lora-open-llama-3b-v2-q8_0-shakespeare-ITERATION.gguf \
|
|
--lora-out lora-open-llama-3b-v2-q8_0-shakespeare-ITERATION.bin \
|
|
--train-data "shakespeare.txt" \
|
|
--save-every 10 \
|
|
--threads 6 --adam-iter 30 --batch 4 --ctx 64 \
|
|
--use-checkpointing
|
|
|
|
# predict
|
|
./bin/llama-cli -m open-llama-3b-v2-q8_0.gguf --lora lora-open-llama-3b-v2-q8_0-shakespeare-LATEST.bin
|
|
```
|
|
|
|
**Only llama based models are supported!** The output files will be saved every N iterations (config with `--save-every N`).
|
|
The pattern 'ITERATION' in the output filenames will be replaced with the iteration number and with 'LATEST' for the latest output.
|
|
So in above example after 10 iterations these files will be written:
|
|
- chk-lora-open-llama-3b-v2-q8_0-shakespeare-10.gguf
|
|
- chk-lora-open-llama-3b-v2-q8_0-shakespeare-LATEST.gguf
|
|
- lora-open-llama-3b-v2-q8_0-shakespeare-10.bin
|
|
- lora-open-llama-3b-v2-q8_0-shakespeare-LATEST.bin
|
|
|
|
After 10 more iterations:
|
|
- chk-lora-open-llama-3b-v2-q8_0-shakespeare-20.gguf
|
|
- chk-lora-open-llama-3b-v2-q8_0-shakespeare-LATEST.gguf
|
|
- lora-open-llama-3b-v2-q8_0-shakespeare-20.bin
|
|
- lora-open-llama-3b-v2-q8_0-shakespeare-LATEST.bin
|
|
|
|
Checkpoint files (`--checkpoint-in FN`, `--checkpoint-out FN`) store the training process. When the input checkpoint file does not exist, it will begin finetuning a new randomly initialized adapter.
|
|
|
|
llama.cpp compatible LORA adapters will be saved with filename specified by `--lora-out FN`.
|
|
These LORA adapters can then be used by `llama-cli` together with the base model, like in the 'predict' example command above.
|
|
|
|
In `llama-cli` you can also load multiple LORA adapters, which will then be mixed together.
|
|
|
|
For example if you have two LORA adapters `lora-open-llama-3b-v2-q8_0-shakespeare-LATEST.bin` and `lora-open-llama-3b-v2-q8_0-bible-LATEST.bin`, you can mix them together like this:
|
|
|
|
```bash
|
|
./bin/llama-cli -m open-llama-3b-v2-q8_0.gguf \
|
|
--lora lora-open-llama-3b-v2-q8_0-shakespeare-LATEST.bin \
|
|
--lora lora-open-llama-3b-v2-q8_0-bible-LATEST.bin
|
|
```
|
|
|
|
You can change how strong each LORA adapter is applied to the base model by using `--lora-scaled FN SCALE` instead of `--lora FN`.
|
|
|
|
For example to apply 40% of the 'shakespeare' LORA adapter, 80% of the 'bible' LORA adapter and 100% of yet another one:
|
|
|
|
```bash
|
|
./bin/llama-cli -m open-llama-3b-v2-q8_0.gguf \
|
|
--lora-scaled lora-open-llama-3b-v2-q8_0-shakespeare-LATEST.bin 0.4 \
|
|
--lora-scaled lora-open-llama-3b-v2-q8_0-bible-LATEST.bin 0.8 \
|
|
--lora lora-open-llama-3b-v2-q8_0-yet-another-one-LATEST.bin
|
|
```
|
|
|
|
The scale numbers don't need to add up to one, and you can also use numbers greater than 1 to further increase the influence of an adapter. But making the values too big will sometimes result in worse output. Play around to find good values.
|
|
|
|
Gradient checkpointing reduces the memory requirements by ~50% but increases the runtime.
|
|
If you have enough RAM, you can make finetuning a bit faster by disabling checkpointing with `--no-checkpointing`.
|
|
|
|
The default LORA rank can be specified with `--lora-r N`.
|
|
The LORA rank can be configured for each model tensor type separately with these command line options:
|
|
|
|
```bash
|
|
--lora-r N LORA r: default rank. Also specifies resulting scaling together with lora-alpha. (default 4)
|
|
--rank-att-norm N LORA rank for attention norm tensor (default 1)
|
|
--rank-ffn-norm N LORA rank for feed-forward norm tensor (default 1)
|
|
--rank-out-norm N LORA rank for output norm tensor (default 1)
|
|
--rank-tok-embd N LORA rank for token embeddings tensor (default 4)
|
|
--rank-out N LORA rank for output tensor (default 4)
|
|
--rank-wq N LORA rank for wq tensor (default 4)
|
|
--rank-wk N LORA rank for wk tensor (default 4)
|
|
--rank-wv N LORA rank for wv tensor (default 4)
|
|
--rank-wo N LORA rank for wo tensor (default 4)
|
|
--rank-ffn_gate N LORA rank for ffn_gate tensor (default 4)
|
|
--rank-ffn_down N LORA rank for ffn_down tensor (default 4)
|
|
--rank-ffn_up N LORA rank for ffn_up tensor (default 4)
|
|
```
|
|
|
|
The LORA rank of 'norm' tensors should always be 1.
|
|
|
|
To see all available options use `finetune --help`.
|