text-generation-webui/llama.cpp-models.md at 280c2f285f9c1c74f1204f596813f53ac00a0d99

oobabooga fcb594b90e Don't require llama.cpp models to be placed in subfolders

2023-04-22 14:56:48 -03:00

Using llama.cpp in the web UI

Place the model file directly in the models folder. Its file name must contain ggml and end in .bin (for example, ggml-model-q4_0.bin).
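The naming rule above can be checked before copying a file into the models folder. The following is only a sketch of that rule, not the web UI's actual detection code: the function name looks_like_llamacpp_model is hypothetical, and the case-insensitive comparison is an assumption.

```python
from pathlib import Path


def looks_like_llamacpp_model(file_name: str) -> bool:
    """Return True if the file name contains "ggml" and ends in ".bin"."""
    # Only the final path component matters; any directory part is ignored.
    name = Path(file_name).name
    # Assumption: the comparison is case-insensitive. The web UI may be stricter.
    lowered = name.lower()
    return "ggml" in lowered and lowered.endswith(".bin")
```

With this sketch, ggml-model-q4_0.bin passes the check, while llama-7b.bin (no ggml in the name) and ggml-model.pt (wrong extension) do not.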

Follow the instructions in the llama.cpp README to generate the ggml-model-q4_0.bin file: https://github.com/ggerganov/llama.cpp#usage

This was the performance of llama-7b int4 on my i5-12400F:

Output generated in 33.07 seconds (6.05 tokens/s, 200 tokens, context 17)
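The throughput in that log line is the token count divided by the elapsed time. A quick check of the arithmetic, using only the numbers reported above (the variable names are illustrative):

```python
# Values copied from the log line above.
elapsed_seconds = 33.07
generated_tokens = 200

# Throughput is tokens divided by wall-clock seconds.
tokens_per_second = generated_tokens / elapsed_seconds

# The inverse gives the average latency per token, in milliseconds.
ms_per_token = 1000 * elapsed_seconds / generated_tokens

print(f"{tokens_per_second:.2f} tokens/s, {ms_per_token:.2f} ms/token")
```

This reproduces the reported 6.05 tokens/s, which corresponds to roughly 165 ms per generated token.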

You can change the number of CPU threads used for generation by passing --threads N when starting the web UI.
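As a rough sketch, a launch command with an explicit thread count could be assembled as follows. Only the --threads flag comes from the text above; the server.py entry point, the --model flag, and the helper name build_launch_command are assumptions made for illustration and should be checked against the installed version of the web UI.

```python
import os
from typing import List


def build_launch_command(model_name: str, threads: int) -> List[str]:
    """Build an argument list that starts the web UI with a given thread count."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # Assumption: the web UI is started through server.py and accepts --model.
    return [
        "python", "server.py",
        "--model", model_name,
        "--threads", str(threads),
    ]


# os.cpu_count() reports logical CPUs and may return None, so fall back to 1.
default_threads = os.cpu_count() or 1
command = build_launch_command("ggml-model-q4_0.bin", default_threads)
print(" ".join(command))
```

The logical CPU count is only a starting point. The best value for N depends on the machine, so it is worth measuring tokens/s with a few different settings.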