Commit 65e5f6dadb
The default values for tfs_z and typical_p were being set to zero, which caused the token candidates array to get shrunk down to one element, thus preventing any sampling. Note this only applies to OpenAI API compatible HTTP server requests.

The solution is to use the default values that OpenAI documents, as well as ensuring we use the llama.cpp defaults for the rest. I've tested that this change still ensures deterministic output by default. If a "temperature" greater than 0 is explicitly passed, then output is unique each time. If "seed" is specified in addition to "temperature" then the output becomes deterministic once more.

See mozilla-Ocho/llamafile#117
See mozilla-Ocho/llamafile@9e4bf29
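
The sketch below illustrates the idea behind the fix, not the actual server code: missing request fields should fall back to sane defaults (OpenAI-documented values where they exist, neutral llama.cpp-style values of 1.0 for filters like tfs_z and typical_p) rather than 0.0, which would collapse the candidate list to a single token. The struct, helper, and exact default values here are illustrative assumptions.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Hypothetical container for the sampling parameters relevant to this fix.
struct sampling_params {
    double temperature = 1.0;  // OpenAI-documented default
    double top_p       = 1.0;  // OpenAI-documented default
    double tfs_z       = 1.0;  // 1.0 disables tail-free sampling (llama.cpp-style default)
    double typical_p   = 1.0;  // 1.0 disables locally typical sampling
};

// Return the request's value if the key is present, otherwise the default.
static double get_or(const std::map<std::string, double> &req,
                     const std::string &key, double def) {
    auto it = req.find(key);
    return it != req.end() ? it->second : def;
}

int main() {
    // A request that only sets "temperature"; every other field must fall
    // back to its default instead of silently becoming 0.0.
    std::map<std::string, double> request = {{"temperature", 0.7}};

    sampling_params p;
    p.temperature = get_or(request, "temperature", p.temperature);
    p.top_p       = get_or(request, "top_p",       p.top_p);
    p.tfs_z       = get_or(request, "tfs_z",       p.tfs_z);
    p.typical_p   = get_or(request, "typical_p",   p.typical_p);

    std::printf("temperature=%.2f top_p=%.2f tfs_z=%.2f typical_p=%.2f\n",
                p.temperature, p.top_p, p.tfs_z, p.typical_p);
    return 0;
}
```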