Commit 65e5f6dadb
The default values for tfs_z and typical_p were being set to zero, which caused the token candidates array to get shrunk down to one element, thus preventing any sampling. Note this only applies to OpenAI API compatible HTTP server requests.

The solution is to use the default values that OpenAI documents, as well as ensuring we use the llama.cpp defaults for the rest. I've tested that this change still ensures deterministic output by default. If a "temperature" greater than 0 is explicitly passed, then output is unique each time. If "seed" is specified in addition to "temperature" then the output becomes deterministic once more.

See mozilla-Ocho/llamafile#117
See mozilla-Ocho/llamafile@9e4bf29
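
The sketch below illustrates the idea behind the fix, not the actual server code: missing request fields should fall back to sane defaults (OpenAI-documented values where they exist, neutral llama.cpp-style values of 1.0 for filters like tfs_z and typical_p) rather than 0.0, which would collapse the candidate list to a single token. The struct, helper, and exact default values here are illustrative assumptions.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Hypothetical container for the sampling parameters relevant to this fix.
struct sampling_params {
    double temperature = 1.0;  // OpenAI-documented default
    double top_p       = 1.0;  // OpenAI-documented default
    double tfs_z       = 1.0;  // 1.0 disables tail-free sampling (llama.cpp-style default)
    double typical_p   = 1.0;  // 1.0 disables locally typical sampling
};

// Return the request's value if the key is present, otherwise the default.
static double get_or(const std::map<std::string, double> &req,
                     const std::string &key, double def) {
    auto it = req.find(key);
    return it != req.end() ? it->second : def;
}

int main() {
    // A request that only sets "temperature"; every other field must fall
    // back to its default instead of silently becoming 0.0.
    std::map<std::string, double> request = {{"temperature", 0.7}};

    sampling_params p;
    p.temperature = get_or(request, "temperature", p.temperature);
    p.top_p       = get_or(request, "top_p",       p.top_p);
    p.tfs_z       = get_or(request, "tfs_z",       p.tfs_z);
    p.typical_p   = get_or(request, "typical_p",   p.typical_p);

    std::printf("temperature=%.2f top_p=%.2f tfs_z=%.2f typical_p=%.2f\n",
                p.temperature, p.top_p, p.tfs_z, p.typical_p);
    return 0;
}
```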