mirror of
https://github.com/ggerganov/llama.cpp.git
synced 2025-01-10 12:30:50 +01:00
cda0e4b648
* refactor llama_batch_get_one * adapt all examples * fix simple.cpp * fix llama_bench * fix * fix context shifting * free batch before return * use common_batch_add, reuse llama_batch in loop * null terminated seq_id list * fix save-load-state example * fix perplexity * correct token pos in llama_batch_allocr
llama.cpp/examples/lookup
Demonstration of Prompt Lookup Decoding
https://github.com/apoorvumang/prompt-lookup-decoding
The key parameters for lookup decoding are ngram_min
, ngram_max
and n_draft
. The first two determine the size of the ngrams to search for in the prompt for a match. The latter specifies how many subsequent tokens to draft if a match is found.
More info:
https://github.com/ggerganov/llama.cpp/pull/4484 https://github.com/ggerganov/llama.cpp/issues/4226