README.md

llama.cpp/examples/lookup

Demonstration of Prompt Lookup Decoding

https://github.com/apoorvumang/prompt-lookup-decoding

The key parameters for lookup decoding are ngram_min, ngram_max and n_draft. The first two set the minimum and maximum size of the ngrams that are searched for in the prompt. The last specifies how many subsequent tokens to draft when a match is found.
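To make the role of the three parameters concrete, here is a minimal Python sketch of the general prompt lookup idea. It is not the code in lookup.cpp: the function name `draft_from_prompt`, the longest-ngram-first order, and the most-recent-match-first scan are assumptions made for illustration, and tokens are modeled as plain integers.

```python
from typing import List, Sequence


def draft_from_prompt(
    tokens: Sequence[int],
    ngram_min: int,
    ngram_max: int,
    n_draft: int,
) -> List[int]:
    """Draft up to n_draft tokens by matching the trailing ngram earlier in tokens.

    Hypothetical helper for illustration only; it does not mirror lookup.cpp.
    """
    n_tokens = len(tokens)
    # Assumption: try longer ngrams first, since a longer match is stronger evidence.
    for ngram_size in range(ngram_max, ngram_min - 1, -1):
        if ngram_size <= 0 or ngram_size >= n_tokens:
            continue
        tail = list(tokens[n_tokens - ngram_size:])
        # Scan earlier positions, most recent first, skipping the tail itself.
        for start in range(n_tokens - ngram_size - 1, -1, -1):
            if list(tokens[start:start + ngram_size]) == tail:
                end = start + ngram_size
                following = list(tokens[end:end + n_draft])
                if following:
                    return following
    # No ngram of any allowed size matched: nothing to draft.
    return []


if __name__ == "__main__":
    history = [1, 2, 3, 4, 5, 1, 2, 3]
    print(draft_from_prompt(history, ngram_min=1, ngram_max=3, n_draft=2))
```

In this sketch the trailing ngram `[1, 2, 3]` also occurs at the start of the sequence, so the two tokens that followed it there become the draft. The target model would then verify the drafted tokens and keep only the prefix it agrees with.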

More info:

https://github.com/ggerganov/llama.cpp/pull/4484
https://github.com/ggerganov/llama.cpp/issues/4226