mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-24 02:19:18 +01:00

llama.cpp/examples/simple

This example demonstrates minimal usage of llama.cpp: it generates text from a given prompt.

./simple -m ./models/llama-7b-v2/ggml-model-f16.gguf -p "Hello my name is"

...

main: n_len = 32, n_ctx = 2048, n_parallel = 1, n_kv_req = 32

 Hello my name is Shawn and I'm a 20 year old male from the United States. I'm a 20 year old

main: decoded 27 tokens in 2.31 s, speed: 11.68 t/s

llama_print_timings:        load time =   579.15 ms
llama_print_timings:      sample time =     0.72 ms /    28 runs   (    0.03 ms per token, 38888.89 tokens per second)
llama_print_timings: prompt eval time =   655.63 ms /    10 tokens (   65.56 ms per token,    15.25 tokens per second)
llama_print_timings:        eval time =  2180.97 ms /    27 runs   (   80.78 ms per token,    12.38 tokens per second)
llama_print_timings:       total time =  2891.13 ms