llama.cpp

Mirror of https://github.com/ggerganov/llama.cpp.git, synced 2024-10-30 14:40:16 +01:00

Author	SHA1	Message	Date
slaren	50fae10d03	Add --ignore-eos parameter (#181) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-19 20:22:48 +02:00
Erik Scholz	0b366e7357	Command line switch to use F16 for memory_k and memory_v (refactor of #154) (#294) * Use F16 for memory_k and memory_v * add command line switch to use f16 instead of f32 for memory k+v --------- Co-authored-by: Ty Everett <ty@tyweb.us>	2023-03-19 19:57:00 +02:00
Georgi Gerganov	70f01cb863	Drop trailing new line from file prompts (#80)	2023-03-19 19:05:04 +02:00
Georgi Gerganov	9e1707218a	Add "--instruct" argument for usage with Alpaca (#240) Also start adding prompts in "./prompts"	2023-03-19 18:37:02 +02:00
Gary Linscott	a81d0c2a17	Fix n^2 loop in tokenization (#254) This causes long prompts to parse very slowly.	2023-03-18 11:17:19 +00:00
thement	c9f670a177	Implement non-greedy tokenizer that tries to maximize token lengths (#242) * Implement non-greedy tokenizer that tries to maximize token lengths * Insert single space in front of the prompt - this is to match original llama tokenizer behavior --------- Co-authored-by: Jakub Horak <jakub.horak@ibawizard.net>	2023-03-17 21:05:58 +01:00
Stephan Walter	367946c668	Don't tell users to use a bad number of threads (#243) The readme tells people to use the command line option "-t 8", causing 8 threads to be started. On systems with fewer than 8 cores, this causes a significant slowdown. Remove the option from the example command lines and use /proc/cpuinfo on Linux to determine a sensible default.	2023-03-17 19:47:35 +02:00
Matvey Soloviev	904d2a8d6a	Q4_1 quantization (#193) * Add AVX2 version of ggml_vec_dot_q4_1 * Small optimisations to q4_1 dot product (@Const-me) * Rearrange Q4_1 quantization to work for multipart models. (Fix #152) * Fix ggml_vec_mad_q4_1 too * Fix non-vectorised q4_1 vec mul	2023-03-17 06:48:39 +02:00
Justin Suess	2d64715ad4	added ctx_size parameter (#148) * added ctx_size parameter * added it in more places * Apply suggestions from code review --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-15 21:42:40 +02:00
Thomas Klausner	41be0a3b3d	Add NetBSD support. (#90)	2023-03-13 18:40:54 +02:00
Matvey Soloviev	96ea727f47	Add interactive mode (#61) * Initial work on interactive mode. * Improve interactive mode. Make rev. prompt optional. * Update README to explain interactive mode. * Fix OS X build	2023-03-12 23:13:28 +02:00
Ben Garney	f385f8dee8	Allow using prompt files (#59)	2023-03-12 22:28:36 +02:00
beiller	02f0c6fe7f	Add back top_k (#56) * Add back top_k * Update utils.cpp * Update utils.h --------- Co-authored-by: Bill Hamilton <bill.hamilton@shopify.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-12 22:23:15 +02:00
Sebastián A	eb062bb012	Windows fixes (#31) * Apply fixes suggested to build on windows Issue: https://github.com/ggerganov/llama.cpp/issues/22 * Remove unsupported VLAs * MSVC: Remove features that are only available on MSVC C++20. * Fix zero initialization of the other fields. * Change the use of vector for stack allocations.	2023-03-12 22:15:00 +02:00
beiller	129c7d1ea8	Add repetition penalty (#20) * Adding repeat penalization * Update utils.h * Update utils.cpp * Numeric fix Should probably still scale by temp even if penalized * Update comments, more proper application I see that numbers can go negative so a fix from a referenced commit * Minor formatting --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-12 11:27:42 +02:00
Georgi Gerganov	007a8f6f45	Support all LLaMA models + change Q4_0 quantization storage	2023-03-11 11:28:30 +02:00
Jean-Michaël Celerier	9dcf4dba45	Add missing headers for memcpy and assert (#3)	2023-03-11 01:04:06 +02:00
Georgi Gerganov	70bc0b8b15	Fix a bug in the rope calculation	2023-03-10 23:46:57 +02:00
Georgi Gerganov	319cdb3e1f	Final touches	2023-03-10 21:50:46 +02:00
Georgi Gerganov	26c0846629	Initial release	2023-03-10 20:56:40 +02:00

20 Commits
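Commits c9f670a177 (#242) and a81d0c2a17 (#254) describe a tokenizer that tries to maximize token lengths instead of taking matches greedily, and a fix for a quadratic loop in it. The following is a minimal sketch of that idea, not the project's code. It assumes a dynamic-programming formulation that scores a segmentation by the sum of squared token lengths and caps candidate tokens at a hypothetical `kMaxTokenLen`. Squaring matters because every full segmentation covers the same total length, so only a superlinear score favors longer tokens. The `Vocab` type and `tokenize_max_len` are illustrative names.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical vocabulary: token text -> token id.
using Vocab = std::unordered_map<std::string, int>;

// Illustrative cap on token length in bytes. Bounding the inner loop keeps the
// forward pass linear in the text length instead of quadratic.
constexpr int kMaxTokenLen = 16;

// Segments `text` into vocabulary tokens, preferring long tokens by maximizing
// the sum of squared token lengths. Returns an empty vector if `text` is empty
// or cannot be fully segmented.
std::vector<int> tokenize_max_len(const Vocab& vocab, const std::string& text) {
    const int n = static_cast<int>(text.size());
    std::vector<long> best(n + 1, -1);   // best[i]: best score for text[0, i); -1 = unreachable
    std::vector<int> tok(n + 1, -1);     // id of the token ending at position i
    std::vector<int> tok_len(n + 1, 0);  // length of that token
    best[0] = 0;

    // Forward pass: extend every reachable position by every matching token.
    for (int i = 0; i < n; ++i) {
        if (best[i] < 0) continue;
        const int max_len = std::min(n - i, kMaxTokenLen);
        for (int len = 1; len <= max_len; ++len) {
            auto it = vocab.find(text.substr(i, len));
            if (it == vocab.end()) continue;
            const long score = best[i] + static_cast<long>(len) * len;
            if (score > best[i + len]) {
                best[i + len] = score;
                tok[i + len] = it->second;
                tok_len[i + len] = len;
            }
        }
    }

    // Backward pass: follow the recorded tokens from the end of the text.
    std::vector<int> out;
    if (best[n] < 0) return out;
    for (int i = n; i > 0; i -= tok_len[i]) out.push_back(tok[i]);
    std::reverse(out.begin(), out.end());  // backtracking yields tokens last-to-first
    return out;
}
```

With a vocabulary of `a`, `b`, `c`, `ab` and `abc`, the text `abcab` becomes `abc` + `ab` (score 9 + 4), not `ab` + `c` + `ab` (score 4 + 1 + 4).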
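Commit 129c7d1ea8 (#20) adds a repetition penalty and notes that logits can be negative and that temperature scaling should still apply to penalized tokens. This minimal sketch assumes one common rule for this rather than quoting the commit. For a recently generated token, a positive logit is divided by the penalty and a negative logit is multiplied by it, so a penalty greater than 1 always lowers that token's logit. Every logit is also scaled by `1/temp`. The function `apply_repeat_penalty` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: scales logits by 1/temp and penalizes token ids found in
// `recent` (the last N generated tokens). With penalty > 1, dividing a positive
// logit and multiplying a negative one both lower that token's probability.
void apply_repeat_penalty(std::vector<float>& logits, const std::vector<int>& recent,
                          float penalty, float temp) {
    assert(penalty > 0.0f && temp > 0.0f);
    const float scale = 1.0f / temp;
    for (int id = 0; id < static_cast<int>(logits.size()); ++id) {
        float l = logits[id] * scale;  // temperature scaling applies even if penalized
        if (std::find(recent.begin(), recent.end(), id) != recent.end()) {
            l = (l < 0.0f) ? l * penalty : l / penalty;
        }
        logits[id] = l;
    }
}
```

Dividing a negative logit instead would move it toward zero and make the repeated token more likely, which is the sign problem the commit message alludes to.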