llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-25 22:08:46 +01:00

Author	SHA1	Message	Date
jp-x-g	f732695cd5	Clarify console output in convert-pth-to-ggml.py (#512) "Processing part 1 of 3" instead of "Processing part 0"	2023-03-25 23:53:55 +02:00
Georgi Gerganov	f5a77a629b	Introduce C-style API (#370) * Major refactoring - introduce C-style API * Clean up * Add <cassert> * Add <iterator> * Add <algorithm> .... * Fix timing reporting and accumulation * Measure eval time only for single-token calls * Change llama_tokenize return meaning	2023-03-22 07:32:36 +02:00
Georgi Gerganov	3bfa3b43b7	Fix convert script, warnings alpaca instructions, default params	2023-03-21 17:59:16 +02:00
Mack Straight	c98ae02668	fix typo in comment (#318)	2023-03-21 17:49:43 +02:00
Georgi Gerganov	eb34620aec	Add tokenizer test + revert to C++11 (#355) * Add test-tokenizer-0 to do a few tokenizations - feel free to expand * Added option to convert-pth-to-ggml.py script to dump just the vocabulary * Added ./models/ggml-vocab.bin containing just LLaMA vocab data (used for tests) * Added utility to load vocabulary file from previous point (temporary implementation) * Avoid using std::string_view and drop back to C++11 (hope I didn't break something) * Rename gpt_vocab -> llama_vocab * All CMake binaries go into ./bin/ now	2023-03-21 17:29:41 +02:00
Qingyou Meng	6b6d5b5024	Fixed tokenizer.model not found error when model dir is symlink (#325)	2023-03-20 19:33:10 +00:00
Mack Straight	074bea2eb1	sentencepiece bpe compatible tokenizer (#252) * potential out of bounds read * fix quantize * style * Update convert-pth-to-ggml.py * mild cleanup * don't need the space-prefixing here rn since main.cpp already does it * new file magic + version header field * readme notice * missing newlines Co-authored-by: slaren <2141330+slaren@users.noreply.github.com>	2023-03-20 03:17:23 -07:00
Georgi Gerganov	c1c7026b47	Fix python stuff (#109)	2023-03-19 19:33:18 +02:00
qunash	467b149761	Refactoring `convert-pth-to-ggml.py`: more concise and readable (#109) * Refactor get_n_parts function to simplify code and improve readability * Use f-strings instead of concatenation * Refactoring: more concise and readable * modularize --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-19 19:17:39 +02:00
Bernat Vadell	2af23d3043	🚀 Dockerize llamacpp (#132) * feat: dockerize llamacpp * feat: split build & runtime stages * split dockerfile into main & tools * add quantize into tool docker image * Update .devops/tools.sh Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add docker action pipeline * change CI to publish at github docker registry * fix name runs-on macOS-latest is macos-latest (lowercase) * include docker versioned images * fix github action docker * fix docker.yml * feat: include all-in-one command tool & update readme.md --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-17 10:47:06 +01:00
Ronsor	956dfda8ad	Use `tokenizer.vocab_size()` instead of hardcoding 32000 in convert-pth-to-ggml.py (#142) There are ways that special tokens or other new tokens could be added to the tokenizer; therefore it's probably best not to assume the vocabulary is only 32000 tokens.	2023-03-15 21:37:50 +02:00
Val Kharitonov	2a20f48efa	Fix UTF-8 handling (including colors) (#79)	2023-03-13 18:24:18 +02:00
Georgi Gerganov	7c9e54e55e	Revert "weights_only" arg - this causing more trouble than help	2023-03-12 20:59:01 +02:00
Oleksandr Nikitin	b9bd1d0141	python/pytorch compat notes (#44)	2023-03-12 14:16:33 +02:00
deepdiffuser	a93120236f	use weights_only in conversion script (#32) this restricts malicious weights from executing arbitrary code by restricting the unpickler to only loading tensors, primitive types, and dictionaries	2023-03-12 08:36:35 +02:00
Georgi Gerganov	007a8f6f45	Support all LLaMA models + change Q4_0 quantization storage	2023-03-11 11:28:30 +02:00
Georgi Gerganov	70bc0b8b15	Fix a bug in the rope calculation	2023-03-10 23:46:57 +02:00
Georgi Gerganov	26c0846629	Initial release	2023-03-10 20:56:40 +02:00

18 Commits