Georgi Gerganov
c5650ed470
server : avoid context swaps by shifting the KV cache
2023-09-28 19:03:36 +03:00
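The context-shift idea in the server commit above (shift the KV cache instead of performing a full context swap and re-evaluation) can be sketched in plain Python. This is an illustrative model only, not llama.cpp's actual implementation; the `n_keep`/`n_discard` names mirror the server's parameters but the data layout here is hypothetical.

```python
def shift_context(cache, n_keep, n_discard):
    """Illustrative context shift: keep the first n_keep entries
    (e.g. the system prompt), drop the next n_discard, and shift
    the positions of the remainder down so generation can continue
    without re-evaluating the whole context (a 'context swap')."""
    kept = cache[:n_keep] + cache[n_keep + n_discard:]
    # Each entry is (token, pos); renumber positions after the gap.
    return [(tok, pos if i < n_keep else pos - n_discard)
            for i, (tok, pos) in enumerate(kept)]

# Usage: a full 6-slot cache, keeping 2 prefix tokens, discarding 2.
cache = [("t%d" % i, i) for i in range(6)]
print(shift_context(cache, 2, 2))
```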
Georgi Gerganov
ce2d995af2
server : clear the KV cache beyond n_past before llama_decode
2023-09-28 18:12:39 +03:00
Georgi Gerganov
2b8830af71
examples : do not eval prompt 2 times ( close #3348 )
2023-09-28 17:48:46 +03:00
Georgi Gerganov
a207561503
examples : add example for batched decoding
2023-09-28 17:32:04 +03:00
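The batched-decoding example added above evaluates several sequences in a single decode call. A minimal, hypothetical sketch of the batch layout (not the real `llama_batch` struct): each token is tagged with its sequence id and position, and logits are requested only for each sequence's last token, which is what makes per-token lookups like `llama_get_logits_ith()` useful.

```python
def build_batch(sequences):
    """Flatten multiple sequences into one batch; every token is
    tagged with (seq_id, pos) so the decoder can attend within
    its own sequence only."""
    batch = []
    for seq_id, tokens in enumerate(sequences):
        for pos, tok in enumerate(tokens):
            # Request logits only for each sequence's last token.
            wants_logits = (pos == len(tokens) - 1)
            batch.append({"token": tok, "seq_id": seq_id,
                          "pos": pos, "logits": wants_logits})
    return batch

# Two sequences of different lengths decoded in a single batch.
batch = build_batch([[101, 102, 103], [201, 202]])
print(len(batch))  # 5 entries
```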
Georgi Gerganov
d008733e6b
examples : utilize new llama_get_logits_ith()
2023-09-28 16:05:37 +03:00
Georgi Gerganov
4c72ab13b2
metal : use mm kernels for batch size > 2
2023-09-28 16:02:20 +03:00
Georgi Gerganov
e9463792d3
llama : simplify returns if/else branches
2023-09-28 16:01:49 +03:00
Georgi Gerganov
4ad0676927
parallel : fix crash when -n -1
2023-09-28 15:48:38 +03:00
Georgi Gerganov
25856900db
Merge branch 'master' into custom-attention-mask
2023-09-28 15:19:57 +03:00
Pierre Alexandre SCHEMBRI
4aea3b846e
readme : add Mistral AI release 0.1 ( #3362 )
2023-09-28 15:13:37 +03:00
slaren
da0400344b
ggml-cuda : perform cublas fp16 matrix multiplication as fp16 ( #3370 )
* ggml-cuda : perform cublas fp16 matrix multiplication as fp16
* try to fix rocm build
* restrict fp16 mat mul to volta and up
2023-09-28 13:08:28 +03:00
Zhang Peiyuan
e519621010
convert : remove bug in convert.py permute function ( #3364 )
2023-09-27 20:45:20 +02:00
Richard Roberson
ac43576124
make-ggml.py : compatibility with more models and GGUF ( #3290 )
* Resync my fork with new llama.cpp commits
* examples : rename to use dash instead of underscore
* New model conversions
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-27 19:25:12 +03:00
Cebtenzzre
20c7e1e804
gguf : fix a few general keys ( #3341 )
2023-09-27 12:18:07 -04:00
Rickard Hallerbäck
dc6897404e
metal : reusing llama.cpp logging ( #3152 )
* metal : reusing llama.cpp logging
* cmake : build fix
* metal : logging callback
* metal : logging va_args memory fix
* metal : minor cleanup
* metal : setting function like logging macro to capital letters
* llama.cpp : trailing whitespace fix
* ggml : log level enum used by llama
* Makefile : cleanup ggml-metal recipe
* ggml : ggml_log_callback typedef
* ggml : minor
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-27 18:48:33 +03:00
Jag Chadha
527e57cfd8
build : add ACCELERATE_NEW_LAPACK to fix warning on macOS Sonoma ( #3342 )
2023-09-27 18:34:32 +03:00
BarfingLemurs
ffe88a36a9
readme : add some recent perplexity and bpw measurements to READMES, link for k-quants ( #3340 )
* Update README.md
* Update README.md
* Update README.md with k-quants bpw measurements
2023-09-27 18:30:36 +03:00
Georgi Gerganov
c1596f633f
llama : fix kv cache heuristic when context is less than 32
2023-09-27 18:12:43 +03:00
DAN™
99115f3fa6
cmake : fix build-info.h on MSVC ( #3309 )
2023-09-25 18:45:33 -04:00
2f38b454
1726f9626f
docs: Fix typo CLBlast_DIR var. ( #3330 )
2023-09-25 20:24:52 +02:00
Erik Scholz
a98b1633d5
nix : add cuda, use a symlinked toolkit for cmake ( #3202 )
2023-09-25 13:48:30 +02:00
slaren
c091cdfb24
llama-bench : add README ( #3317 )
* llama-bench : add README
* minor edit
2023-09-23 21:48:24 +02:00
Cebtenzzre
51a7cf5c6e
examples : fix RoPE defaults to match PR #3240 ( #3315 )
2023-09-23 12:28:50 +03:00
Kevin Ji
bedb92b603
scripts : use /usr/bin/env in shebang ( #3313 )
2023-09-22 23:52:23 -04:00
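The shebang change above replaces hard-coded interpreter paths with `/usr/bin/env`, which resolves the interpreter via `PATH`. A small demonstration of the effect, assuming a POSIX system with `bash` on the `PATH` (the temp-file name is illustrative):

```python
import os, stat, subprocess, tempfile

# Write a script whose shebang resolves bash via PATH
# ("#!/usr/bin/env bash") instead of a fixed /bin/bash.
script = "#!/usr/bin/env bash\necho hello from env shebang\n"
path = os.path.join(tempfile.gettempdir(), "env-shebang-demo.sh")
with open(path, "w") as f:
    f.write(script)
os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)

# Executing by path makes the kernel honor the shebang line.
out = subprocess.run([path], capture_output=True, text=True).stdout.strip()
print(out)
```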
Lee Drake
bc9d3e3971
Update README.md ( #3289 )
* Update README.md
* Update README.md
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2023-09-21 21:00:24 +02:00
shibe2
36b904e200
ggml-opencl.cpp: Make private functions static ( #3300 )
2023-09-21 14:10:26 -04:00
Georgi Gerganov
8845160058
simple : add README.md
2023-09-21 20:10:14 +02:00
Georgi Gerganov
5a3369d8e8
llama : llama.h formatting + comments
2023-09-21 19:51:32 +02:00
Edward Taylor
324f3403d5
zig : fix for updated c lib ( #3259 )
2023-09-21 12:08:20 +03:00
yuiseki
f56c418ab0
embedding : update README.md ( #3224 )
2023-09-21 11:57:40 +03:00
Johannes Gäßler
8185710a80
CUDA: use only 1 thread if fully offloaded ( #2915 )
2023-09-21 11:43:53 +03:00
Georgi Gerganov
7eb41179ed
readme : update hot topics
2023-09-20 20:48:22 +03:00
Georgi Gerganov
b2debf65f2
parallel : add disabled experimental batch chunking in powers of two
2023-09-20 20:14:05 +03:00
Cebtenzzre
a5661d7e71
llama : allow gguf RoPE keys to be overridden with defaults ( #3240 )
2023-09-20 12:12:47 -04:00
Georgi Gerganov
ded9b43cad
parallel : fix cases where the input prompts can overflow the batch
2023-09-20 19:09:25 +03:00
Cebtenzzre
65c2c1c5ab
benchmark-matmult : do not use integer abs() on a float ( #3277 )
2023-09-20 12:06:08 -04:00
Georgi Gerganov
ee1d670cc6
parallel : fix bug (extra BOS) + smaller token_prev array
2023-09-20 17:32:21 +03:00
kang
80834daecf
flake : Restore default package's buildInputs ( #3262 )
2023-09-20 15:48:22 +02:00
slaren
1be2b8c19b
ggml : revert change to ggml_cpy, add ggml_cont_Nd instead ( #3275 )
ggml-ci
2023-09-20 16:12:51 +03:00
Alon
a40f2b656f
CI: FreeBSD fix ( #3258 )
* - freebsd ci: use qemu
2023-09-20 14:06:36 +02:00
Georgi Gerganov
2f3a46fccf
train : make KQ_pos memory buffer permanent via dummy scale op
2023-09-20 14:14:50 +03:00
Georgi Gerganov
54206962c7
llama : disable MPI for now
ggml-ci
2023-09-20 14:07:29 +03:00
slaren
e04dc51988
ggml-cuda : add rope f16, restore performance with parallel decoding ( #3272 )
* ggml-cuda : add rope f16, restore performance
* offload KQ_mask with all models
* fix rope shift
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-20 14:00:28 +03:00
Georgi Gerganov
db0fc2da06
simple : improve comments + free batch
2023-09-20 13:54:20 +03:00
Georgi Gerganov
b377bf2266
simple : add parallel decoding support
2023-09-20 13:06:34 +03:00
Georgi Gerganov
addae65fd4
llama : improve llama_batch API + simplify parallel example
2023-09-20 11:03:18 +03:00
Georgi Gerganov
d119c04c15
examples : fix benchmark-matmult ( #1554 )
The precision for Q4_0 has degraded since #1508
2023-09-20 10:02:39 +03:00
Georgi Gerganov
a1327c71c6
parallel : rename hot-plug to continuous-batching
2023-09-20 09:24:41 +03:00
Georgi Gerganov
e1067efbfa
llama : fix n_kv to never become 0
2023-09-20 09:17:05 +03:00
Georgi Gerganov
7b7472ee26
parallel : minor
2023-09-20 00:35:10 +03:00