llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-12 13:27:21 +01:00

Author	SHA1	Message	Date
Olivier Chafik	051633ed2d	update dockerfile refs	2024-06-10 16:05:11 +01:00
Olivier Chafik	1cc651446d	rename(make): llama-baby-llama	2024-06-10 16:03:18 +01:00
Olivier Chafik	0fcf2c328e	rename dockerfile w/ llama-cli	2024-06-10 15:44:49 +01:00
Olivier Chafik	0bb2a3f233	fix some missing -cli suffixes	2024-06-10 15:42:20 +01:00
Olivier Chafik	daeaeb1222	Merge remote-tracking branch 'origin/master' into bins	2024-06-10 15:38:41 +01:00
Olivier Chafik	5265c15d4c	rename llama|main -> llama-cli; consistent RPM bin prefixes	2024-06-10 15:34:14 +01:00
slaren	fd5ea0f897	ci : try win-2019 on server windows test (#7854)	2024-06-10 15:18:41 +03:00
Georgi Gerganov	c28a83902c	examples : remove --instruct remnants (#7846)	2024-06-10 15:00:15 +03:00
Georgi Gerganov	d9da0e4986	server : improve "prompt" handling (#7847)	2024-06-10 14:59:55 +03:00
Johannes Gäßler	1f0dabda8d	CUDA: use tensor cores for MMQ (#7676) * CUDA: int8 tensor cores for MMQ (legacy quants) * fix out-of-bounds writes * __builtin_assume -> GGML_CUDA_ASSUME * fix writeback returning too early	2024-06-10 11:45:13 +02:00
Ben Ashbaugh	af4ae502dd	use the correct SYCL context for host USM allocations (#7777) Signed-off-by: Ben Ashbaugh <ben.ashbaugh@intel.com>	2024-06-10 10:21:31 +01:00
Georgi Gerganov	10ceba354a	flake.lock: Update (#7838) Flake lock file updates: • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/ad57eef4ef0659193044870c731987a6df5cf56b?narHash=sha256-SzDKxseEcHR5KzPXLwsemyTR/kaM9whxeiJohbL04rs%3D' (2024-05-29) → 'github:NixOS/nixpkgs/051f920625ab5aabe37c920346e3e69d7d34400e?narHash=sha256-4q0s6m0GUcN7q%2BY2DqD27iLvbcd1G50T2lv08kKxkSI%3D' (2024-06-07) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2024-06-09 16:04:50 -07:00
Georgi Gerganov	e95beeb1fc	imatrix : handle partial entries (#7833)	2024-06-09 20:19:35 +03:00
Nicolás Pérez	57bf62ce7c	docs: Added initial PR template with directions for doc only changes and squash merges [no ci] (#7700) This commit adds pull_request_template.md and CONTRIBUTING.md. It focuses on explaining to contributors the need to rate PR complexity level, when to add [no ci] and how to format PR title and descriptions. Co-authored-by: Brian <mofosyne@gmail.com> Co-authored-by: compilade <git@compilade.net>	2024-06-10 01:24:29 +10:00
mgroeber9110	3e2ee44315	server: do not remove whitespace at the start of a completion chunk (#7830)	2024-06-09 20:50:35 +10:00
Johannes Gäßler	42b53d192f	CUDA: revise q8_1 data layout for mul_mat_q (#7824)	2024-06-09 09:42:25 +02:00
sasha0552	2decf57bc6	convert-hf : set the model name based on cli arg, if present (#7693) `--model-name` argument was added a while ago but did not do anything. This commit fixes this issue and enables this feature.	2024-06-09 16:39:25 +10:00
compilade	5795b94182	convert-hf : match model part name prefix and suffix (#7687) In #7075, to fix the conversion of (some) models using model-00001-of-00001.safetensors instead of model.safetensors for a single model part we simply used the same logic as the part count to get the part names. But this doesn't always work correctly, like when unusual additional model files like consolidated.safetensors in https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3 are present. This commit matching both the prefix and the suffix of the model part names should fix this problem without breaking any previously-supported upstream models. But according to report by @teleprint-me there is still some persistent problem, but shall do in the meantime.	2024-06-09 12:47:25 +10:00
compilade	ed9f252118	gguf-py : decouple adding metadata from writing in GGUFWriter (#7827) Main changes of this PR is to consolidate GGUFWriter.add_key and GGUFWriter.add_val into GGUFWriter.add_key_value. In addition use_temp_file is now opt-in instead of opt-out defaulting to False. Also GGUFWriter now does not require output file name until when actually writing to it. And GGUFWriter doesn't really need to eagerly prepare the data layout of the metadata	2024-06-09 12:34:29 +10:00
slaren	fe1e3917cf	Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808) This reverts commit 9422c5e34bbd302493b77a8f6d546154a1f4fe82.	2024-06-09 01:43:39 +02:00
Olivier Chafik	d4d915d351	url: save -mu downloads to new cache location (#7826) * url: save -mu download to new cache location * url: fs_get_cache_file_path util * url: tweak sig of fs_get_cache_file	2024-06-08 21:21:08 +02:00
Olivier Chafik	347f30803f	rename Dockerfiles	2024-06-08 15:10:32 +01:00
Olivier Chafik	78eae7f3ba	gitignore /llama-*	2024-06-08 14:29:35 +01:00
Olivier Chafik	efaa441233	fix llama-lookup-* Makefile rules	2024-06-08 14:26:11 +01:00
Olivier Chafik	b0eb3b88e9	rm bin files	2024-06-08 14:16:32 +01:00
Olivier Chafik	eef922e02e	sort cmake example subdirs	2024-06-08 14:09:28 +01:00
Olivier Chafik	b648243496	add/fix gbnf-validator subfolder to cmake	2024-06-08 14:07:56 +01:00
Olivier Chafik	81222f02db	prefix more cmake targets w/ llama-	2024-06-08 14:05:34 +01:00
Olivier Chafik	10650b692d	rename {main->llama}-cmake-pkg binary	2024-06-08 13:57:06 +01:00
Olivier Chafik	78bca8cb07	fix main refs	2024-06-08 13:52:03 +01:00
Olivier Chafik	ab5efbb3b6	Prefix all example bins w/ llama-	2024-06-08 13:42:01 +01:00
Olivier Chafik	23d0df5bd5	main: target name -> llama-cli	2024-06-08 12:50:35 +01:00
Olivier Chafik	fe93cc96cc	Merge remote-tracking branch 'origin/master' into bins	2024-06-08 12:04:52 +01:00
sasha0552	7a16ce7db2	server : smart slot selection using Longest Common Prefix (#7728) * server : Smart selection of available slot using Longest Common Substring * add usage * remove trailing whitespaces * Use Longest Common Prefix (LCP) instead of LCS * Rename argument	2024-06-08 10:50:31 +03:00
slaren	da799b4189	vulkan : reuse parent extra for views (#7806) * vulkan : reuse parent extra for views * Fix validation error when multiple compute contexts are used in a graph --------- Co-authored-by: 0cc4m <picard12@live.de>	2024-06-07 19:47:49 +02:00
Christian Zhou-Zheng	c00fad71e5	gguf-split : change binary multi-byte units to decimal (#7803)	2024-06-07 15:56:01 +03:00
intelmatt	27615f5ab2	cmake : fix BUILD_SHARED_LIBS=ON build (#7784) common depends on pthreads in Linux	2024-06-07 15:15:07 +03:00
Olivier Chafik	0dba58269f	Update server-llm.sh	2024-06-07 11:52:40 +01:00
Johannes Gäßler	7027b27d76	server: update cache_prompt documentation [no ci] (#7745)	2024-06-07 11:15:49 +02:00
ochafik	af8f0169da	Update .gitignore	2024-06-07 10:14:03 +01:00
ochafik	7fbe6006c9	update straggling refs	2024-06-07 09:42:21 +01:00
ochafik	99df4cc091	rm accidentally checked in bins	2024-06-07 09:40:09 +01:00
woodx	a5cabd7649	server : do not get prompt in infill mode (#7286) * avoid to get prompt in infill mode and embedding mode * remove embedding mode * refactor format --------- Co-authored-by: wudexiang <wudexiang@bytedance.com>	2024-06-07 10:09:45 +03:00
pengxin99	d5c938cd77	[SYCL] fix softmax r2r result wrong issue (#7811)	2024-06-07 14:28:26 +08:00
slaren	c9ee7118d5	check for nans in imatrix and quantize (#7807) * imatrix : detect nan/inf values * quantize : check imatrix for nan/inf values	2024-06-07 09:01:29 +03:00
ochafik	fbd83131f5	Merge remote-tracking branch 'origin/master' into bins	2024-06-07 00:51:31 +01:00
ochafik	a0a7f2b031	Update build.yml	2024-06-07 00:38:05 +01:00
ochafik	8695baebc0	update more names	2024-06-07 00:21:01 +01:00
Georgi Gerganov	ee459f40f6	server : fix --threads-http arg (#7801)	2024-06-06 19:19:59 +03:00
Olivier Chafik	9a03341094	main/server: fix targets	2024-06-06 15:53:25 +01:00

3154 Commits