llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-27 06:39:25 +01:00

Author	SHA1	Message	Date
Xuan Son Nguyen	45abe0f74e	server : replace behave with pytest (#10416 ) * server : replace behave with pytest * fix test on windows * misc * add more tests * more tests * styling * log less, fix embd test * added all sequential tests * fix coding style * fix save slot test * add parallel completion test * fix parallel test * remove feature files * update test docs * no cache_prompt for some tests * add test_cache_vs_nocache_prompt	2024-11-26 16:20:18 +01:00
Neo Zhang Jianyu	0bbd2262a3	restore the condistion to build & update pacakge when merge (#10507 ) Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2024-11-26 21:43:47 +08:00
Diego Devesa	7db3846a94	ci : publish the docker images created during scheduled runs (#10515 )	2024-11-26 13:05:20 +01:00
Diego Devesa	c6807b3f28	ci : add ubuntu cuda build, build with one arch on windows (#10456 )	2024-11-26 13:05:07 +01:00
Diego Devesa	50d5cecbda	ci : build docker images only once daily (#10503 )	2024-11-25 22:05:39 +01:00
Johannes Gäßler	1f922254f0	Github: update issue templates [no ci] (#10489 )	2024-11-25 19:18:37 +01:00
Neo Zhang Jianyu	5a8987793f	[SYCL] Fix building Win package for oneAPI 2025.0 update (#10483 ) * fix build package for 2025.0 * debug * debug * fix * rm debug --------- Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2024-11-25 17:31:10 +08:00
蕭澧邦	6dfcfef078	ci: Update oneAPI runtime dll packaging (#10428 ) This is the minimum runtime dll dependencies for oneAPI 2025.0	2024-11-22 10:44:08 +01:00
Johannes Gäßler	599b3e0cd4	GitHub: ask for more info in issue templates (#10426 ) * GitHub: ask for more info in issues [no ci] * refactor issue templates to be component-specific * more understandable issue description * add dropdown for llama.cpp module	2024-11-22 08:32:40 +01:00
R0CKSTAR	f0204a0ec7	ci: build test musa with cmake (#10298 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2024-11-15 12:47:25 +01:00
Romain Biessy	5a54af4d4f	sycl: Use syclcompat::dp4a (#10267 ) * sycl: Use syclcompat::dp4a * Using the syclcompat version allow the compiler to optimize the operation with native function * Update news section * Update CI Windows oneAPI version to 2025.0 * Reword doc * Call syclcompat::dp4a inside dpct::dp4a This reverts commit `90cb61d692`.	2024-11-15 11:09:12 +08:00
Diego Devesa	ae8de6d50a	ggml : build backends as libraries (#10256 ) * ggml : build backends as libraries --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>	2024-11-14 18:04:35 +01:00
Georgi Gerganov	ec450d3bbf	metal : opt-in compile flag for BF16 (#10218 ) * metal : opt-in compile flag for BF16 ggml-ci * ci : use BF16 ggml-ci * swift : switch back to v12 * metal : has_float -> use_float ggml-ci * metal : fix BF16 check in MSL ggml-ci	2024-11-08 21:59:46 +02:00
Eve	3407364776	Q6_K AVX improvements (#10118 ) * q6_k instruction reordering attempt * better subtract method * should be theoretically faster small improvement with shuffle lut, likely because all loads are already done at that stage * optimize bit fiddling * handle -32 offset separately. bsums exists for a reason! * use shift * Update ggml-quants.c * have to update ci macos version to 13 as 12 doesnt work now. 13 is still x86	2024-11-04 23:06:31 +01:00
R0CKSTAR	cf8e0a3bb9	musa: add docker image support (#9685 ) * mtgpu: add docker image support Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * mtgpu: enable docker workflow Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2024-10-10 20:10:37 +02:00
Xuan Son Nguyen	f3fdcfaa79	ci : fine-grant permission (#9710 )	2024-10-04 11:47:19 +02:00
Diego Devesa	c83ad6d01e	ggml-backend : add device and backend reg interfaces (#9707 ) Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2024-10-03 01:49:47 +02:00
serhii-nakon	6f1d9d71f4	Fix Docker ROCM builds, use AMDGPU_TARGETS instead of GPU_TARGETS (#9641 ) * Fix Docker ROCM builds, use AMDGPU_TARGETS instead of GPU_TARGETS * Set ROCM_DOCKER_ARCH as string due it incorrectly build and cause OOM exit code	2024-09-30 20:57:12 +02:00
compilade	511636df0c	ci : reduce severity of unused Pyright ignore comments (#9697 )	2024-09-30 14:13:16 -04:00
Neo Zhang Jianyu	95bc82fbc0	[SYCL] add missed dll file in package (#9577 ) * update oneapi to 2024.2 * use 2024.1 --------- Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2024-09-26 17:38:31 +08:00
Xuan Son Nguyen	ea9c32be71	ci : fix docker build number and tag name (#9638 ) * ci : fix docker build number and tag name * fine-grant permissions	2024-09-25 17:26:01 +02:00
Huang Qi	e948a7da7a	CI: Provide prebuilt windows binary for hip (#9467 )	2024-09-21 02:39:41 +02:00
Georgi Gerganov	6262d13e0b	common : reimplement logging (#9418 ) https://github.com/ggerganov/llama.cpp/pull/9418	2024-09-15 20:46:12 +03:00
Mathijs Henquet	78203641fe	server : Add option to return token pieces in /tokenize endpoint (#9108 ) * server : added with_pieces functionality to /tokenize endpoint * server : Add tokenize with pieces tests to server.feature * Handle case if tokenizer splits along utf8 continuation bytes * Add example of token splitting * Remove trailing ws * Fix trailing ws * Maybe fix ci * maybe this fix windows ci? --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2024-09-12 22:30:11 +02:00
Huang Qi	4dc4f5f14a	ci : update HIP SDK to 24.Q3 (ROCm 6.1) (#9329 )	2024-09-12 14:28:43 +03:00
Trivikram Kamat	3c26a1644d	ci : bump actions/checkout to v4 (#9377 )	2024-09-12 14:27:45 +03:00
slaren	6c89eb0b47	ci : disable rocm image creation (#9340 )	2024-09-07 10:48:54 +03:00
awatuna	32b2ec88bc	Update build.yml (#9184 ) build rpc-server for windows cuda	2024-09-06 00:34:36 +02:00
slaren	9fe94ccac9	docker : build images only once (#9225 )	2024-08-28 17:28:00 +02:00
Georgi Gerganov	d5492f0525	ci : disable bench workflow (#9010 )	2024-08-15 10:11:11 +03:00
Diogo Teles Sant'Anna	fc4ca27b25	ci : fix github workflow vulnerable to script injection (#9008 ) Signed-off-by: Diogo Teles Sant'Anna <diogoteles@google.com>	2024-08-12 19:28:23 +03:00
Radoslav Gerganov	1f67436c5e	ci : enable RPC in all of the released builds (#9006 ) ref: #8912	2024-08-12 19:17:03 +03:00
Georgi Gerganov	d3ae0ee8d7	py : fix requirements check '==' -> '~=' (#8982 ) * py : fix requirements check '==' -> '~=' * cont : fix the fix * ci : run on all requirements.txt	2024-08-12 11:02:01 +03:00
Johannes Gäßler	6eeaeba126	cmake: use 1 more thread for non-ggml in CI (#8740 )	2024-07-28 22:32:44 +02:00
Johannes Gäßler	69c487f4ed	CUDA: MMQ code deduplication + iquant support (#8495 ) * CUDA: MMQ code deduplication + iquant support * 1 less parallel job for CI build	2024-07-20 22:25:26 +02:00
bandoti	17eb6aa8a9	vulkan : cmake integration (#8119 ) * Add Vulkan to CMake pkg * Add Sycl to CMake pkg * Add OpenMP to CMake pkg * Split generated shader file into separate translation unit * Add CMake target for Vulkan shaders * Update README.md * Add make target for Vulkan shaders * Use pkg-config to locate vulkan library * Add vulkan SDK dep to ubuntu-22-cmake-vulkan workflow * Clean up tabs * Move sudo to apt-key invocation * Forward GGML_EXTRA_LIBS to CMake config pkg * Update vulkan obj file paths * Add shaderc to nix pkg * Add python3 to Vulkan nix build * Link against ggml in cmake pkg * Remove Python dependency from Vulkan build * code review changes * Remove trailing newline * Add cflags from pkg-config to fix w64devkit build * Update README.md * Remove trailing whitespace * Update README.md * Remove trailing whitespace * Fix doc heading * Make glslc required Vulkan component * remove clblast from nix pkg	2024-07-13 18:12:39 +02:00
Alberto Cabrera Pérez	a130eccef4	labeler : updated sycl to match docs and code refactor (#8373 )	2024-07-08 22:35:17 +02:00
compilade	3fd62a6b1c	py : type-check all Python scripts with Pyright (#8341 ) * py : type-check all Python scripts with Pyright * server-tests : use trailing slash in openai base_url * server-tests : add more type annotations * server-tests : strip "chat" from base_url in oai_chat_completions * server-tests : model metadata is a dict * ci : disable pip cache in type-check workflow The cache is not shared between branches, and it's 250MB in size, so it would become quite a big part of the 10GB cache limit of the repo. * py : fix new type errors from master branch * tests : fix test-tokenizer-random.py Apparently, gcc applies optimisations even when pre-processing, which confuses pycparser. * ci : only show warnings and errors in python type-check The "information" level otherwise has entries from 'examples/pydantic_models_to_grammar.py', which could be confusing for someone trying to figure out what failed, considering that these messages can safely be ignored even though they look like errors.	2024-07-07 15:04:39 -04:00
Clint Herron	07a3fc0608	Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258 )	2024-07-02 12:18:10 -04:00
Olivier Chafik	8748d8ac6f	json: attempt to skip slow tests when running under emulator (#8189 )	2024-06-28 18:02:05 +01:00
loonerin	558f44bf83	CI: fix release build (Ubuntu+Mac) (#8170 ) * CI: fix release build (Ubuntu) PR #8006 changes defaults to build shared libs. However, CI for releases expects static builds. * CI: fix release build (Mac) --------- Co-authored-by: loonerin <loonerin@users.noreply.github.com>	2024-06-27 21:01:23 +02:00
slaren	ae5d0f4b89	ci : publish new docker images only when the files change (#8142 )	2024-06-26 21:59:28 +02:00
Georgi Gerganov	f3f65429c4	llama : reorganize source code + improve CMake (#8006 ) * scripts : update sync [no ci] * files : relocate [no ci] * ci : disable kompute build [no ci] * cmake : fixes [no ci] * server : fix mingw build ggml-ci * cmake : minor [no ci] * cmake : link math library [no ci] * cmake : build normal ggml library (not object library) [no ci] * cmake : fix kompute build ggml-ci * make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE ggml-ci * move public backend headers to the public include directory (#8122) * move public backend headers to the public include directory * nix test * spm : fix metal header --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * scripts : fix sync paths [no ci] * scripts : sync ggml-blas.h [no ci] --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-26 18:33:02 +03:00
slaren	dd047b476c	disable docker CI on pull requests (#8110 )	2024-06-25 19:20:06 +02:00
slaren	8cb508d0d5	disable publishing the full-rocm docker image (#8083 )	2024-06-24 08:36:11 +03:00
slaren	b6b9a8e606	fix CI failures (#8066 ) * test-backend-ops : increase cpy max nmse * server ci : disable thread sanitizer	2024-06-23 13:14:45 +02:00
slaren	9c77ec1d74	ggml : synchronize threads using barriers (#7993 )	2024-06-19 15:04:15 +02:00
Georgi Gerganov	a04a953cab	codecov : remove (#8004 )	2024-06-19 13:04:36 +03:00
Georgi Gerganov	c8a82194a8	github : update pr template	2024-06-16 10:46:51 +03:00
olexiyb	f8ec8877b7	ci : fix macos x86 build (#7940 ) In order to use old `macos-latest` we should use `macos-12` Potentially will fix: https://github.com/ggerganov/llama.cpp/issues/6975	2024-06-14 20:28:34 +03:00

233 Commits