llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-30 22:03:03 +01:00

Author	SHA1	Message	Date
Georgi Gerganov	537b09e70f	TMP : push artifacts	2025-01-24 16:02:25 +02:00
Georgi Gerganov	8b2ed1e432	ci : remove obsolete MacOS build	2025-01-24 16:02:14 +02:00
Georgi Gerganov	f9f65f0162	ci : try to fix macos build rpaths	2025-01-24 16:01:32 +02:00
Georgi Gerganov	56e26a7f30	ci : change ubuntu build from latest to 20.04	2025-01-24 15:59:09 +02:00
Georgi Gerganov	194358e3b7	ci : restore the original HIP commands	2025-01-24 15:41:52 +02:00
Georgi Gerganov	50455ded31	ci : fix HIP cmake compiler options to be on first line	2025-01-24 15:23:44 +02:00
Georgi Gerganov	564353c9a3	Revert "TMP : push artifacts" This reverts commit `4decf2c4df`.	2025-01-24 15:22:36 +02:00
Georgi Gerganov	4decf2c4df	TMP : push artifacts	2025-01-24 14:54:24 +02:00
Georgi Gerganov	3a35bfe1f7	cmake : put libs in /bin	2025-01-24 14:42:46 +02:00
Georgi Gerganov	ff4cb6ef4c	release : pack /lib and /include in the packages	2025-01-24 13:28:37 +02:00
Eric Curtin	a1649cc13f	Adding linenoise.cpp to llama-run (#11252 ) This is a fork of linenoise that is C++17 compatible. I intend on adding it to llama-run so we can do things like traverse prompt history via the up and down arrows: https://github.com/ericcurtin/linenoise.cpp Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-01-18 14:42:31 +00:00
Georgi Gerganov	4dd34ff831	cmake : add sanitizer flags for llama.cpp (#11279 ) * cmake : add sanitizer flags for llama.cpp ggml-ci * tests : fix compile warnings ggml-ci * cmake : move sanitizer flags to llama_add_compile_flags ggml-ci * cmake : move llama.cpp compile flags to top level lists ggml-ci * cmake : apply only sanitizer flags at top level ggml-ci * tests : fix gguf context use in same_tensor_data * gguf-test: tensor data comparison * dummy : trigger ggml-ci * unicode : silence gcc warnings ggml-ci * ci : use sanitizer builds only in Debug mode ggml-ci * cmake : add status messages [no ci] --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-01-18 16:18:15 +02:00
Xuan Son Nguyen	f7cd13301c	ci : use actions from ggml-org (#11140)	2025-01-08 16:09:20 +01:00
Georgi Gerganov	0d52a69e4b	ci : fix cmake option (#11125)	2025-01-08 11:29:34 +02:00
Georgi Gerganov	f66f582927	llama : refactor `src/llama.cpp` (#10902 ) * llama : scatter llama.cpp into multiple modules (wip) * llama : control-vector -> adapter * llama : arch * llama : mmap ggml-ci * ci : remove BUILD_SHARED_LIBS=OFF ggml-ci * llama : arch (cont) ggml-ci * llama : chat ggml-ci * llama : model ggml-ci * llama : hparams ggml-ci * llama : adapter ggml-ci * examples : fix ggml-ci * rebase ggml-ci * minor * llama : kv cache ggml-ci * llama : impl ggml-ci * llama : batch ggml-ci * cont ggml-ci * llama : context ggml-ci * minor * llama : context (cont) ggml-ci * llama : model loader ggml-ci * common : update lora ggml-ci * llama : quant ggml-ci * llama : quant (cont) ggml-ci * minor [no ci]	2025-01-03 10:18:53 +02:00
Eve	7b1ec53f56	vulkan: bugfixes for small subgroup size systems + llvmpipe test (#10809 ) * ensure mul mat shaders work on systems with subgroup size less than 32 more fixes add test * only s_warptile_mmq needs to be run with 32 threads or more	2024-12-17 06:52:55 +01:00
lhez	a76c56fa1a	Introducing experimental OpenCL backend with support for Qualcomm Adreno GPUs (#10693 ) * [cl][adreno] Add Adreno GPU support Add new OpenCL backend to support Adreno GPUs --------- Co-authored-by: Skyler Szot <quic_sszot@quicinc.com> Co-authored-by: Shangqing Gu <quic_shawngu@quicinc.com> Co-authored-by: Alexander Angus <quic_aangus@quicinc.com> Co-authored-by: Hongqiang Wang <quic_wangh@quicinc.com> Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com> * [cl][ci] Add workflow for CL * [cl][adreno] Fix memory leak for non SMALL_ALLOC path * opencl: integrate backend dyn.load interface and fix compiler and format warnings * opencl: remove small-alloc support and fix build errors for non-opencl platforms * opencl: fixed merge conflict (MUSA added twice in cmake) * opencl-ci: use RUNNER_TEMP instead of github.workspace * opencl: fix embed tool invocation with python3 * opencl: CI workflow fixes * opencl: Clean up small-alloc in CMake files * opencl: cleanup ggml-opencl2 header file * opencl: use ulong for offsets and strides in ADD kernel * opencl: use cl_ulong for all offsets * opencl: use cl_ulong for sizes and strides * opencl: use `GGML_LOG_xxx` instead of `fprintf(stderr, ...)` * opencl: rename backend `opencl2` -> `opencl` * opencl: rename kernel files `ggml-opencl2` -> `ggml-opencl` * opencl: make OpenCL required, remove redundant lib and inc directories * `ggml-base`, `..` and `.` are added by `ggml_add_backend_library` * opencl: rename backend - funcs, structs, etc `opencl2` -> `opencl` * opencl: remove copyright marker since main license already covers * opencl: replace some more OPENCL2 leftovers * opencl: remove limits on `tensor_extra` * opencl: use pools for `tensor_extra` * opencl: fix compiler warnings with GCC and Clang Still getting the warning about clCreateCmdQueue being obsolete. Will fix that separately. * opencl: fail gracefully if opencl devices are not available Also for unsupported GPUs. 
* opencl: fix MSVC builds (string length error) * opencl: check for various requirements, allow deprecated API * opencl: update log message for unsupported GPUs --------- Co-authored-by: Skyler Szot <quic_sszot@quicinc.com> Co-authored-by: Shangqing Gu <quic_shawngu@quicinc.com> Co-authored-by: Alexander Angus <quic_aangus@quicinc.com> Co-authored-by: Hongqiang Wang <quic_wangh@quicinc.com> Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com>	2024-12-13 12:23:52 -08:00
Diego Devesa	43ed389a3f	llama : use cmake for swift build (#10525 ) * llama : use cmake for swift build * swift : <> -> "" * ci : remove make * ci : disable ios build * Revert "swift : <> -> """ This reverts commit `d39ffd9556`. * ci : try fix ios build * ci : cont * ci : cont --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-08 13:14:54 +02:00
Georgi Gerganov	8648c52101	make : deprecate (#10514 ) * make : deprecate ggml-ci * ci : disable Makefile builds ggml-ci * docs : remove make references [no ci] * ci : disable swift build ggml-ci * docs : remove obsolete make references, scripts, examples ggml-ci * basic fix for compare-commits.sh * update build.md * more build.md updates * more build.md updates * more build.md updates * Update Makefile Co-authored-by: Diego Devesa <slarengh@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-12-02 21:22:53 +02:00
Diego Devesa	7cc2d2c889	ggml : move AMX to the CPU backend (#10570) * ggml : move AMX to the CPU backend --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-11-29 21:54:58 +01:00
Diego Devesa	e90688edd0	ci : fix tag name in cuda and hip releases (#10566)	2024-11-28 15:58:54 +01:00
Diego Devesa	46c69e0e75	ci : faster CUDA toolkit installation method and use ccache (#10537) * ci : faster CUDA toolkit installation method and use ccache * remove fetch-depth * only pack CUDA runtime on master	2024-11-27 11:03:25 +01:00
Diego Devesa	c9b00a70b0	ci : fix cuda releases (#10532)	2024-11-26 22:12:10 +01:00
Neo Zhang Jianyu	0bbd2262a3	restore the condistion to build & update pacakge when merge (#10507 ) Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2024-11-26 21:43:47 +08:00
Diego Devesa	c6807b3f28	ci : add ubuntu cuda build, build with one arch on windows (#10456)	2024-11-26 13:05:07 +01:00
Neo Zhang Jianyu	5a8987793f	[SYCL] Fix building Win package for oneAPI 2025.0 update (#10483) * fix build package for 2025.0 * debug * debug * fix * rm debug --------- Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2024-11-25 17:31:10 +08:00
蕭澧邦	6dfcfef078	ci: Update oneAPI runtime dll packaging (#10428) This is the minimum runtime dll dependencies for oneAPI 2025.0	2024-11-22 10:44:08 +01:00
R0CKSTAR	f0204a0ec7	ci: build test musa with cmake (#10298) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2024-11-15 12:47:25 +01:00
Romain Biessy	5a54af4d4f	sycl: Use syclcompat::dp4a (#10267 ) * sycl: Use syclcompat::dp4a * Using the syclcompat version allow the compiler to optimize the operation with native function * Update news section * Update CI Windows oneAPI version to 2025.0 * Reword doc * Call syclcompat::dp4a inside dpct::dp4a This reverts commit `90cb61d692`.	2024-11-15 11:09:12 +08:00
Diego Devesa	ae8de6d50a	ggml : build backends as libraries (#10256 ) * ggml : build backends as libraries --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>	2024-11-14 18:04:35 +01:00
Georgi Gerganov	ec450d3bbf	metal : opt-in compile flag for BF16 (#10218 ) * metal : opt-in compile flag for BF16 ggml-ci * ci : use BF16 ggml-ci * swift : switch back to v12 * metal : has_float -> use_float ggml-ci * metal : fix BF16 check in MSL ggml-ci	2024-11-08 21:59:46 +02:00
Eve	3407364776	Q6_K AVX improvements (#10118 ) * q6_k instruction reordering attempt * better subtract method * should be theoretically faster small improvement with shuffle lut, likely because all loads are already done at that stage * optimize bit fiddling * handle -32 offset separately. bsums exists for a reason! * use shift * Update ggml-quants.c * have to update ci macos version to 13 as 12 doesnt work now. 13 is still x86	2024-11-04 23:06:31 +01:00
Xuan Son Nguyen	f3fdcfaa79	ci : fine-grant permission (#9710)	2024-10-04 11:47:19 +02:00
serhii-nakon	6f1d9d71f4	Fix Docker ROCM builds, use AMDGPU_TARGETS instead of GPU_TARGETS (#9641 ) * Fix Docker ROCM builds, use AMDGPU_TARGETS instead of GPU_TARGETS * Set ROCM_DOCKER_ARCH as string due it incorrectly build and cause OOM exit code	2024-09-30 20:57:12 +02:00
Neo Zhang Jianyu	95bc82fbc0	[SYCL] add missed dll file in package (#9577 ) * update oneapi to 2024.2 * use 2024.1 --------- Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2024-09-26 17:38:31 +08:00
Huang Qi	e948a7da7a	CI: Provide prebuilt windows binary for hip (#9467)	2024-09-21 02:39:41 +02:00
Georgi Gerganov	6262d13e0b	common : reimplement logging (#9418) https://github.com/ggerganov/llama.cpp/pull/9418	2024-09-15 20:46:12 +03:00
Huang Qi	4dc4f5f14a	ci : update HIP SDK to 24.Q3 (ROCm 6.1) (#9329)	2024-09-12 14:28:43 +03:00
Trivikram Kamat	3c26a1644d	ci : bump actions/checkout to v4 (#9377)	2024-09-12 14:27:45 +03:00
awatuna	32b2ec88bc	Update build.yml (#9184) build rpc-server for windows cuda	2024-09-06 00:34:36 +02:00
Radoslav Gerganov	1f67436c5e	ci : enable RPC in all of the released builds (#9006) ref: #8912	2024-08-12 19:17:03 +03:00
Johannes Gäßler	6eeaeba126	cmake: use 1 more thread for non-ggml in CI (#8740)	2024-07-28 22:32:44 +02:00
Johannes Gäßler	69c487f4ed	CUDA: MMQ code deduplication + iquant support (#8495) * CUDA: MMQ code deduplication + iquant support * 1 less parallel job for CI build	2024-07-20 22:25:26 +02:00
bandoti	17eb6aa8a9	vulkan : cmake integration (#8119 ) * Add Vulkan to CMake pkg * Add Sycl to CMake pkg * Add OpenMP to CMake pkg * Split generated shader file into separate translation unit * Add CMake target for Vulkan shaders * Update README.md * Add make target for Vulkan shaders * Use pkg-config to locate vulkan library * Add vulkan SDK dep to ubuntu-22-cmake-vulkan workflow * Clean up tabs * Move sudo to apt-key invocation * Forward GGML_EXTRA_LIBS to CMake config pkg * Update vulkan obj file paths * Add shaderc to nix pkg * Add python3 to Vulkan nix build * Link against ggml in cmake pkg * Remove Python dependency from Vulkan build * code review changes * Remove trailing newline * Add cflags from pkg-config to fix w64devkit build * Update README.md * Remove trailing whitespace * Update README.md * Remove trailing whitespace * Fix doc heading * Make glslc required Vulkan component * remove clblast from nix pkg	2024-07-13 18:12:39 +02:00
Olivier Chafik	8748d8ac6f	json: attempt to skip slow tests when running under emulator (#8189)	2024-06-28 18:02:05 +01:00
loonerin	558f44bf83	CI: fix release build (Ubuntu+Mac) (#8170) * CI: fix release build (Ubuntu) PR #8006 changes defaults to build shared libs. However, CI for releases expects static builds. * CI: fix release build (Mac) --------- Co-authored-by: loonerin <loonerin@users.noreply.github.com>	2024-06-27 21:01:23 +02:00
slaren	ae5d0f4b89	ci : publish new docker images only when the files change (#8142)	2024-06-26 21:59:28 +02:00
Georgi Gerganov	f3f65429c4	llama : reorganize source code + improve CMake (#8006 ) * scripts : update sync [no ci] * files : relocate [no ci] * ci : disable kompute build [no ci] * cmake : fixes [no ci] * server : fix mingw build ggml-ci * cmake : minor [no ci] * cmake : link math library [no ci] * cmake : build normal ggml library (not object library) [no ci] * cmake : fix kompute build ggml-ci * make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE ggml-ci * move public backend headers to the public include directory (#8122) * move public backend headers to the public include directory * nix test * spm : fix metal header --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * scripts : fix sync paths [no ci] * scripts : sync ggml-blas.h [no ci] --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-26 18:33:02 +03:00
olexiyb	f8ec8877b7	ci : fix macos x86 build (#7940 ) In order to use old `macos-latest` we should use `macos-12` Potentially will fix: https://github.com/ggerganov/llama.cpp/issues/6975	2024-06-14 20:28:34 +03:00
Olivier Chafik	1c641e6aac	`build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809 ) * `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew * server: update refs -> llama-server gitignore llama-server * server: simplify nix package * main: update refs -> llama fix examples/main ref * main/server: fix targets * update more names * Update build.yml * rm accidentally checked in bins * update straggling refs * Update .gitignore * Update server-llm.sh * main: target name -> llama-cli * Prefix all example bins w/ llama- * fix main refs * rename {main->llama}-cmake-pkg binary * prefix more cmake targets w/ llama- * add/fix gbnf-validator subfolder to cmake * sort cmake example subdirs * rm bin files * fix llama-lookup-* Makefile rules * gitignore /llama-* * rename Dockerfiles * rename llama\|main -> llama-cli; consistent RPM bin prefixes * fix some missing -cli suffixes * rename dockerfile w/ llama-cli * rename(make): llama-baby-llama * update dockerfile refs * more llama-cli(.exe) * fix test-eval-callback * rename: llama-cli-cmake-pkg(.exe) * address gbnf-validator unused fread warning (switched to C++ / ifstream) * add two missing llama- prefixes * Updating docs for eval-callback binary to use new `llama-` prefix. * Updating a few lingering doc references for rename of main to llama-cli * Updating `run-with-preset.py` to use new binary names. Updating docs around `perplexity` binary rename. * Updating documentation references for lookup-merge and export-lora * Updating two small `main` references missed earlier in the finetune docs. * Update apps.nix * update grammar/README.md w/ new llama-* names * update llama-rpc-server bin name + doc * Revert "update llama-rpc-server bin name + doc" This reverts commit `e474ef1df4`. 
* add hot topic notice to README.md * Update README.md * Update README.md * rename gguf-split & quantize bins refs in **/tests.sh --------- Co-authored-by: HanClinto <hanclinto@gmail.com>	2024-06-13 00:41:52 +01:00

148 Commits