llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-02-04 23:52:32 +01:00

Author	SHA1	Message	Date
Srihari-mcw	1e7b9299c6	ggml : AVX512 gemm for Q4_0_8_8 (#9532) * AVX512 version of ggml_gemm_q4_0_8x8_q8_0 * Remove zero vector parameter passing * Rename functions and rearrange order of macros * Edit comments * style : minor adjustments * Update x to start from 0 --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-09-23 17:06:38 +03:00
Georgi Gerganov	bf9c1013ac	metal : use F32 prec for K*Q in vec FA (#9595 ) ggml-ci	2024-09-23 11:27:47 +03:00
Akarshan Biswas	e62e9789cd	Revert "[SYCL] fallback mmvq (#9088 )" (#9579 ) This reverts commit `50addec9a5`.	2024-09-23 11:28:06 +08:00
R0CKSTAR	c35e586ea5	musa: enable building fat binaries, enable unified memory, and disable Flash Attention on QY1 (MTT S80) (#9526 ) * mtgpu: add mp_21 support Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * mtgpu: disable flash attention on qy1 (MTT S80); disable q3_k and mul_mat_batched_cublas Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * mtgpu: enable unified memory Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * mtgpu: map cublasOperation_t to mublasOperation_t (sync code to latest) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2024-09-22 16:55:49 +02:00
Molly Sophia	912c331d3d	Fix merge error in #9454 (#9589 ) Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2024-09-22 15:26:50 +02:00
Johannes Gäßler	a5b57b08ce	CUDA: enable Gemma FA for HIP/Pascal (#9581 )	2024-09-22 09:34:52 +02:00
Molly Sophia	2a63caaa69	RWKV v6: RWKV_WKV op CUDA implementation (#9454 ) * ggml: CUDA unary op EXP Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * ggml: rwkv_wkv op CUDA impl Signed-off-by: Molly Sophia <mollysophia379@gmail.com> --------- Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2024-09-22 04:29:12 +02:00
slaren	d09770cae7	ggml-alloc : fix list of allocated tensors with GGML_ALLOCATOR_DEBUG (#9573 )	2024-09-21 14:24:23 +02:00
agray3	41f477879f	Update CUDA graph on scale change plus clear nodes/params (#9550 ) * Avoid using saved CUDA graph if scale changes and reset nodes/params on update Fixes https://github.com/ggerganov/llama.cpp/issues/9451 * clear before resize	2024-09-21 02:41:07 +02:00
Georgi Gerganov	d13edb17ed	ggml : fix builds (#0 ) ggml-ci	2024-09-20 21:15:05 +03:00
Georgi Gerganov	27609c49b9	ggml : fix trailing whitespace (#0 ) ggml-ci	2024-09-20 21:15:05 +03:00
Johannes Gäßler	424c5d00a9	ggml/examples: add backend support for numerical optimization (ggml/949) * CUDA eval works * stochastic gradient descent op * Adam except decay * CUDA CROSS_ENTROPY_LOSS_BACK * CUDA mnist-fc training works * backend CLI arg * refactor gguf load * remove sched from opt_step_adam * implement l1 regularization (weight decay) * extra call to add optimizer * initialize gradients with ggml_graph_reset * gradient accumulation * increment iter per eval instead of epoch * adjust backend interfaces * fix ggml_graph_reset without backend * fix ggml graph export/import * fixup * rename * revert ggml_opt changes * more general CUDA repeat_back * update documentation, fix CNN * validation split * add clarifying comment * optimize PyTorch training * adjust buffer size, thread count * fix 0.0f validation split * Update examples/mnist/mnist-common.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * fix gradient accumulation * tensor flag for accumulators -> tensor hash set * Update include/ggml.h Co-authored-by: slaren <slarengh@gmail.com> * Update tests/test-backend-ops.cpp Co-authored-by: slaren <slarengh@gmail.com> * Update tests/test-backend-ops.cpp Co-authored-by: slaren <slarengh@gmail.com> * fix test prints * Update src/ggml-backend.c Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * better CUDA support for noncontiguous out_prod * add comment --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: slaren <slarengh@gmail.com>	2024-09-20 21:15:05 +03:00
Georgi Gerganov	a6809c6a2e	examples : add null threadpool args where needed (ggml/0) ggml-ci	2024-09-20 21:15:05 +03:00
Johannes Gäßler	5cb12f6839	CUDA: fix sum.cu compilation for CUDA < 11.7 (#9562 )	2024-09-20 18:35:35 +02:00
slaren	64c6af3195	ggml : fix n_threads_cur initialization with one thread (#9538 ) * ggml : fix n_threads_cur initialization with one thread * Update ggml/src/ggml.c --------- Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com>	2024-09-18 10:13:08 -07:00
Max Krasnyansky	0226613853	threadpool : skip polling for unused threads (#9461) * threadpool: skip polling for unused threads Currently all threads do N polling rounds even if only 1 thread is active (n_threads_cur == 1). This commit adds a check to skip the polling for unused threads (ith >= n_threads_cur). n_threads_cur is now an atomic_int to explicitly tell thread sanitizer that it is written from one thread and read from other threads (not a race condition). * threadpool: further simplify and improve ggml_barrier Avoid using strict memory order while polling, yet make sure that all threads go through full memory barrier (memory fence) on ggml_barrier entrance and exit. * threads: add simple barrier test This test does lots of small, parallel matmul ops where the barriers in between dominate the overhead. * threadpool: improve thread sync for new-graphs Using the same tricks as ggml_barrier. All the polling is done with relaxed memory order to keep it efficient, once the new graph is detected we do full fence using read-modify-write with strict memory order. * threadpool: improve abort handling Do not use threadpool->ec (exit code) to decide whether to exit the compute loop. threadpool->ec is not atomic which makes thread-sanitizer rightfully unhappy about it. Instead introduce atomic threadpool->abort flag used for this. This is consistent with how we handle threadpool->stop or pause. While at it add an explicit atomic_load for n_threads_cur for consistency. * test-barrier: release threadpool before releasing the context fixes use-after-free detected by gcc thread-sanitizer on x86-64; for some reason llvm sanitizer is not detecting this issue.	2024-09-17 11:19:46 +03:00
slaren	23e0d70bac	ggml : move common CPU backend impl to new header (#9509 )	2024-09-16 16:22:07 +02:00
Michael Podvitskiy	a6a3a5c531	ggml : link MATH_LIBRARY not by its full path (#9339 )	2024-09-16 14:06:50 +03:00
Georgi Gerganov	19514d632e	cmake : do not hide GGML options + rename option (#9465 ) * cmake : do not hide GGML options ggml-ci * build : rename flag GGML_CUDA_USE_GRAPHS -> GGML_CUDA_GRAPHS for consistency ggml-ci	2024-09-16 10:27:50 +03:00
Eve	5c3d0f1824	ggml : IQ4_NL sgemm + Q4_0 AVX optimization (#9422 ) * squashed readd my iq4_nl sgemm PR https://github.com/ggerganov/llama.cpp/pull/8049 have ggml_vec_dot_q4_0 do two blocks per loop for avx try out f16c ggml_vec_dot_iq4_nl, but it's not really faster. as per https://github.com/ggerganov/llama.cpp/pull/8549 we can calculate several blocks at a time with no issue * shuffle * remove f16c iq4_nl as i cant make it faster than before	2024-09-16 09:48:24 +03:00
Georgi Gerganov	c4965a64f7	metal : handle zero-sized allocs (#9466 )	2024-09-16 09:05:56 +03:00
Georgi Gerganov	6262d13e0b	common : reimplement logging (#9418 ) https://github.com/ggerganov/llama.cpp/pull/9418	2024-09-15 20:46:12 +03:00
Michael Podvitskiy	6988da94a2	cmake : correct order of sycl flags (#9497 )	2024-09-15 19:55:52 +03:00
Michael Podvitskiy	7596487beb	cmake : try to fix sycl+intel build (#9487 )	2024-09-15 10:06:38 +03:00
Yuri Khrustalev	822b6322de	ggml : ggml_type_name return "NONE" for invalid values (#9458 ) When running on Windows, the quantization utility attempts to print the types that are not set which leads to a crash.	2024-09-14 12:54:37 +03:00
Georgi Gerganov	1f4111e540	cmake : use list(APPEND ...) instead of set() + dedup linker (#9463 ) * cmake : use list(APPEND ...) instead of set() + dedup linker ggml-ci * cmake : try fix sycl * cmake : try to fix sycl 2 * cmake : fix sycl build (#9469) * try fix sycl build * use CMAKE_CXX_FLAGS as a string variable --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * one more CMAKE_CXX_FLAGS fix (#9471) --------- Co-authored-by: Michael Podvitskiy <podvitskiymichael@gmail.com>	2024-09-14 10:55:05 +03:00
Dou Xinpeng	e6b7801bd1	cann: Add host buffer type for Ascend NPU (#9406 ) * feat: Add host buffer type for Ascend NPU(CANN backend) * fix some checking errors * Add a few comments	2024-09-12 19:46:43 +08:00
Ahmad Tameem	2b00fa7997	riscv : modify Makefile and add a RISCV_VECT to print log info (#9442 ) - Added ggml_cpu_has_riscv_v() in GGML to print system info in log - Modified Makefile to only use flag when cross compiling for RISC-V	2024-09-12 14:24:31 +03:00
Georgi Gerganov	d6a04f872d	ggml : hide ggml_object, ggml_cgraph, ggml_hash_set (#9408 ) * ggml : hide ggml_object, ggml_cgraph, ggml_hash_set ggml-ci * ggml : add ggml-impl.h to backends * ggml : fix compiler warnings ggml-ci * ggml : add assert upon adding nodes	2024-09-12 14:23:49 +03:00
Xinpeng Dou	df4b7945ae	cann: Fix error when running a non-exist op (#9424 )	2024-09-12 09:02:35 +08:00
Johannes Gäßler	5af118efda	CUDA: fix --split-mode row race condition (#9413 )	2024-09-11 10:22:40 +02:00
R0CKSTAR	b34e023480	musa: remove Clang builtins mapping (#9421 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2024-09-11 03:46:55 +02:00
Alberto Cabrera Pérez	51b6038636	sycl : update support conditions (#9394 ) * sycl : update support condition to im2col Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com> * Added TODO to remind supporting FP32 im2col --------- Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>	2024-09-11 08:53:42 +08:00
Georgi Gerganov	00ba2ff781	metal : fix compile warning with GGML_METAL_NDEBUG (#0 )	2024-09-10 10:17:43 +03:00
Radoslav Gerganov	293bebe077	rpc : fix segfault with nkvo (#9389 ) * rpc : fix nkvo * rpc : buf_size must not be static ref: #9337 --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-09-09 18:40:10 +03:00
Prashant Vithule	5fac4d5764	ggml : vector length agnostic SVE support (#9290 ) * Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths * Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths * Removed WhiteSpaces * ggml : style changes + fix 512-bit nb loop check - fix local scope in switch cases - consistent predicate names - empty lines when necessary - opening braces, spaces - const-correctness - add asserts * Update ggml/src/ggml-quants.c Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-09-09 18:37:18 +03:00
Johannes Gäßler	8e6e2fbe14	CUDA: fix variable name conflict for Windows build (#9382 )	2024-09-09 14:22:53 +02:00
Markus Tavenrath	daa9623ab0	Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early. (#9118 ) * Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early. * fix compile issues * Fix issues where the last submit wasn't executed or handled properly. * remove trailing whitespace * Repair GGML_VULKAN_CHECK_RESULTS * Increase submit counter only if actual work has been submitted and increase submit count to 100. * Fix some nodes are not checked with GGML_VULKAN_CHECK_RESULTS enabled.	2024-09-08 21:43:48 +02:00
Georgi Gerganov	e079bffb66	cuda : fix FA Q src index (1 -> 0) (#9374 )	2024-09-08 22:01:02 +03:00
Neo Zhang Jianyu	2a358fb0c4	[SYCL] add check malloc result on device (#9346 ) * add check malloc result on device * update for review comments, check all malloc_device() result --------- Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2024-09-08 19:05:29 +08:00
Georgi Gerganov	a876861455	metal : update support condition for im2col + fix warning (#0 )	2024-09-08 11:05:55 +03:00
Salvatore Mesoraca	406c1a32a1	vulkan: add dryrun support to sin and cos ops (ggml/947) sin and cos failed test-backend-ops because they tried to dereference a context pointer that is null on dry runs. This commit prevents that segfault. Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>	2024-09-08 11:05:55 +03:00
Salvatore Mesoraca	9cb9260861	vulkan: correctly report support for OP_CONT (ggml/946) test-backend-ops fails because ggml_cont aborts when invoked passing an unsupported type. This commit makes ggml_cont tests pass Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>	2024-09-08 11:05:55 +03:00
Johannes Gäßler	202084d31d	tests: add gradient tests for all backends (ggml/932) * tests: add gradient checking to test-backend-ops * remove old comment * reorder includes * adjust SIN/COS parameters * add documentation, use supports_op if possible	2024-09-08 11:05:55 +03:00
Johannes Gäßler	dbbebcab33	ggml: fix ggml_graph_cpy undefined behavior (ggml/943)	2024-09-08 11:05:55 +03:00
Georgi Gerganov	ba1cf846ed	cann : fix doxy (ggml/0)	2024-09-08 11:05:55 +03:00
Mengqing Cao	d2d3200b38	cann : add Ascend NPU support (whisper/2336) * enable Ascend NPU in src/whisper.cpp * sync test-backend-ops with llama.cpp	2024-09-08 11:05:55 +03:00
Georgi Gerganov	51d964a4ef	cuda : mark BF16 CONT as unsupported	2024-09-08 11:05:55 +03:00
Salvatore Mesoraca	efe6a83e30	ggml : fix cont with transposed tensors when one dimension is 1 (ggml/934) * ggml_cont: fix issue with transposed tensors when one dimension is 1 when using multiple threads, it is not enough to check for the tensors to be contiguous for ggml_compute_forward_dup_same_cont to work correctly. The tensors strides also need to match. Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> * Add ggml_cont tests Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> * Remove dead code it isn't possible to reach this code because all these functions are invoked by ggml_compute_forward_dup if and only if src0->type != dst->type Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> * Make ggml_compute_forward_dup_same_cont work with contiguous tensors Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> --------- Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-09-08 11:05:55 +03:00
Eve	e536426ded	llamafile : disable sgemm for batch-size 1 (#9330 )	2024-09-07 22:02:26 +03:00
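
Note on commit 0226613853 (threadpool : skip polling for unused threads): the message describes three ideas, namely that unused threads (ith >= n_threads_cur) skip polling, that polling uses relaxed memory order, and that a read-modify-write with strict memory order acts as a full fence once a new graph is detected. The C11 sketch below illustrates that pattern only. It is not the ggml implementation: the names sketch_pool, sketch_pool_init, sketch_publish and sketch_poll are hypothetical, as is the n_graph counter used here to signal a new graph.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for shared threadpool state; not ggml's struct. */
typedef struct {
    atomic_int n_graph;        /* assumed counter, bumped when a new graph is published */
    atomic_int n_threads_cur;  /* number of threads the current graph uses */
} sketch_pool;

static void sketch_pool_init(sketch_pool *p, int n_threads) {
    atomic_init(&p->n_graph, 0);
    atomic_init(&p->n_threads_cur, n_threads);
}

/* Producer side: record the thread count, then publish with a strict RMW. */
static void sketch_publish(sketch_pool *p, int n_threads) {
    atomic_store_explicit(&p->n_threads_cur, n_threads, memory_order_relaxed);
    atomic_fetch_add_explicit(&p->n_graph, 1, memory_order_seq_cst);
}

/* Worker side: returns true if thread `ith` observed a graph newer than
 * `last_graph` within `n_rounds` polling rounds. */
static bool sketch_poll(sketch_pool *p, int ith, int last_graph, int n_rounds) {
    /* Unused threads skip the polling rounds entirely. */
    if (ith >= atomic_load_explicit(&p->n_threads_cur, memory_order_relaxed)) {
        return false;
    }
    for (int i = 0; i < n_rounds; i++) {
        /* Cheap relaxed load while spinning. */
        if (atomic_load_explicit(&p->n_graph, memory_order_relaxed) != last_graph) {
            /* Full fence via a read-modify-write that leaves the value unchanged. */
            atomic_fetch_add_explicit(&p->n_graph, 0, memory_order_seq_cst);
            return true;
        }
    }
    return false;
}
```

In this sketch a worker would call sketch_poll with the graph counter it last saw; a false result means either that the thread is unused for the current graph or that nothing new arrived within the polling budget, at which point a real threadpool would typically fall back to a blocking wait.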


184 Commits