llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-31 14:13:09 +01:00

Author	SHA1	Message	Date
Andreas Kieslinger	750cb3e246	CUDA: rename macros to avoid conflicts with WinAPI (#10736) * Renames NVIDIA GPU-architecture flags to avoid name clashes with WinAPI. (e.g. CC_PASCAL, GPU architecture or WinAPI pascal compiler flag?) * Reverts erroneous rename in SYCL-code. * Renames GGML_CUDA_MIN_CC_DP4A to GGML_CUDA_CC_DP4A. * Renames the rest of the compute capability macros for consistency.	2024-12-10 18:23:24 +01:00
uvos	3ad5451f3b	Add some minimal optimizations for CDNA (#10498) * Add some minimal optimizations for CDNA * ggml_cuda: set launch bounds also for GCN as it helps there too	2024-11-27 17:10:08 +01:00
Diego Devesa	ae8de6d50a	ggml : build backends as libraries (#10256) * ggml : build backends as libraries --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>	2024-11-14 18:04:35 +01:00
Johannes Gäßler	5af118efda	CUDA: fix --split-mode row race condition (#9413)	2024-09-11 10:22:40 +02:00
slaren	2b1f616b20	ggml : reduce hash table reset cost (#8698) * ggml : reduce hash table reset cost * fix unreachable code warnings after GGML_ASSERT(false) * GGML_ASSERT(false) -> GGML_ABORT("fatal error") * GGML_ABORT use format string	2024-07-27 04:41:55 +02:00
Johannes Gäßler	69c487f4ed	CUDA: MMQ code deduplication + iquant support (#8495) * CUDA: MMQ code deduplication + iquant support * 1 less parallel job for CI build	2024-07-20 22:25:26 +02:00
Johannes Gäßler	808aba3916	CUDA: optimize and refactor MMQ (#8416) * CUDA: optimize and refactor MMQ * explicit q8_1 memory layouts, add documentation	2024-07-11 16:47:47 +02:00
Johannes Gäßler	8e558309dc	CUDA: MMQ support for iq4_nl, iq4_xs (#8278)	2024-07-05 09:06:31 +02:00
Daniele	0a423800ff	CUDA: revert part of the RDNA1 optimizations (#8309) The change on the launch_bounds was causing a small performance drop in perplexity of 25 t/s	2024-07-05 09:06:09 +02:00
Johannes Gäßler	bcefa03bc0	CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (#8311)	2024-07-05 09:05:34 +02:00
Daniele	d23287f122	Define and optimize RDNA1 (#8085)	2024-07-04 01:02:58 +02:00
Johannes Gäßler	85a267daaa	CUDA: fix MMQ stream-k for --split-mode row (#8167)	2024-06-27 16:26:05 +02:00
Georgi Gerganov	f3f65429c4	llama : reorganize source code + improve CMake (#8006) * scripts : update sync [no ci] * files : relocate [no ci] * ci : disable kompute build [no ci] * cmake : fixes [no ci] * server : fix mingw build ggml-ci * cmake : minor [no ci] * cmake : link math library [no ci] * cmake : build normal ggml library (not object library) [no ci] * cmake : fix kompute build ggml-ci * make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE ggml-ci * move public backend headers to the public include directory (#8122) * move public backend headers to the public include directory * nix test * spm : fix metal header --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * scripts : fix sync paths [no ci] * scripts : sync ggml-blas.h [no ci] --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-26 18:33:02 +03:00

13 Commits