llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-28 04:47:04 +01:00

Max Krasnyansky 0226613853 threadpool : skip polling for unused threads (#9461)	2024-09-17 11:19:46 +03:00

* threadpool: skip polling for unused threads

  Currently all threads do N polling rounds even if only 1 thread is active (n_threads_cur == 1). This commit adds a check to skip the polling for unused threads (ith >= n_threads_cur).

  n_threads_cur is now an atomic_int to explicitly tell thread sanitizer that it is written from one thread and read from other threads (not a race condition).

* threadpool: further simplify and improve ggml_barrier

  Avoid using strict memory order while polling, yet make sure that all threads go through a full memory barrier (memory fence) on ggml_barrier entrance and exit.

* threads: add simple barrier test

  This test does lots of small, parallel matmul ops where the barriers in between dominate the overhead.

* threadpool: improve thread sync for new-graphs

  Using the same tricks as ggml_barrier. All the polling is done with relaxed memory order to keep it efficient; once the new graph is detected we do a full fence using read-modify-write with strict memory order.

* threadpool: improve abort handling

  Do not use threadpool->ec (exit code) to decide whether to exit the compute loop. threadpool->ec is not atomic, which makes thread-sanitizer rightfully unhappy about it. Instead, introduce an atomic threadpool->abort flag used for this. This is consistent with how we handle threadpool->stop or pause.

  While at it, add an explicit atomic_load for n_threads_cur for consistency.

* test-barrier: release threadpool before releasing the context

  Fixes a use-after-free detected by gcc thread-sanitizer on x86-64. For some reason llvm sanitizer is not detecting this issue.
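The barrier idea described in the commit message (relaxed polling, with a full fence on entrance and exit) can be illustrated with a small C11 sketch. This is not the ggml implementation: the names spin_barrier, spin_barrier_wait, worker and run_barrier_demo are hypothetical, and the sched_yield() call in the polling loop is an assumption added so the demo stays usable on machines with few cores.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical barrier state; the field names are illustrative, not ggml's. */
struct spin_barrier {
    atomic_int n_arrived; /* threads that have entered the current round */
    atomic_int n_passed;  /* number of completed rounds */
    int        n_threads; /* participants, fixed for the barrier's lifetime */
};

static void spin_barrier_wait(struct spin_barrier * b) {
    /* Remember the round we are in before announcing our arrival. */
    int passed_old = atomic_load_explicit(&b->n_passed, memory_order_relaxed);

    /* Entrance: a seq_cst read-modify-write acts as a full memory barrier. */
    int arrived = atomic_fetch_add_explicit(&b->n_arrived, 1, memory_order_seq_cst);
    if (arrived == b->n_threads - 1) {
        /* Last thread in: reset the counter, then release the waiters. */
        atomic_store_explicit(&b->n_arrived, 0, memory_order_relaxed);
        atomic_fetch_add_explicit(&b->n_passed, 1, memory_order_seq_cst);
        return;
    }

    /* Poll with relaxed loads only; no ordering is needed while waiting. */
    while (atomic_load_explicit(&b->n_passed, memory_order_relaxed) == passed_old) {
        sched_yield(); /* assumption: yield so the demo also works on few cores */
    }

    /* Exit: full fence so later reads see what other threads wrote earlier. */
    atomic_thread_fence(memory_order_seq_cst);
}

/* Hypothetical per-thread state for the demo below. */
struct worker {
    struct spin_barrier * barrier;
    int * slots;      /* one plain (non-atomic) int per thread */
    int   ith;        /* index of this thread */
    int   n_threads;
    int   n_rounds;
    int   stale;      /* stale neighbour reads observed by this thread */
};

static void * worker_main(void * arg) {
    struct worker * w = arg;
    for (int r = 1; r <= w->n_rounds; r++) {
        w->slots[w->ith] = r;           /* plain write, published by the barrier */
        spin_barrier_wait(w->barrier);
        if (w->slots[(w->ith + 1) % w->n_threads] != r) {
            w->stale++;                 /* neighbour's write was not visible */
        }
        spin_barrier_wait(w->barrier);  /* nobody overwrites before all have read */
    }
    return NULL;
}

/* Runs the demo and returns the total number of stale reads (0 is expected). */
static int run_barrier_demo(int n_threads, int n_rounds) {
    struct spin_barrier b;
    atomic_init(&b.n_arrived, 0);
    atomic_init(&b.n_passed, 0);
    b.n_threads = n_threads;

    pthread_t     * tids  = calloc((size_t) n_threads, sizeof(*tids));
    struct worker * ws    = calloc((size_t) n_threads, sizeof(*ws));
    int           * slots = calloc((size_t) n_threads, sizeof(*slots));
    assert(tids && ws && slots);

    for (int i = 0; i < n_threads; i++) {
        ws[i] = (struct worker) { &b, slots, i, n_threads, n_rounds, 0 };
        int rc = pthread_create(&tids[i], NULL, worker_main, &ws[i]);
        assert(rc == 0);
        (void) rc;
    }

    int stale = 0;
    for (int i = 0; i < n_threads; i++) {
        pthread_join(tids[i], NULL);
        stale += ws[i].stale;
    }

    free(tids);
    free(ws);
    free(slots);
    return stale;
}
```

To try it, call run_barrier_demo(4, 1000) from a main function and build with something like cc -std=c11 -pthread. The design point is that only the entrance read-modify-write and the exit fence carry ordering, so the polling loop itself stays cheap.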
ggml-cann	cann : fix doxy (ggml/0)	2024-09-08 11:05:55 +03:00
ggml-cuda	CUDA: fix --split-mode row race condition (#9413)	2024-09-11 10:22:40 +02:00
ggml-sycl	[SYCL] Fix DMMV dequantization (#9279)	2024-09-04 16:26:33 +01:00
kompute@4565194ed7	llama : reorganize source code + improve CMake (#8006)	2024-06-26 18:33:02 +03:00
kompute-shaders	ggml : move rope type enum to ggml.h (#8949)	2024-08-13 21:13:15 +02:00
llamafile	ggml : move common CPU backend impl to new header (#9509)	2024-09-16 16:22:07 +02:00
vulkan-shaders	Improve Vulkan shader build system (#9239)	2024-09-06 08:56:17 +02:00
CMakeLists.txt	ggml : link MATH_LIBRARY not by its full path (#9339)	2024-09-16 14:06:50 +03:00
ggml-aarch64.c	ggml : move common CPU backend impl to new header (#9509)	2024-09-16 16:22:07 +02:00
ggml-aarch64.h	ggml : minor naming changes (#8433)	2024-07-12 10:46:02 +03:00
ggml-alloc.c	ggml : reduce hash table reset cost (#8698)	2024-07-27 04:41:55 +02:00
ggml-backend-impl.h	llama : reorganize source code + improve CMake (#8006)	2024-06-26 18:33:02 +03:00
ggml-backend.c	tests: add gradient tests for all backends (ggml/932)	2024-09-08 11:05:55 +03:00
ggml-blas.cpp	ggml : hide ggml_object, ggml_cgraph, ggml_hash_set (#9408)	2024-09-12 14:23:49 +03:00
ggml-cann.cpp	cann: Add host buffer type for Ascend NPU (#9406)	2024-09-12 19:46:43 +08:00
ggml-common.h	ggml-quants : ternary packing for TriLMs and BitNet b1.58 (#8151)	2024-09-05 21:48:47 -04:00
ggml-cpu-impl.h	ggml : move common CPU backend impl to new header (#9509)	2024-09-16 16:22:07 +02:00
ggml-cuda.cu	ggml : hide ggml_object, ggml_cgraph, ggml_hash_set (#9408)	2024-09-12 14:23:49 +03:00
ggml-impl.h	ggml : move common CPU backend impl to new header (#9509)	2024-09-16 16:22:07 +02:00
ggml-kompute.cpp	ggml : hide ggml_object, ggml_cgraph, ggml_hash_set (#9408)	2024-09-12 14:23:49 +03:00
ggml-metal.m	metal : handle zero-sized allocs (#9466)	2024-09-16 09:05:56 +03:00
ggml-metal.metal	sync : ggml	2024-08-27 22:41:27 +03:00
ggml-quants.c	ggml : move common CPU backend impl to new header (#9509)	2024-09-16 16:22:07 +02:00
ggml-quants.h	ggml-quants : ternary packing for TriLMs and BitNet b1.58 (#8151)	2024-09-05 21:48:47 -04:00
ggml-rpc.cpp	ggml : hide ggml_object, ggml_cgraph, ggml_hash_set (#9408)	2024-09-12 14:23:49 +03:00
ggml-sycl.cpp	ggml : hide ggml_object, ggml_cgraph, ggml_hash_set (#9408)	2024-09-12 14:23:49 +03:00
ggml-vulkan.cpp	ggml : hide ggml_object, ggml_cgraph, ggml_hash_set (#9408)	2024-09-12 14:23:49 +03:00
ggml.c	threadpool : skip polling for unused threads (#9461)	2024-09-17 11:19:46 +03:00