llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-02-04 15:43:53 +01:00

Jeff Bolz f095a649ec vulkan: get the first command buffer submitted sooner (#10499). This is an incremental improvement over #9118 to get work to the GPU a bit sooner. The first part is to start with a smaller number of nodes before the first submit and ramp it up to the current 100 nodes/submit. The second part is to reduce the dryrun overhead for all the nodes that just need to request descriptor space. With these changes I get around a 1-2% speedup on an RTX 4070 combined with my old Haswell-era CPU.	2024-11-29 07:18:02 +01:00
include	ggml-cpu: support IQ4_NL_4_4 by runtime repack (#10541)	2024-11-28 13:52:03 +01:00
src	vulkan: get the first command buffer submitted sooner (#10499)	2024-11-29 07:18:02 +01:00
.gitignore	vulkan : cmake integration (#8119)	2024-07-13 18:12:39 +02:00
CMakeLists.txt	ggml : add support for dynamic loading of backends (#10469)	2024-11-25 15:13:39 +01:00