llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-28 15:18:26 +01:00


Jeff Bolz	f095a649ec	2024-11-29 07:18:02 +01:00
vulkan: get the first command buffer submitted sooner (#10499)

This is an incremental improvement over #9118 to get work to the GPU a bit sooner. The first part is to start with a smaller number of nodes before the first submit, and ramp it up to the current 100 nodes/submit. The second part is to reduce the dryrun overhead for all the nodes that just need to request descriptor space.

With these changes I get around 1-2% speedup on RTX 4070 combined with my old Haswell-era CPU.
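The ramp-up described in the commit message can be illustrated with a small standalone sketch. This is not the code from ggml-vulkan.cpp: the helper name `plan_submits`, the starting batch of 20 nodes, and the doubling schedule are all assumptions chosen for illustration. Only the 100 nodes/submit ceiling comes from the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not part of ggml): splits `total_nodes` graph nodes
// into per-submit batches. The first batch is small so the GPU gets work
// early, and later batches grow until they reach `max_batch`.
// ASSUMPTIONS: first_batch = 20 and the doubling rule are illustrative only.
std::vector<int> plan_submits(int total_nodes, int first_batch = 20, int max_batch = 100) {
    assert(first_batch > 0 && max_batch > 0);

    std::vector<int> batches;
    int nodes_per_submit = std::min(first_batch, max_batch);
    int remaining = total_nodes;

    while (remaining > 0) {
        // Submit either a full batch or whatever is left in the graph.
        const int n = std::min(nodes_per_submit, remaining);
        batches.push_back(n);
        remaining -= n;

        // Ramp up toward the steady-state batch size.
        nodes_per_submit = std::min(nodes_per_submit * 2, max_batch);
    }
    return batches;
}
```

Under these assumptions, `plan_submits(300)` yields batches of 20, 40, 80, 100 and 60 nodes, so the first submit happens after 20 nodes instead of 100.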
vulkan-shaders	vulkan: define all quant data structures in types.comp (#10440)	2024-11-27 08:32:54 +01:00
CMakeLists.txt	ggml : add support for dynamic loading of backends (#10469)	2024-11-25 15:13:39 +01:00
ggml-vulkan.cpp	vulkan: get the first command buffer submitted sooner (#10499)	2024-11-29 07:18:02 +01:00