llama.cpp/ggml
Jeff Bolz f095a649ec
vulkan: get the first command buffer submitted sooner (#10499)
This is an incremental improvement over #9118 to get work to the GPU a bit
sooner. The first part is to start with a smaller number of nodes before
the first submit, and ramp it up to the current 100 nodes/submit. The
second part is to reduce the dryrun overhead for all the nodes that just
need to request descriptor space.

With these changes I get a speedup of around 1-2% on an RTX 4070 paired with
my old Haswell-era CPU.
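
The ramp-up described in the first part of the commit message can be sketched roughly as follows. This is an illustration only, not the code in ggml-vulkan: the function name `submit_points`, the starting batch size of 20, and the doubling step are all assumptions made for the example. The commit message states only that the first submit covers fewer nodes and that the count ramps up to 100 nodes/submit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ggml): returns the indices of the graph
// nodes after which a command buffer would be submitted.
// first_batch and the doubling growth are illustrative assumptions;
// max_batch = 100 matches the nodes/submit figure in the commit message.
std::vector<std::size_t> submit_points(std::size_t n_nodes,
                                       std::size_t first_batch = 20,
                                       std::size_t max_batch = 100) {
    assert(first_batch > 0 && max_batch > 0);

    std::vector<std::size_t> points;
    std::size_t batch = std::min(first_batch, max_batch); // current threshold
    std::size_t in_batch = 0;                             // nodes recorded since last submit

    for (std::size_t i = 0; i < n_nodes; ++i) {
        ++in_batch;
        const bool last_node = (i + 1 == n_nodes);
        if (in_batch >= batch || last_node) {
            points.push_back(i);                    // submit after node i
            in_batch = 0;
            batch = std::min(batch * 2, max_batch); // ramp up, capped at max_batch
        }
    }
    return points;
}
```

Under these assumed parameters, a 250-node graph would be submitted after nodes 19, 59, 139, 239 and 249, so the GPU receives its first work after 20 nodes instead of 100. The later, larger batches keep the per-submit overhead low.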
2024-11-29 07:18:02 +01:00
include         ggml-cpu: support IQ4_NL_4_4 by runtime repack (#10541)          2024-11-28 13:52:03 +01:00
src             vulkan: get the first command buffer submitted sooner (#10499)   2024-11-29 07:18:02 +01:00
.gitignore      vulkan : cmake integration (#8119)                               2024-07-13 18:12:39 +02:00
CMakeLists.txt  ggml : add support for dynamic loading of backends (#10469)      2024-11-25 15:13:39 +01:00