mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-25 02:48:44 +01:00
Commit Graph

1421 Commits

Author SHA1 Message Date
Jared Van Bortel
a934b2cb8a vulkan : assert various kernel requirements 2023-11-23 17:22:00 -05:00
Jared Van Bortel
f194e1b6a6 Merge commit 'fcca0a700487999d52a525c96d6661e9f6a8703a' into nomic-vulkan 2023-11-23 17:21:59 -05:00
Jared Van Bortel
39abedd1d7 vulkan : optimize workgroup sizes 2023-11-23 17:18:48 -05:00
Jared Van Bortel
84f7fc4553 vulkan : rope n_past is now KQ_pos, f16 rope kernel 2023-11-23 17:18:42 -05:00
Jared Van Bortel
71565eb0c3 vulkan : replace ggml_diag_mask_inf with ggml_add (custom -inf mask) 2023-11-23 17:18:27 -05:00
Jared Van Bortel
af00cca08e Merge commit 'ec893798b7a2a803466cc8f063051499ec3d96f7' into HEAD 2023-11-08 16:36:00 -05:00
Jared Van Bortel
c438c16896 fix build with external fmtlib (v10)
Co-authored-by: ToKiNoBug <tokinobug@163.com>
2023-11-08 16:31:29 -05:00
Jared Van Bortel
a8cac53207 kompute : fix issues with debug layers 2023-11-08 16:31:29 -05:00
cebtenzzre
f88b198885 llama : fix Vulkan whitelist () 2023-11-03 17:22:22 -04:00
Adam Treat
ffd0624be2 Remove this debug code. 2023-11-03 17:22:22 -04:00
Adam Treat
a5eb001eab Revert the prompt processing on gpu for now.
Fixes issues  and 
2023-11-03 17:22:22 -04:00
Adam Treat
e006d377dd Scale the workgroup count down to allow correct generation for falcon with
AMD radeon cards with lower workgroup count limit

Partially fixes 
2023-11-03 17:22:22 -04:00
cebtenzzre
89b71278ff llama : decide to disable Vulkan before loading tensors () 2023-11-03 17:22:22 -04:00
cebtenzzre
1c17010188 vulkan : fix missing break in matmul selection () 2023-11-03 17:22:22 -04:00
Adam Treat
74ddf0f17d Fix synchronization problem for AMD Radeon with amdvlk driver or windows
drivers. Does not have any performance or fidelity effect on other gpu/driver
combos I've tested.

FIXES: https://github.com/nomic-ai/gpt4all/issues/1507
2023-11-03 17:22:22 -04:00
Adam Treat
8d9efbf97a Lower the workgroup count for some shaders by providing a loop that processes
four floats at a time.
2023-11-03 17:22:22 -04:00
Adam Treat
752f7ebd61 Remove unused push constant that was giving validation errors. 2023-11-03 17:22:22 -04:00
Adam Treat
8400015337 Don't try an allocation on a heap that is smaller than the size we require. 2023-11-03 17:22:22 -04:00
cebtenzzre
cbc0d1af79 kompute : make scripts executable 2023-11-03 17:22:22 -04:00
cebtenzzre
21841d3163 kompute : enable kp_logger and make it static () 2023-11-03 17:22:22 -04:00
Aaron Miller
cc05a602d6 use mat*vec shaders for mat*mat
I wrote the mat*mat shaders from scratch so I understand them better but
they are currently not faster than just multiply-invoking the mat*vec
shaders, by a significant degree - so, except for f32 which needed a new
shader, revert to the m*v ones here.
2023-11-03 17:22:22 -04:00
Aaron Miller
c1fd64548d attempted speedups 2 2023-11-03 17:22:22 -04:00
Aaron Miller
9bc52ebae3 attempted speedups 2023-11-03 17:22:22 -04:00
Aaron Miller
8dc79ac380 clean up vulkan/cpu switch 2023-11-03 17:22:22 -04:00
Aaron Miller
cd0257ed0d q4_1 mat*mat 2023-11-03 17:22:22 -04:00
Aaron Miller
4809890d80 rm commented dbg print 2023-11-03 17:22:22 -04:00
Aaron Miller
b78a94bc6d q6k mm works 2023-11-03 17:22:22 -04:00
Aaron Miller
d5741c07a5 use op param epsilon for norms 2023-11-03 17:22:22 -04:00
Aaron Miller
3327d84a7f perf: use bigger threadgroups in mm 2023-11-03 17:22:22 -04:00
Aaron Miller
46385ee0d5 misc vulkan cleanup
make pushconts consistent w/ dispatch, avoid a double free
2023-11-03 17:22:22 -04:00
Aaron Miller
f0cd38b9ad add mat*mat ops 2023-11-03 17:22:22 -04:00
Adam Treat
09d83f0401 Delete TODO now that we have q8_0. 2023-11-03 17:22:22 -04:00
Aaron Miller
8564f79036 falcon h2d + reenable vulkan 2023-11-03 17:22:22 -04:00
Aaron Miller
020b1745a0 vulkan: implement neox mode for rope 2023-11-03 17:22:21 -04:00
Aaron Miller
ff4212d20f q8 mat*vec 2023-11-03 17:22:21 -04:00
Aaron Miller
9db90cbe12 f16 mv broadcasting fix (gqa fix) 2023-11-03 17:22:21 -04:00
Cebtenzzre
3d850db767 kompute : remove Q6_K from list of supported quant types 2023-11-03 17:22:21 -04:00
Cebtenzzre
24a4a5956a kompute : only try to use Vulkan for LLaMA itself 2023-11-03 17:22:21 -04:00
Adam Treat
bc4b5ed1cb Fixes for subgroup size to bring AMD and NVIDIA inline with eachother for all kernels. 2023-11-03 17:22:21 -04:00
Adam Treat
de589ced7c Change this back to be in agreement with metal and our previous softmax kernel. 2023-11-03 17:22:21 -04:00
Adam Treat
6ac39752bf Fixup the upstream CMakelists.txt so we can build just llama.cpp with our branch. 2023-11-03 17:22:21 -04:00
Adam Treat
32289aa447 Fixes for norm. 2023-11-03 17:22:21 -04:00
Adam Treat
06d4b21598 Fix offset into the qh and now we have working vulkan accelerated for gguff'd llama. 2023-11-03 17:22:21 -04:00
Adam Treat
f1c9bc1821 Add q6_k getrows and mul*vec kernel. 2023-11-03 17:22:21 -04:00
Adam Treat
4b223ec432 Refactor getrows to use common code and get ready for q6_k. 2023-11-03 17:22:21 -04:00
Adam Treat
5509f74318 Minor cleanup. 2023-11-03 17:22:21 -04:00
Adam Treat
601905e75e Move the subgroups and printf into common. 2023-11-03 17:22:21 -04:00
Adam Treat
93306f16d0 Consolidate code for mat x vec kernels and use subgroups more extensively. 2023-11-03 17:22:21 -04:00
Adam Treat
77135a3bf5 Add a common boilerplate code via include and elim copy pasta 2023-11-03 17:22:21 -04:00
Adam Treat
9e4f8b4acc Upload immediately to device. 2023-11-03 17:22:21 -04:00