Default Branch

c05e8c9934 · gguf-py: fixed local detection of gguf package (#11180) · Updated 2025-01-11 10:42:31 +01:00

Branches

75b3a09602 · test-backend-ops : add TQ1_0 and TQ2_0 comments for later · Updated 2024-09-04 21:00:21 +02:00 · 795 behind, 33 ahead

f648ca2cee · llama : add llama_sampling API + move grammar in libllama · Updated 2024-09-03 09:31:54 +02:00 · 802 behind, 1 ahead

40fa68cb46 · readme : add API change notice · Updated 2024-09-02 17:32:24 +02:00 · 811 behind, 3 ahead

375de5b1f8 · llama : use unused n_embd_k_gqa in k_shift · Updated 2024-09-02 03:59:24 +02:00 · 811 behind, 41 ahead

a95225cdfd · metal : another fix for the fa kernel · Updated 2024-08-26 14:08:38 +02:00 · 835 behind, 1 ahead

aa931d0375 · metal : fix fa kernel · Updated 2024-08-26 12:09:50 +02:00 · 835 behind, 1 ahead

6494509801 · backup · Updated 2024-08-26 10:58:54 +02:00 · 845 behind, 2 ahead

ccb45186d0 · docs : remove references · Updated 2024-08-26 08:52:17 +02:00 · 839 behind, 2 ahead

8062650343 · llama : fix simple splits when the batch contains embeddings · Updated 2024-08-21 21:09:03 +02:00 · 850 behind, 19 ahead

9127800d83 · wip · Updated 2024-08-17 01:51:06 +02:00 · 883 behind, 2 ahead

62d7b6c87f · cuda : re-add q4_0 · Updated 2024-08-14 12:37:03 +02:00 · 879 behind, 3 ahead

93ec58b932 · server : fix typo in comment · Updated 2024-08-14 04:12:26 +02:00 · 881 behind, 4 ahead

faaac59d16 · llama : support NUL bytes in tokens · Updated 2024-08-12 03:00:03 +02:00 · 892 behind, 1 ahead

73bc9350cd · gguf-py : Numpy dequantization for grid-based i-quants · Updated 2024-08-10 05:47:31 +02:00 · 912 behind, 2 ahead

9329953a61 · llama : avoid double tensor copy when saving session to buffer · Updated 2024-08-07 22:03:34 +02:00 · 920 behind, 2 ahead

7764ab911d · update guide · Updated 2024-08-07 16:01:02 +02:00 · 921 behind, 1 ahead

cad8abb49b · add tool to allow plotting tensor allocation maps within buffers · Updated 2024-08-06 22:09:51 +02:00 · 929 behind, 1 ahead

6e299132e7 · clip : style changes · Updated 2024-08-06 10:44:29 +02:00 · 1253 behind, 56 ahead

16dab13bde · correct cmd name · Updated 2024-08-05 18:15:33 +02:00 · 938 behind, 1 ahead

bddcc5f985 · llama : better replace_all · Updated 2024-08-04 12:42:08 +02:00 · 954 behind, 1 ahead