Default Branch

c05e8c9934 · gguf-py: fixed local detection of gguf package (#11180) · Updated 2025-01-11 10:42:31 +01:00

Branches

33d7b70c88 · server : do not speculate during prompt processing · Updated 2024-12-03 09:58:43 +01:00 · 220 behind, 1 ahead

3c8a2a83fe · shmem experiments · Updated 2024-11-26 14:17:38 +01:00 · 283 behind, 3 ahead

dafedd33d2 · 4x4 -> 4x · Updated 2024-11-26 13:54:02 +01:00 · 283 behind, 2 ahead

bf3494345e · metal : some mul_mv experiments · Updated 2024-11-26 13:48:50 +01:00 · 283 behind, 1 ahead

b83cae088c · speculative : add infill mode · Updated 2024-11-26 10:14:17 +01:00 · 288 behind, 1 ahead

1ee6c482d0 · Merge branch 'master' into compilade/mamba2 · Updated 2024-11-25 18:06:56 +01:00 · 297 behind, 24 ahead

4ff0831ce6 · metal : use F16 math in mul_mat kernels · Updated 2024-11-25 14:15:26 +01:00 · 301 behind, 1 ahead

f7b0233eca · wip · Updated 2024-11-16 09:33:55 +01:00 · 363 behind, 1 ahead

12d5491db9 · ggml : fix some build issues · Updated 2024-11-15 20:20:54 +01:00 · 371 behind, 1 ahead

5e6dad9322 · speculative : experimenting with Qwen2.5 · Updated 2024-11-14 10:31:31 +01:00 · 385 behind, 2 ahead

33bdee667e · speculative : fix out-of-bounds access · Updated 2024-11-14 10:23:45 +01:00 · 385 behind, 1 ahead

8c1b186cb5 · metal : minor Q4_0 optimization · Updated 2024-11-12 14:30:51 +01:00 · 395 behind, 21 ahead

3d1fe1bb4d · metal : int -> short, style · Updated 2024-11-09 09:32:16 +01:00 · 406 behind, 2 ahead

bd1198a67a · metal : fix build and some more comments · Updated 2024-11-09 09:09:50 +01:00 · 406 behind, 1 ahead

a2385da59c · make : clean-up [no ci] · Updated 2024-11-08 12:46:20 +01:00 · 413 behind, 9 ahead

94accca4c2 · vec move mask to shmem · Updated 2024-11-07 19:58:10 +01:00 · 423 behind, 19 ahead

c5d8bb5a81 · leave only basic functions for SYCL CI · Updated 2024-11-06 08:47:50 +01:00 · 488 behind, 2 ahead

4fc8673d09 · llama-bench : skip repeated values in consecutive lines · Updated 2024-11-02 15:37:33 +01:00 · 448 behind, 1 ahead

20e12112fd · llama : suggest reduce ctx size when kv init fails · Updated 2024-11-02 00:55:19 +01:00 · 451 behind, 2 ahead

a20738644e · examples : add idle tool for investigating GPU idle overhead · Updated 2024-11-01 09:28:02 +01:00 · 463 behind, 1 ahead