Default Branch

9ba399dfa7 · server : add support for "encoding_format": "base64" to the */embeddings endpoints (#10967) · Updated 2024-12-24 21:33:04 +01:00

Branches

Each entry lists: commit · message · last updated · commits behind, commits ahead of the default branch.

1ad42b1f1e · ggml : ggml_soft_max uses F16 mask · Updated 2024-01-31 19:33:59 +01:00 · 2355 behind, 36 ahead
719a087138 · iq3_xxs: forgotten update of the grid points · Updated 2024-01-30 17:39:07 +01:00 · 2369 behind, 1 ahead
2bf91c5306 · metal : clean up · Updated 2024-01-25 12:29:45 +01:00 · 2465 behind, 23 ahead
6ccbd1777a · wip · Updated 2024-01-24 14:45:04 +01:00 · 2465 behind, 18 ahead
da23b56f25 · wip : no ic 8 step · Updated 2024-01-24 12:25:34 +01:00 · 2465 behind, 18 ahead
06c2d0d117 · wip · Updated 2024-01-23 21:42:43 +01:00 · 2465 behind, 14 ahead
a9681febd6 · ggml : online attention (CPU) · Updated 2024-01-20 15:45:41 +01:00 · 2465 behind, 4 ahead
32a392fe68 · try a differerent fix · Updated 2024-01-19 23:10:23 +01:00 · 2466 behind, 2 ahead
4a3bc1522e · py : linting with mypy and isort · Updated 2024-01-19 21:18:58 +01:00 · 2467 behind, 3 ahead
1453215165 · kompute : fix ggml_add kernel · Updated 2024-01-18 23:09:16 +01:00 · 2583 behind, 105 ahead
ccc78a200e · hellaswag: speed up even more by parallelizing log-prob evaluation · Updated 2024-01-18 17:25:29 +01:00 · 2483 behind, 1 ahead
2917e6b528 · Merge branch 'master' into gg/imatrix-gpu-4931 · Updated 2024-01-17 17:43:45 +01:00 · 2490 behind, 10 ahead
23742deb5b · py : fix padded dummy tokens (I hope) · Updated 2024-01-17 14:44:22 +01:00 · 2509 behind, 4 ahead
9fd1e83f6d · Use Q4_K for attn_v for Q2_K_S when n_gqa >= 4 · Updated 2024-01-17 11:16:08 +01:00 · 2495 behind, 1 ahead
49bafe0986 · tests : avoid creating RNGs for each tensor · Updated 2024-01-17 09:40:55 +01:00 · 2498 behind, 6 ahead
bb9abb5cd8 · imatrix: guard Q4_0/Q5_0 against ffn_down craziness · Updated 2024-01-16 08:56:05 +01:00 · 2512 behind, 2 ahead
9998ecd191 · llama : add phixtral support (wip) · Updated 2024-01-13 13:24:07 +01:00 · 2542 behind, 1 ahead
1fb563ebdc · py : try to fix flake stuff · Updated 2024-01-13 12:42:35 +01:00 · 2543 behind, 2 ahead
9bfcb16fd3 · Add llama enum for IQ2_XS · Updated 2024-01-11 17:24:12 +01:00 · 2592 behind, 11 ahead
24096933b0 · server : try to fix infill when prompt is empty · Updated 2024-01-09 10:27:29 +01:00 · 2594 behind, 1 ahead