Default Branch

9ba399dfa7 · server : add support for "encoding_format": "base64" to the */embeddings endpoints (#10967) · Updated 2024-12-24 21:33:04 +01:00

Branches

d857e5192e · quantize : check imatrix for nan/inf values · Updated 2024-06-06 22:44:24 +02:00

1290
2

731e7528be · server : fix --threads-http arg · Updated 2024-06-06 15:37:12 +02:00

1291
1

f7d4b7c343 · build only main and server in their docker images · Updated 2024-06-06 00:13:01 +02:00

1298
2

3d2e79da7f · add openmp lib to dockerfiles · Updated 2024-06-06 00:05:25 +02:00

1298
1

0085f94936 · server : add /v1/completion endpoint · Updated 2024-06-04 14:58:14 +02:00

1308
1

5f8720fb7b · add rpc-server to Makefile · Updated 2024-05-31 17:22:05 +02:00

1343
3

956af1552a · server : update js · Updated 2024-05-31 14:47:19 +02:00

1334
1

77c16ee0d4 · tests : disable json test due to lack of python on the CI node · Updated 2024-05-31 13:16:54 +02:00

1347
3

d32a8f6142 · backup · Updated 2024-05-31 10:51:56 +02:00

1344
2

8a8f8b953f · llama : print a log of the total cache size · Updated 2024-05-29 20:45:43 +02:00

1353
4

1ca802a3e0 · parallelize fattn compilation test · Updated 2024-05-28 01:19:36 +02:00

1379
6

ddc59e8e0a · wipwipwiwpip · Updated 2024-05-27 11:04:09 +02:00

1401
17

4b1770109c · Fix q_xxs using mul_mat_q · Updated 2024-05-27 10:46:37 +02:00

1384
1

1c6cde92bb · metal : disable FA kernel for HS=256 · Updated 2024-05-27 08:57:20 +02:00

1386
1

11f78c6a2d · convert-hf : adapt ArcticModel to use yield too · Updated 2024-05-25 18:52:53 +02:00

1393
4

dd14d818e0 · Update main-intel.Dockerfile base image to 2024.1.0 · Updated 2024-05-24 04:47:58 +02:00

1404
1

c5fe1d6cdc · gguf-py : remove unused import · Updated 2024-05-23 06:09:49 +02:00

1419
2

518b75260b · cuda uma test · Updated 2024-05-23 03:13:48 +02:00

1419
1

e9095e6098 · async direct io per tensor test · Updated 2024-05-22 01:08:52 +02:00

1438
3

a041ced0fd · wip · Updated 2024-05-20 17:20:49 +02:00

1444
1