llama.cpp

Mirror of https://github.com/ggerganov/llama.cpp.git, synced 2025-01-10 12:30:50 +01:00


cuda : improve cuda pool efficiency using virtual memory (#4606)

* cuda : improve cuda pool efficiency using virtual memory

* fix mixtral

* fix cmake build

* check for vmm support, disable for hip

ggml-ci

* fix hip build

* clarify granularity

* move all caps to g_device_caps

* refactor error checking

* add cuda_pool_alloc, refactor most pool allocations

ggml-ci

* fix hip build

* CUBLAS_TF32_TENSOR_OP_MATH is not a macro

* more hip crap

* llama : fix msvc warnings

* ggml : fix msvc warnings

* minor

* minor

* cuda : fallback to CPU on host buffer alloc fail

* Update ggml-cuda.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Update ggml-cuda.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* ensure allocations are always aligned

* act_size -> actual_size

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

2023-12-24 14:34:22 +01:00

CMakeLists.txt

sync : ggml (new ops, tests, backend, etc.) (#4359)

2023-12-07 22:26:54 +02:00

test-backend-ops.cpp

ggml : change ggml_scale to take a float instead of a tensor (#4573)

2023-12-21 23:20:49 +02:00

test-c.c

tests : add a C compliance test (#2848)

2023-08-30 09:20:26 +03:00

test-double-float.cpp

ggml : move FP16 <-> FP32 code to ggml-impl.h (#3861)