slaren 2bf8d0f7c4
backend : offload large batches to GPU (#6083)
* backend : offload large batches to GPU

* fix hip

* code cleanup

* fix CUDA split buffers

* Update ggml-backend-impl.h

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* cuda : fix memset without set_device

* imatrix : remove sched affix from weight names

* sched : add a new split if the current one has too many inputs
reduce max inputs per split
more cleanup

* update backends

ggml-ci

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-03-18 11:03:04 +01:00