llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-10 12:30:50 +01:00

ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (#9763)

* ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend.

- The MobileVLM model now supports GPU-accelerated inference through the Vulkan backend.
- A GGML_OP_POOL_2D (pooling) shader has been added.
- CLIP model encoding time dropped from 2.8 s on the CPU to 0.7 s on the GPU.
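
For readers unfamiliar with the operation, the following is a minimal CPU reference sketch of what a 2D pooling op computes. It is not the Vulkan shader from this commit and is not taken from ggml; the function name `pool2d`, its parameters, and the choice to divide average pooling by the full kernel area are all assumptions made for illustration.

```python
def pool2d(src, kernel, stride, pad=0, op="max"):
    """Illustrative 2D pooling over a list-of-lists (hypothetical helper).

    kernel and stride are (height, width) pairs; pad is applied on all sides.
    """
    kh, kw = kernel
    sh, sw = stride
    h, w = len(src), len(src[0])

    # Standard output-size formula for a padded, strided window.
    out_h = (h + 2 * pad - kh) // sh + 1
    out_w = (w + 2 * pad - kw) // sw + 1

    out = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            # Gather the in-bounds values covered by this window.
            window = []
            for ky in range(kh):
                for kx in range(kw):
                    y = oy * sh - pad + ky
                    x = ox * sw - pad + kx
                    if 0 <= y < h and 0 <= x < w:
                        window.append(src[y][x])
            if op == "max":
                row.append(max(window, default=float("-inf")))
            else:
                # Assumption: average over the full kernel area,
                # so padded positions count as zeros.
                row.append(sum(window) / (kh * kw))
        out.append(row)
    return out


if __name__ == "__main__":
    image = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16],
    ]
    print(pool2d(image, (2, 2), (2, 2), op="max"))
    print(pool2d(image, (2, 2), (2, 2), op="avg"))
```

Each output element depends only on its own window, which is why the operation maps naturally onto one GPU invocation per output element.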

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>

* [fix] Correct the order of the parameters.

Fix casting to int.

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>

---------

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>

2024-10-29 09:52:56 +01:00

cmake

llama : reorganize source code + improve CMake (#8006)

2024-06-26 18:33:02 +03:00

include

[CANN] Adapt to dynamically loadable backends mechanism (#9970)

2024-10-22 16:16:01 +08:00

src

ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (#9763)

2024-10-29 09:52:56 +01:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

add amx kernel for gemm (#8998)

2024-10-18 13:34:36 +08:00