llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-10 12:30:50 +01:00

ggml : optimize llamafile cpu matrix multiplication for ppc64le (#10156)

This change upstreams llamafile's cpu matrix
multiplication kernels for ppc64le using MMA
builtins for FP32 datatype.

This change results in a consistent 90%
improvement in input processing time, and 20%
to 80% improvement in output processing time,
across various batch sizes.

The patch was tested with Meta-Llama-3-8B,
Mistral-7B, and Llama-2-7B-chat-hf models on an
IBM POWER10 machine.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>

2024-11-09 09:17:50 +02:00

cmake

llama : reorganize source code + improve CMake (#8006)

2024-06-26 18:33:02 +03:00

include

metal : optimize FA kernels (#10171)

2024-11-08 13:47:22 +02:00

src

ggml : optimize llamafile cpu matrix multiplication for ppc64le (#10156)

2024-11-09 09:17:50 +02:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

metal : opt-in compile flag for BF16 (#10218)

2024-11-08 21:59:46 +02:00