llama.cpp/tests at b1842

Mirrors/llama.cpp

Mirror of https://github.com/ggerganov/llama.cpp.git, synced 2025-01-10 12:30:50 +01:00

Kawrakow 49662cbed3

ggml : SOTA 2-bit quants (add IQ2_XS) (#4856)

* iq2_xs: basics

* iq2_xs: this should have been in the basics

* iq2_xs: CUDA and scalar CPU works

* iq2_xs: WIP Metal

* iq2_xs: Metal now works

* iq2_xs: working, but dog slow, ARM_NEON dot product

* iq2_xs: better ARM_NEON dot product

We are now at 19.5 t/s for TG-128 and 61 t/s for PP-512 when
running on the CPU.

* iq2_xs: AVX2 dot product - 19.5 t/s

* iq2_xs: faster AVX2 dot product

21.4 t/s for TG-128, 59.2 t/s for PP-512.
The latter is 2x faster than the previous version.

* iq2_xs: had forgotten to delete iq2-data.h

* Add llama enum for IQ2_XS

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2024-01-11 21:39:39 +02:00

CMakeLists.txt

cmake : fix ld warning duplicate libraries libllama.a (#4671)

2023-12-29 16:39:15 +02:00

test-backend-ops.cpp

CUDA: faster softmax via shared memory + fp16 math (#4742)

2024-01-09 08:58:55 +01:00

test-c.c

tests : add a C compliance test (#2848)

2023-08-30 09:20:26 +03:00

test-double-float.cpp

ggml : move FP16 <-> FP32 code to ggml-impl.h (#3861)

2023-10-30 19:19:15 +02:00

test-grad0.cpp

cuda : improve cuda pool efficiency using virtual memory (#4606)

2023-12-24 14:34:22 +01:00

test-grammar-parser.cpp

gguf : new file format with flexible meta data (beta) (#2398)

2023-08-21 23:07:43 +03:00

test-llama-grammar.cpp

gguf : new file format with flexible meta data (beta) (#2398)

2023-08-21 23:07:43 +03:00

test-opt.cpp

sync : ggml (backend v2) (#3912)

2023-11-13 14:16:23 +02:00

test-quantize-fns.cpp

ggml : SOTA 2-bit quants (add IQ2_XS) (#4856)

2024-01-11 21:39:39 +02:00

test-quantize-perf.cpp

ggml : use ggml_row_size where possible (#4472)

2023-12-14 20:05:21 +01:00

test-rope.cpp

llama : custom attention mask + parallel decoding + no context swaps (#3228)

2023-09-28 19:04:36 +03:00

test-sampling.cpp

sampling : refactor init to use llama_sampling_params (#3696)

2023-10-20 21:07:23 +03:00

test-tokenizer-0-falcon.cpp

Minor improvements in GPT2 tokenizer (#3567)

2023-10-10 18:59:52 +02:00

test-tokenizer-0-falcon.py

ci : add flake8 to github actions (python linting) (#4129)

2023-11-20 11:35:47 +01:00

test-tokenizer-0-llama.cpp

Minor improvements in GPT2 tokenizer (#3567)

2023-10-10 18:59:52 +02:00

test-tokenizer-0-llama.py

ci : add flake8 to github actions (python linting) (#4129)

2023-11-20 11:35:47 +01:00

test-tokenizer-1-bpe.cpp

Add more tokenizer tests (#3742)

2023-10-24 09:17:17 +02:00

test-tokenizer-1-llama.cpp

Work on the BPE tokenizer (#3252)

2023-10-03 09:16:26 +02:00