llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-10 04:20:24 +01:00


Georgi Gerganov 77a73403ca

ggml : add new Q4_2 quantization (ARM only) (#1046)

* ggml : Q4_2 ARM

* ggml : add ggml_is_quantized()

* llama : update llama_type_name() with Q4_2 entry

* ggml : speed-up q4_2

- 4 threads: ~100ms -> ~90ms
- 8 threads:  ~55ms -> ~50ms

* ggml : optimize q4_2 using vmlaq_n_f32 + vmulq_n_f32

2023-04-18 23:54:57 +03:00

benchmark

benchmark : fix result validation in benchmark-q4_0-matmult (#987)

2023-04-15 08:51:54 +03:00

embedding

examples: add missing <ctime> include for time() (#1011)

2023-04-16 10:13:00 +00:00

main

Add LoRA support (#820)

2023-04-17 17:28:55 +02:00

perplexity

Add LoRA support (#820)

2023-04-17 17:28:55 +02:00

quantize

ggml : add new Q4_2 quantization (ARM only) (#1046)

2023-04-18 23:54:57 +03:00

quantize-stats

quantize-stats : fix bug in --type argument

2023-04-17 17:31:06 +03:00

alpaca.sh

examples : add -n to alpaca and gpt4all scripts (#706)

2023-04-13 16:03:39 +03:00

chat-13B.bat

Create chat-13B.bat (#592)

2023-03-29 20:21:09 +03:00

chat-13B.sh

Move chat scripts into "./examples"

2023-03-25 20:37:09 +02:00

chat.sh

If n_predict == -1, generate forever

2023-03-25 21:51:41 +02:00

CMakeLists.txt

Add quantize-stats command for testing quantization (#728)

2023-04-08 00:09:18 +02:00

common.cpp

Add LoRA support (#820)

2023-04-17 17:28:55 +02:00

common.h

Add LoRA support (#820)

2023-04-17 17:28:55 +02:00

gpt4all.sh

examples : add -n to alpaca and gpt4all scripts (#706)

2023-04-13 16:03:39 +03:00

Miku.sh

Fix whitespace, add .editorconfig, add GitHub workflow (#883)

2023-04-11 19:45:44 +00:00

reason-act.sh

add example of re-act pattern (#583)

2023-03-29 10:10:24 -05:00