Kawrakow
d59bd97065
Guard against all weights in a super-block being zero ( #3010 )
...
* Guard against all weights in a super-block being zero
* Also guard against extremely small weights
Closes #2982
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
b1181
2023-09-05 09:55:33 +02:00
Georgi Gerganov
35938ee3b0
llama : update logic for number of threads when using BLAS
b1180
2023-09-05 10:46:39 +03:00
Georgi Gerganov
921772104b
speculative : add grammar support ( #2991 )
...
* speculative : add grammar support
* grammars : add json_arr.gbnf
* grammar : add comments to new grammar file
* grammar : remove one nested level
* common : warm-up with 2 tokens - seems to work better
* speculative : print draft token pieces
* speculative : reuse grammar parser + better logs and comments
* speculative : avoid grammar_mem
* make : fix speculative build
b1179
2023-09-05 08:46:17 +03:00
Georgi Gerganov
2ba85c8609
py : minor
2023-09-04 22:50:50 +03:00
Georgi Gerganov
e36ecdccc8
build : on Mac OS enable Metal by default ( #2901 )
...
* build : on Mac OS enable Metal by default
* make : try to fix build on Linux
* make : move targets back to the top
* make : fix target clean
* llama : enable GPU inference by default with Metal
* llama : fix vocab_only logic when GPU is enabled
* common : better `n_gpu_layers` assignment
* readme : update Metal instructions
* make : fix merge conflict remnants
* gitignore : metal
b1177
2023-09-04 22:26:24 +03:00
slaren
bd33e5ab92
ggml-opencl : store GPU buffer in ggml_tensor::extra ( #2994 )
b1176
2023-09-04 14:59:52 +02:00
Cebtenzzre
3103568144
llama-bench : make cpp file non-executable ( #2999 )
b1175
2023-09-04 13:40:18 +03:00
Leng Yue
5b8530d88c
make : add speculative example ( #3003 )
b1174
2023-09-04 13:39:57 +03:00
Aarni Koskela
e4386f417f
server : add a subtle loading animation to the edit box ( #2466 )
...
* editorconfig: add override for the server HTML (which already is 2-space indented)
* server: add a subtle loading animation to the edit box
b1173
2023-09-04 16:28:55 +08:00
Jiahao Li
35195689cd
2x faster (rms) norm cuda kernels (3.7% e2e improvement) ( #2985 )
...
* 2x faster (rms) norm cuda kernels
* Fix code style
b1172
2023-09-04 08:53:30 +02:00
slaren
cf9b08485c
ggml-alloc : use virtual memory for measurement ( #2973 )
...
* ggml-alloc : use virtual memory for measurement
* compatibility fixes for MAP_ANONYMOUS
* fallback to fixed address for systems without virtual memory
b1171
2023-09-03 20:34:09 +02:00
Georgi Gerganov
47068e5170
speculative : PoC for speeding-up inference via speculative sampling ( #2926 )
...
* speculative : initial example
* speculative : print encoding speed
* speculative : add --draft CLI arg
b1170
2023-09-03 15:12:08 +03:00
Georgi Gerganov
8f429fa511
perplexity : fix ETA by warming up the model with an empty run
b1169
2023-09-03 13:43:17 +03:00
Kerfuffle
6519e9c99c
gguf(python): Fix special vocab handling when id < 0 ( #2984 )
2023-09-03 04:38:43 -06:00
Georgi Gerganov
b7f2aa9e51
metal : restore 363f0bf and fix reduce in F16_F32 kernels ( #2986 )
2023-09-03 13:23:33 +03:00
Alon
73a12a6344
cov : disable comment in PRs ( #2989 )
2023-09-03 13:19:01 +03:00
opparco
3730134776
llama : fix bpe tokenize from byte ( #2889 )
b1165
2023-09-03 13:18:09 +03:00
Georgi Gerganov
d9151e6f57
metal : revert 6af0bab until we fix it
...
This restores the generated text to be the same as before #2959
2023-09-03 12:40:56 +03:00
Alon
afc43d5f82
cov : add Code Coverage and codecov.io integration ( #2928 )
...
* update .gitignore
* makefile: add coverage support (lcov, gcovr)
* add code-coverage workflow
* update code coverage workflow
* wun on ubuntu 20.04
* use gcc-8
* check why the job hang
* add env vars
* add LLAMA_CODE_COVERAGE=1 again
* - add CODECOV_TOKEN
- add missing make lcov-report
* install lcov
* update make file -pb flag
* remove unused GGML_NITER from workflows
* wrap coverage output files in COV_TARGETS
b1163
2023-09-03 11:48:49 +03:00
Wentai Zhang
6460f758db
opencl : fix a bug in ggml_cl_pool_malloc() for ggml_cl_mul_mat_f32() ( #2955 )
...
Co-authored-by: Wentai Zhang <wentaizhang@tencent.com>
b1162
2023-09-03 11:46:44 +03:00
Kawrakow
ca82cf7bac
metal : more optimizations ( #2959 )
...
* Very minor speedup via simd-group synchronization in f16 x f32
* Another very minor speedup on metal
* Quite significant PP speedup on metal
* Another attempt
* Minor
* Massive improvement for TG for fp16
* ~4-5% improvement for Q8_0 TG on metal
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-03 11:06:22 +03:00
kchro3
6a31a3bd98
swift : add support for k-quants ( #2983 )
2023-09-03 09:21:05 +03:00
Kerfuffle
cff7b0bf07
convert.py : BPE fixes ( #2938 )
...
* convert.py: BPE fixes?
* Remove unnecessary conditional in addl token error handling
2023-09-03 08:52:13 +03:00
Ido S
340af42f09
docs : add catai
to README.md
( #2967 )
2023-09-03 08:50:51 +03:00
momonga
c42f0ec6b3
examples : fix gpt-neox ( #2943 )
...
Co-authored-by: mmnga <mmnga1mmnga@gmail.com>
b1157
2023-09-03 08:36:28 +03:00
kchro3
2753415afd
swift : add missing c file to Package.swift ( #2978 )
2023-09-03 08:27:25 +03:00
Cebtenzzre
bc054af97a
make : support overriding CFLAGS/CXXFLAGS/CPPFLAGS/LDFLAGS ( #2886 )
...
* make : remove unused -DGGML_BIG_ENDIAN
* make : put preprocessor stuff in CPPFLAGS
* make : pass Raspberry Pi arch flags to g++ as well
* make : support overriding CFLAGS/CXXFLAGS/CPPFLAGS/LDFLAGS
* make : fix inverted conditional
b1155
2023-09-03 08:26:59 +03:00
Kerfuffle
3358c381f6
logging: Fix creating empty file even when disabled ( #2966 )
...
* logging: Fix creating empty file even when disabled
* Minor formatting fix
Co-authored-by: staviq <staviq@gmail.com>
---------
Co-authored-by: staviq <staviq@gmail.com>
b1154
2023-09-02 11:53:55 -06:00
bandoti
52315a4216
readme : update clblast instructions ( #2903 )
...
* Update Windows CLBlast instructions
* Update Windows CLBlast instructions
* Remove trailing whitespace
2023-09-02 15:53:18 +03:00
Karsten Weiss
8b56b4f2c3
metal : show all Metal device instances in the system ( #2952 )
...
* ggml_metal_init: Show all Metal device instances in the system
Also show the default Metal device that was picked.
* Update ggml-metal.m
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-02 15:29:09 +03:00
Jhen-Jie Hong
21f3d1be86
k-quants : fix build on armv7 (android only) ( #2920 )
...
* k-quants : fix build on armv7
* ggml : cleanup unused arm32 specific impl
* k-quants : avoid some unused vzero / mzero define
* ggml-alloc : use 4g for MEASURE_MAX_SIZE in 32-bit arm
b1151
2023-09-02 15:23:45 +03:00
Jhen-Jie Hong
571083f508
server : avoid aniprompt in probabilities of final response ( #2849 )
b1150
2023-09-02 08:31:46 +08:00
Engininja2
f04d002844
cuda : vsubss4 for older versions of ROCm/clang ( #2942 )
b1149
2023-09-01 23:33:19 +02:00
ZHAOKAI WANG
69fdbb9abc
readme : quick start command fix ( #2908 )
...
* quick start command fix
* quick start win command fix
2023-09-01 17:06:44 +03:00
Kerfuffle
5d6f19f16b
Allow quantize to only copy tensors, some other improvements ( #2931 )
...
* Allow quantize tool to only copy tensors to allow repackaging models.
* Slightly better logic when requantizing.
* Change help message to go to `stdout`.
b1147
2023-09-01 08:02:48 -06:00
Georgi Gerganov
0d58936686
llama2c : rename function
b1146
2023-09-01 17:01:11 +03:00
Cebtenzzre
6c9c23429b
make : use unaligned vector moves on MinGW ( #2945 )
...
Fixes #2922
b1145
2023-09-01 16:53:14 +03:00
m3ndax
ee8654bcd0
minor : add const qualifiers ( #2853 )
...
* made the methods const
# Conflicts:
# examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp
* made method const
* Update convert-llama2c-to-ggml.cpp
removed write_raw and write_u32
* llama2c : remove misleading const
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b1144
2023-09-01 16:47:27 +03:00
Konstantin Herud
49bb9cbe0f
docs : add java-llama.cpp to README.md ( #2935 )
2023-09-01 16:36:14 +03:00
Cebtenzzre
ef15649972
build : fix most gcc and clang warnings ( #2861 )
...
* fix most gcc and clang warnings
* baby-llama : remove commented opt_params_adam
* fix some MinGW warnings
* fix more MinGW warnings
b1142
2023-09-01 16:34:50 +03:00
Ben Siraphob
d8d6977f48
examples : add C grammar ( #2357 )
2023-09-01 16:32:14 +03:00
Tameem
5aec2cfaac
ggml : add RISC-V vector intrinsics support ( #2929 )
...
* added support for RISCV CFLAGS & native compile + cross compile options
* Add RISC-V Vector Intrinsics Support
Added RVV intrinsics for following
ggml_vec_dot_q4_0_q8_0
ggml_vec_dot_q4_1_q8_1
ggml_vec_dot_q5_0_q8_0
ggml_vec_dot_q5_1_q8_1
ggml_vec_dot_q8_0_q8_0
Co-authored-by: Sharafat <sharafat.hussain@10xengineers.ai>
Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
---------
Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
Co-authored-by: moiz.hussain <moiz.hussain@10xengineers.ai>
Co-authored-by: Sharafat <sharafat.hussain@10xengineers.ai>
b1140
2023-09-01 16:27:40 +03:00
Georgi Gerganov
13268c5331
metal : slight speed-up for add and mul kernels ( #2917 )
2023-09-01 13:42:41 +03:00
staviq
4dcd47d71d
logs : fix mingw-like builds ( fixes #2898 ) ( #2911 )
...
* fix mingw-like builds
* formatting
* make LOG_COMPAT easier to override and extend
* simplify win detection
* fix for #2940
b1138
2023-09-01 12:07:06 +03:00
Cebtenzzre
18705a30ef
llama2c : fix segfault and alloc-dealloc-mismatch ( #2913 )
...
* llama2c : fix segfault if vocab is not found
* llama2c : fix mismatch between new[] and delete
* llama2c : fix basename on Windows
* llama2c : use a destructor to prevent memory leaks
b1137
2023-09-01 12:03:49 +03:00
Kawrakow
e8d9158925
metal: somewhat faster f16 x f32 matrix multiply kernel ( #2951 )
...
* Somewhat faster f16 x f32 matrix multiply kernel
* Better use 32 thread groups for f16 x f32
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-09-01 11:15:57 +03:00
Cebtenzzre
bce1fef328
convert : fix another python 3.8 issue ( #2949 )
2023-08-31 22:13:51 -04:00
slaren
528134dd02
remove convert-llama-7b-pth-to-gguf.py and convert-llama-hf-to-gguf.py ( #2906 )
2023-09-01 01:32:09 +02:00
Kerfuffle
aeefac4ff7
scripts: Use local gguf package when running from repo ( #2927 )
...
* scripts: Use local gguf when running from repo
2023-08-31 16:49:24 -06:00
DannyDaemonic
e8422de39e
@vxiiduu's fix for PrefetchVirtualMemory ( #2930 )
...
Reimplement fix for `PrefetchVirtualMemory`.
Co-authored-by: vxiiduu <73044267+vxiiduu@users.noreply.github.com>
b1132
2023-08-31 04:21:45 -07:00