Iwan Kawrakow
6af0bab347
~4-5% improvement for Q8_0 TG on metal
2023-09-03 09:00:27 +03:00
Iwan Kawrakow
363f0bf558
Massive improvement for TG for fp16
2023-09-02 18:15:43 +03:00
Georgi Gerganov
01eed465c4
Merge branch 'master' into ik/more_metal_optimizations
2023-09-02 11:22:21 +03:00
Jhen-Jie Hong
571083f508
server : avoid antiprompt in probabilities of final response ( #2849 )
2023-09-02 08:31:46 +08:00
Engininja2
f04d002844
cuda : vsubss4 for older versions of ROCm/clang ( #2942 )
2023-09-01 23:33:19 +02:00
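The patch itself isn't reproduced in this log; for context, CUDA's `__vsubss4` performs a per-byte saturating subtraction on four packed int8 lanes, which older ROCm/clang toolchains lack as a builtin. A hedged, portable C sketch of that operation (hypothetical name, not the actual code from this commit):

```c
#include <stdint.h>
#include <string.h>

// Portable sketch of __vsubss4 semantics: treat each 32-bit int as four
// packed int8 lanes and subtract lane-wise with signed saturation.
// Hypothetical fallback for illustration, not the commit's actual code.
int vsubss4_fallback(int a, int b) {
    int8_t va[4], vb[4], vr[4];
    memcpy(va, &a, sizeof va);
    memcpy(vb, &b, sizeof vb);
    for (int i = 0; i < 4; ++i) {
        int t = (int)va[i] - (int)vb[i];
        if (t >  127) t =  127;  // clamp to int8 max
        if (t < -128) t = -128;  // clamp to int8 min
        vr[i] = (int8_t)t;
    }
    int r;
    memcpy(&r, vr, sizeof r);
    return r;
}
```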
Iwan Kawrakow
74df0de9e6
Minor
2023-09-01 18:15:45 +03:00
Iwan Kawrakow
b557bc326d
Another attempt
2023-09-01 17:50:31 +03:00
Iwan Kawrakow
2b601702a8
Quite significant PP speedup on metal
2023-09-01 17:50:31 +03:00
Iwan Kawrakow
e3ff8c20c8
Another very minor speedup on metal
2023-09-01 17:50:31 +03:00
Iwan Kawrakow
2cb47e0e16
Very minor speedup via simd-group synchronization in f16 x f32
2023-09-01 17:50:31 +03:00
ZHAOKAI WANG
69fdbb9abc
readme : quick start command fix ( #2908 )
* quick start command fix
* quick start win command fix
2023-09-01 17:06:44 +03:00
Kerfuffle
5d6f19f16b
Allow quantize to only copy tensors, some other improvements ( #2931 )
* Allow quantize tool to only copy tensors to allow repackaging models.
* Slightly better logic when requantizing.
* Change help message to go to `stdout`.
2023-09-01 08:02:48 -06:00
Georgi Gerganov
0d58936686
llama2c : rename function
2023-09-01 17:01:11 +03:00
Cebtenzzre
6c9c23429b
make : use unaligned vector moves on MinGW ( #2945 )
Fixes #2922
2023-09-01 16:53:14 +03:00
m3ndax
ee8654bcd0
minor : add const qualifiers ( #2853 )
* made the methods const
# Conflicts:
# examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp
* made method const
* Update convert-llama2c-to-ggml.cpp
removed write_raw and write_u32
* llama2c : remove misleading const
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-01 16:47:27 +03:00
Konstantin Herud
49bb9cbe0f
docs : add java-llama.cpp to README.md ( #2935 )
2023-09-01 16:36:14 +03:00
Cebtenzzre
ef15649972
build : fix most gcc and clang warnings ( #2861 )
* fix most gcc and clang warnings
* baby-llama : remove commented opt_params_adam
* fix some MinGW warnings
* fix more MinGW warnings
2023-09-01 16:34:50 +03:00
Ben Siraphob
d8d6977f48
examples : add C grammar ( #2357 )
2023-09-01 16:32:14 +03:00
Tameem
5aec2cfaac
ggml : add RISC-V vector intrinsics support ( #2929 )
* added support for RISCV CFLAGS & native compile + cross compile options
* Add RISC-V Vector Intrinsics Support
Added RVV intrinsics for following
ggml_vec_dot_q4_0_q8_0
ggml_vec_dot_q4_1_q8_1
ggml_vec_dot_q5_0_q8_0
ggml_vec_dot_q5_1_q8_1
ggml_vec_dot_q8_0_q8_0
Co-authored-by: Sharafat <sharafat.hussain@10xengineers.ai>
Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
---------
Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
Co-authored-by: moiz.hussain <moiz.hussain@10xengineers.ai>
Co-authored-by: Sharafat <sharafat.hussain@10xengineers.ai>
2023-09-01 16:27:40 +03:00
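The RVV intrinsics themselves aren't shown in this log; for context on what these kernels compute, here is a simplified scalar reference of a q8_0 × q8_0 dot product: weights are stored in blocks of 32 int8 values with one scale per block, and each block contributes an integer inner product times the two scales. The struct is a stand-in (ggml's real `block_q8_0` stores its scale as fp16, not float):

```c
#include <stdint.h>

// Simplified scalar reference of what ggml_vec_dot_q8_0_q8_0 computes:
// 32 int8 weights per block plus one per-block scale. Stand-in layout,
// not ggml's exact block_q8_0 (which uses an fp16 scale).
typedef struct {
    float  d;      // per-block scale
    int8_t qs[32]; // quantized values
} blk_q8_0;

float dot_q8_0_ref(const blk_q8_0 *x, const blk_q8_0 *y, int nblocks) {
    float sum = 0.0f;
    for (int b = 0; b < nblocks; ++b) {
        int isum = 0; // integer inner product within the block
        for (int i = 0; i < 32; ++i) {
            isum += (int)x[b].qs[i] * (int)y[b].qs[i];
        }
        sum += x[b].d * y[b].d * (float)isum;
    }
    return sum;
}
```

The commit's RVV versions presumably vectorize the inner 32-element loop; the other listed kernels differ only in how the first operand's quantized values are unpacked.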
Georgi Gerganov
13268c5331
metal : slight speed-up for add and mul kernels ( #2917 )
2023-09-01 13:42:41 +03:00
staviq
4dcd47d71d
logs : fix mingw-like builds ( fixes #2898 ) ( #2911 )
* fix mingw-like builds
* formatting
* make LOG_COMPAT easier to override and extend
* simplify win detection
* fix for #2940
2023-09-01 12:07:06 +03:00
Cebtenzzre
18705a30ef
llama2c : fix segfault and alloc-dealloc-mismatch ( #2913 )
* llama2c : fix segfault if vocab is not found
* llama2c : fix mismatch between new[] and delete
* llama2c : fix basename on Windows
* llama2c : use a destructor to prevent memory leaks
2023-09-01 12:03:49 +03:00
Kawrakow
e8d9158925
metal: somewhat faster f16 x f32 matrix multiply kernel ( #2951 )
* Somewhat faster f16 x f32 matrix multiply kernel
* Better use 32 thread groups for f16 x f32
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-09-01 11:15:57 +03:00
Cebtenzzre
bce1fef328
convert : fix another python 3.8 issue ( #2949 )
2023-08-31 22:13:51 -04:00
slaren
528134dd02
remove convert-llama-7b-pth-to-gguf.py and convert-llama-hf-to-gguf.py ( #2906 )
2023-09-01 01:32:09 +02:00
Kerfuffle
aeefac4ff7
scripts: Use local gguf package when running from repo ( #2927 )
* scripts: Use local gguf when running from repo
2023-08-31 16:49:24 -06:00
DannyDaemonic
e8422de39e
@vxiiduu's fix for PrefetchVirtualMemory ( #2930 )
Reimplement fix for `PrefetchVirtualMemory`.
Co-authored-by: vxiiduu <73044267+vxiiduu@users.noreply.github.com>
2023-08-31 04:21:45 -07:00
Cebtenzzre
92d0b751a7
convert : fix python 3.8 support, modernize type annotations ( #2916 )
* convert : fix python 3.8 support
* convert : sort imports
* convert : fix required parameters in convert-llama-ggmlv3-to-gguf
* convert : fix mypy errors in convert-llama-ggmlv3-to-gguf
* convert : use PEP 585 generics and PEP 604 unions
Now that we have `from __future__ import annotations`, we can use this
modern syntax in Python 3.7 instead of restricting support to Python 3.9
or 3.10 respectively.
* gguf.py : a tuple is already a tuple
* add mypy.ini
* convert : add necessary `type: ignore` comments
* gguf-py: bump version
2023-08-31 08:02:23 +03:00
Johannes Gäßler
8afe228000
CUDA: mul_mat_q=true llama_context_params default ( #2912 )
2023-08-30 21:46:19 +02:00
Henri Vasserman
71d6975559
[Docker] fix tools.sh argument passing. ( #2884 )
* [Docker] fix tools.sh argument passing.
This should allow passing multiple arguments to containers with
the full image that are using the tools.sh frontend.
Fix from https://github.com/ggerganov/llama.cpp/issues/2535#issuecomment-1697091734
2023-08-30 19:14:53 +03:00
Georgi Gerganov
b532a69b2f
convert.py : use dir name to name the llama
2023-08-30 13:29:40 +03:00
Georgi Gerganov
c90d135eb4
examples : fix underscore in beam-search + .gitignore ( close #2900 )
2023-08-30 12:53:24 +03:00
M. Yusuf Sarıgöz
0d1c706181
gguf : add workflow for Pypi publishing ( #2896 )
* gguf : add workflow for Pypi publishing
* gguf : add workflow for Pypi publishing
* fix trailing whitespace
2023-08-30 12:47:40 +03:00
alonfaraj
9509294420
make : add test and update CI ( #2897 )
* build ci: run make test
* makefile:
- add all
- add test
* enable tests/test-tokenizer-0-llama
* fix path to model
* remove gcc-8 from macos build test
* Update Makefile
* Update Makefile
2023-08-30 12:42:51 +03:00
Gilad S
35092fb547
docs : add node-llama-cpp to README.md ( #2885 )
2023-08-30 11:40:12 +03:00
Kerfuffle
dc07dc492e
convert : various script cleanups/fixes + merges and special token handling ( #2842 )
* convert: Fix permute calls and method/func definitions
* Cleanups for gguf-py
* Minor types cleanups.
* Initial implementation of handling merges and special tokens
* convert: Handle special tokens and merges in vocab only mode
convert: Vocab only mode no longer requires loading model tensors
* gguf: Refactor tensor name mapping
* convert: Fix type hint for special_token_types in SpecialVocab
* Use common special vocab handling in various conversion scripts
* First pass at implementing suggested changes
* Second pass
* gguf: SpecialVocab: Fix issue with special token content not in a dict
gguf: SpecialVocab: Allow skipping handling of merges
* convert-falcon-hf-to-gguf: Support --vocab-only option, bail out if no tokenizer.json
* convert-gptneox-hf-to-gguf and convert: Only handle merges for BPE tokenizer
* gguf: SpecialVocab: Actually set load_merges in object
* Uniform args parsing and vocab only mode for convert examples
* convert.py: Set gpt2 as tokenizer model when using BPE
* Squish last type warning in gguf.py - yay!
2023-08-30 11:25:50 +03:00
chaihahaha
ad9ddcff6e
llm.vim : stop generation at multiple linebreaks, bind to <F2> ( #2879 )
2023-08-30 09:50:55 +03:00
staviq
8341a25957
main : log file ( #2748 )
* initial, base LOG macro
* add *.log to .gitignore
* added basic log file handler
* reverted log auto endline to better mimic printf
* remove atomics and add dynamic log target
* log_enable/disable, LOG_TEE, basic usage doc
* update .gitignore
* mv include to common, params, help msg
* log tostring helpers, token vectors pretty prints
* main: replaced fprintf/LOG_TEE, some trace logging
* LOG_DISABLE_LOGS compile flag, wrapped f in macros
* fix LOG_TEELN and configchecker
* stub LOG_DUMP_CMDLINE for WIN32 for now
* fix msvc
* cleanup main.cpp:273
* fix stray whitespace after master sync
* log : fix compile warnings
- do not use C++20 stuff
- use PRIu64 to print uint64_t
- avoid string copies by using const ref
- fix ", ##__VA_ARGS__" warnings
- compare strings with == and !=
* log : do not append to existing log + disable file line func by default
* log : try to fix Windows build
* main : wip logs
* main : add trace log
* review: macro f lowercase, str append to sstream
* review: simplify ifs and str comparisons
* fix MSVC, formatting, FMT/VAL placeholders
* review: if/else cleanup
* review: if/else cleanup (2)
* replace _ prefix with _impl suffix
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-08-30 09:29:32 +03:00
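The commit's LOG/LOG_TEE macros aren't shown here; the "tee" idea they name (emit a message to the console and to a log file in one call) can be sketched minimally as follows. This is a hypothetical simplification for illustration, not the actual implementation:

```c
#include <stdio.h>

// Minimal sketch of a LOG_TEE-style macro: print the message to stderr
// and, if a log file is open, append the same message there too.
// Hypothetical simplification of the commit's logging facility.
#define LOG_TEE(logfile, ...)                           \
    do {                                                \
        fprintf(stderr, __VA_ARGS__);                   \
        if (logfile) fprintf(logfile, __VA_ARGS__);     \
    } while (0)
```

Wrapping the body in `do { ... } while (0)` keeps the macro safe to use as a single statement, e.g. inside an unbraced `if`.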
Cebtenzzre
849408957c
tests : add a C compliance test ( #2848 )
* tests : add a C compliance test
* make : build C compliance test by default
* make : fix clean and make sure C test fails on clang
* make : move -Werror=implicit-int to CFLAGS
2023-08-30 09:20:26 +03:00
slaren
06abf8eeba
ggml : add view_src and view_offs to ggml_tensor for views ( #2874 )
* ggml : add view_src and view_offs
* update ggml-alloc to use view_src
* update ggml_diag_mask to work correctly with automatic inplace
* exclude other ops that set an inplace flag from automatic inplace
2023-08-29 23:24:42 +02:00
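As a rough illustration of the mechanism this commit adds: instead of each view operation setting its own inplace flag, every tensor records which tensor's buffer it aliases (`view_src`) and at what byte offset (`view_offs`), and resolving a view walks that chain to the owning tensor. The struct below is a simplified stand-in, not ggml's actual `ggml_tensor`:

```c
#include <stddef.h>
#include <stdint.h>

// Simplified stand-in for a tensor that may be a view of another tensor's
// buffer. Field names mirror the commit; the rest of ggml_tensor is omitted.
typedef struct tensor {
    struct tensor *view_src; // NULL if this tensor owns its data
    size_t view_offs;        // byte offset into view_src's buffer
    uint8_t *data;           // owning tensors only: their buffer
} tensor;

// Resolve where a tensor's bytes actually live: walk view_src links to the
// owning tensor, accumulating the byte offsets along the way.
uint8_t *tensor_data(const tensor *t) {
    size_t offs = 0;
    while (t->view_src) {
        offs += t->view_offs;
        t = t->view_src;
    }
    return t->data + offs;
}
```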
slaren
c03a243abf
remove outdated references to -eps and -gqa from README ( #2881 )
2023-08-29 23:17:34 +02:00
Kawrakow
fa3582f509
Tell users attempting to run perplexity with too few tokens to use more ( #2882 )
Closes #2858
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-08-29 23:55:45 +03:00
Kawrakow
e37e69dcc3
10X faster BPE tokenizer ( #2876 )
* 10X faster BPE tokenizer
* Remove comment that no longer applies
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-08-29 23:55:03 +03:00
maddes8cht
53885d7256
py : fix "usage" messages ( #2873 )
convert-to-gguf python scripts
2023-08-29 16:51:02 +03:00
jameswu2014
bcce96ba4d
convert.py : fix baichuan7B support ( #2870 )
* [Fix]: convert.py support baichuan7B
* convert.py : fix trailing whitespaces
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-08-29 12:48:41 +03:00
Jhen-Jie Hong
74e0caeb82
readme : add react-native binding ( #2869 )
2023-08-29 12:30:10 +03:00
Cebtenzzre
d4b5e16c32
make : fix clang tests build, add missing examples ( #2859 )
* make : do not pass headers to the compiler
This fixes building tests with clang.
* make : add missing examples
* make : fix build-info.h dependencies
2023-08-29 11:42:41 +03:00
Georgi Gerganov
3a007648f2
metal : add option to disable debug logs ( close #2764 )
2023-08-29 11:33:46 +03:00
Georgi Gerganov
611363ac79
scripts : add pipefail
2023-08-29 10:50:30 +03:00
Marcus Dunn
95b6e5212f
added struct to llama_dump_timing_info_yaml's llama_context ( #2857 )
fixes C compat.
2023-08-29 09:33:27 +03:00