klosax
8ad7cd49fb
Update convert-llama-h5-to-gguf.py
2023-07-29 16:47:00 +02:00
M. Yusuf Sarıgöz
0317c41d98
gguf : upd gguf conversion script
2023-07-29 13:31:07 +03:00
M. Yusuf Sarıgöz
cc3dd7f042
gguf : write tokenizer data
2023-07-29 13:30:22 +03:00
M. Yusuf Sarıgöz
8a76dd8a85
gguf : write tensors one by one
2023-07-29 13:17:28 +03:00
M. Yusuf Sarıgöz
c861e234f4
gguf : write tensors one by one
2023-07-29 12:49:01 +03:00
M. Yusuf Sarıgöz
0c219fb5b5
gguf : fix writing gguf arrays
2023-07-29 12:42:54 +03:00
M. Yusuf Sarıgöz
93f7f7aef7
gguf : write tensors one by one and code reuse
2023-07-29 12:34:35 +03:00
M. Yusuf Sarıgöz
aa99562d70
Merge branch 'gguf' of https://github.com//ggerganov/llama.cpp into gguf
2023-07-29 12:26:11 +03:00
M. Yusuf Sarıgöz
ea5f9ad2ca
gguf : fix writing gguf arrays
2023-07-29 12:25:43 +03:00
klosax
999431c4b6
quick and dirty conversion example
2023-07-29 11:20:05 +02:00
M. Yusuf Sarıgöz
d54f53ca51
gguf : add tokenization constants
2023-07-29 12:04:45 +03:00
M. Yusuf Sarıgöz
06f423a8e1
gguf : write sample tensors to read
2023-07-29 10:26:26 +03:00
M. Yusuf Sarıgöz
08dc8fd884
gguf : do not hardcode tensor names to read
2023-07-29 10:24:46 +03:00
M. Yusuf Sarıgöz
9475cdb7a3
Merge branch 'gguf-write-tokenization' into gguf
2023-07-29 00:36:35 +03:00
M. Yusuf Sarıgöz
1495735aac
gguf : fix writing tensors
2023-07-29 00:26:22 +03:00
klosax
3492f848d7
gguf : add gguf_find_key ( #2438 )
* gguf.cpp : find key example
* ggml.h : add gguf_find_key
* ggml.c : add gguf_find_key
2023-07-28 23:45:24 +03:00
M. Yusuf Sarıgöz
11ef380c2a
GGUF : write tensor ( #2426 )
* WIP: Write tensor
* GGUF : Support writing tensors in Python
* refactor : rm unused import and upd todos
* fix : fix errors upd writing example
* rm example.gguf
* gitignore *.gguf
* undo formatting
2023-07-28 11:34:16 +03:00
Georgi Gerganov
d2bb3ac10b
convert.py : remove GGML vocab + other obsolete stuff
2023-07-27 16:36:35 +03:00
Georgi Gerganov
68f53485e4
convert.py : start a new simplified implementation by removing old stuff
2023-07-27 15:56:53 +03:00
Georgi Gerganov
158be8f7f4
gguf.py : some code style changes
2023-07-27 15:37:06 +03:00
Georgi Gerganov
d2b6ca13ad
gguf : add array support
2023-07-27 14:53:07 +03:00
Georgi Gerganov
d89533dff6
gguf : expose the gguf_type enum through the API for now
2023-07-27 11:10:34 +03:00
M. Yusuf Sarıgöz
c85d3178b3
refactor : reduce code duplication and better API ( #2415 )
2023-07-27 10:29:29 +03:00
Georgi Gerganov
d8491fc7e3
gguf : add comments
2023-07-26 23:00:24 +03:00
Georgi Gerganov
5628ec7163
gguf : read / write sample models
2023-07-26 22:40:45 +03:00
Georgi Gerganov
e46870f5af
gguf : gguf.c is now part of ggml.c
2023-07-26 18:55:32 +03:00
Georgi Gerganov
d313c0fa33
gguf : simplify gguf_get_val
2023-07-26 18:53:57 +03:00
Georgi Gerganov
cb871fa022
gguf : do not support passing existing ggml_context to gguf_init
2023-07-26 18:48:52 +03:00
Georgi Gerganov
860c9c63ce
gguf : add gguf_get_tensor_name()
2023-07-26 18:21:14 +03:00
Georgi Gerganov
78b226a959
gguf : initial model loading - not tested
2023-07-26 18:21:14 +03:00
Georgi Gerganov
d91b985d2d
gguf : read tensor info
2023-07-26 18:21:13 +03:00
Georgi Gerganov
8d6acfec12
gguf : read header + meta data
2023-07-26 18:21:13 +03:00
Georgi Gerganov
6873148771
gguf : first API pass
2023-07-26 18:21:13 +03:00
Georgi Gerganov
7e82d25f40
ci : disable CI temporarily to not waste energy
2023-07-26 18:21:13 +03:00
M. Yusuf Sarıgöz
bae6b125f6
wip : implement GGUF ( #2397 )
* Add LLAMA_DEFAULT_RMS_EPS so we can change the default (#2384 )
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
* WIP: python class to write GGUF, incomplete C API for reading
---------
Co-authored-by: Kawrakow <48489457+ikawrakow@users.noreply.github.com>
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-26 18:21:13 +03:00
Georgi Gerganov
4d698495ea
gguf : init
2023-07-26 18:21:12 +03:00
slaren
5488fb789e
ggml : allocate graphs in a context ( #2392 )
* ggml : graph allocation in contexts
* allocate work buffer as a ggml_object in ggml_graph_compute_with_ctx
* llama.cpp : allocate graph in the context
* add GGML_PAD
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
master-5488fb7
2023-07-26 15:56:53 +02:00
Kawrakow
eb542d3932
Add LLAMA_DEFAULT_RMS_EPS so we can change the default ( #2384 )
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
master-eb542d3
2023-07-25 18:35:53 +03:00
slaren
07aaa0f63f
ggml : fix ggml_flash_attn to use op_params ( #2387 )
* ggml : fix ggml_flash_attn to use op_params
master-07aaa0f
2023-07-25 16:20:12 +02:00
ldwang
fce48caf9a
convert.py : support bpe tokenizer ( #2228 )
* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert, fix
Signed-off-by: ldwang <ftgreat@gmail.com>
---------
Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com>
2023-07-25 16:22:09 +03:00
Jiahao Li
875086bdb9
ggml : relax contiguous constraints in activation function ( #2371 )
master-875086b
2023-07-25 15:58:32 +03:00
slaren
da1889834a
ggml : improve graph build time via hash table lookup ( #2329 )
* improve graph build time
* ggml_tensor : use 1 bit per flag
* use a hash table instead
master-da18898
2023-07-25 15:32:20 +03:00
Hesen Peng
82552b7f54
build : fix line breaking error in build-info.sh ( #2349 )
* fix line breaking
* build number line break removal
2023-07-25 15:24:09 +03:00
Xiao-Yong Jin
0c06204fb3
main : add `--in-prefix-bos` to prefix BOS to user inputs; keep EOS ( #2304 )
* add `--in-prefix-bos` to prefix BOS to user inputs; keep EOS
The BOS precedes the string specified by `--in-prefix`.
Model generated EOS is now kept in the context.
It provides a way to strictly follow the prompt format used in
Llama-2-chat.
The EOS handling also benefits some existing finetunes that use
EOS to mark the end of turn.
* examples/common: move input_prefix_bos to other bools
master-0c06204
2023-07-25 15:19:11 +03:00
Eve
1fed755b1f
ci : add non-AVX scalar build/test ( #2356 )
* noavx build and test
* we don't need to remove f16c in windows
master-1fed755
2023-07-25 15:16:13 +03:00
katsu560
be2301bcda
k_quants : add AVX support to dot functions with QK_K as 64 ( #2339 )
* add AVX to ggml_vec_dot_q2_K_q8_K()
* add AVX to ggml_vec_dot_q3_K_q8_K()
* add AVX to ggml_vec_dot_q4_K_q8_K()
* add AVX to ggml_vec_dot_q5_K_q8_K()
* add AVX to ggml_vec_dot_q6_K_q8_K()
* refactor AVX code in ggml_vec_dot_q6_K_q8_K()
master-be2301b
2023-07-25 15:13:41 +03:00
Shouzheng Liu
1aa18ef994
metal : concurrently dispatch commands ( #2358 )
* metal: concurrently dispatch commands
Function `ggml_metal_graph_find_concurrency` will run and write
commands that can be issued concurrently to metal context `concur_list`
array, when `ggml_metal_graph_compute` is called for the first time.
* metal: don't call find_concurrency automatically.
* metal : code style changes
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
master-1aa18ef
2023-07-25 15:00:19 +03:00
Kawrakow
9a08eaf3c4
Another speed gain for Q4_0 and Q4_1 on Metal ( #2375 )
* Another speed gain for Q4_0 and Q4_1 on Metal
* Have N_DST, etc., be template parameters
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-25 13:48:29 +03:00
Kawrakow
129d844c87
Fix Q4_K and Q5_K for QK_K = 64 on CUDA ( #2359 )
* Fix Q4_K and Q5_K for QK_K = 64
* Very slightly better Q5_K bit fiddling
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
master-129d844
2023-07-25 13:48:04 +03:00
slaren
d5512b782b
server: add rms_norm_eps parameter ( #2380 )
master-d5512b7
2023-07-25 12:36:17 +03:00