Georgi Gerganov
587384e8f2
context : add get_ctx_padding()
...
ggml-ci
2025-01-20 09:30:24 +02:00
Georgi Gerganov
12719550a6
cont : move kv_self update to llama_context
...
ggml-ci
2025-01-20 09:30:24 +02:00
Georgi Gerganov
556155a525
llama : remove references to llama_kv_cache (wip)
...
Intermediate step necessary to abstract the `llama_context` and
`llama_kv_cache`.
ggml-ci
2025-01-20 09:30:24 +02:00
Georgi Gerganov
0c9f5aff83
llama : fix names [no ci]
2025-01-20 09:30:23 +02:00
Georgi Gerganov
40b521b656
context : minor
...
ggml-ci
2025-01-20 09:30:23 +02:00
Georgi Gerganov
e7f2dc8bc4
llama : update llama_kv_self API
...
ggml-ci
2025-01-20 09:30:23 +02:00
Georgi Gerganov
4ef4c96100
kv_cache : move state read/write to llama_kv_cache
...
ggml-ci
2025-01-20 09:30:23 +02:00
Georgi Gerganov
057e8f5c82
context : prepare kv_cache_read/write to be moved to kv_cache
...
ggml-ci
2025-01-20 09:30:23 +02:00
Georgi Gerganov
d8b9013108
kv_cache : minor
2025-01-20 09:30:22 +02:00
Georgi Gerganov
422ecaf52c
kv_cache : fix
...
ggml-ci
2025-01-20 09:30:22 +02:00
Georgi Gerganov
53a7c20d89
kv_cache : functions -> members
...
ggml-ci
2025-01-20 09:30:22 +02:00
Georgi Gerganov
d58c3cf137
llama : add struct llama_kv_cache (wip) [no ci]
2025-01-20 09:30:22 +02:00
Georgi Gerganov
ef6dada60c
cont : fix whitespaces ( #11305 )
2025-01-20 09:29:32 +02:00
Kyle Bruene
ae3c1db2f9
llama : re-add LLM_ARCH_PHIMOE ( #11305 )
...
Phi 3.5 MoE was partially removed during a refactor. The code was originally in llama.cpp and should be in llama-model.cpp after the refactor.
2025-01-20 09:21:01 +02:00
Georgi Gerganov
4dd34ff831
cmake : add sanitizer flags for llama.cpp ( #11279 )
...
* cmake : add sanitizer flags for llama.cpp
ggml-ci
* tests : fix compile warnings
ggml-ci
* cmake : move sanitizer flags to llama_add_compile_flags
ggml-ci
* cmake : move llama.cpp compile flags to top level lists
ggml-ci
* cmake : apply only sanitizer flags at top level
ggml-ci
* tests : fix gguf context use in same_tensor_data
* gguf-test: tensor data comparison
* dummy : trigger ggml-ci
* unicode : silence gcc warnings
ggml-ci
* ci : use sanitizer builds only in Debug mode
ggml-ci
* cmake : add status messages [no ci]
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-01-18 16:18:15 +02:00
Radoslav Gerganov
667d72846c
rpc : early register backend devices ( #11262 )
...
Early register RPC devices and do not propagate RPC specifics in the
llama model structures.
ref: #10609
2025-01-17 10:57:09 +02:00
Georgi Gerganov
a133566d34
vocab : fix double-eos check ( #11273 )
...
ggml-ci
2025-01-17 09:28:00 +02:00
Xuan Son Nguyen
681149ced2
llama : add llama_model_load_from_splits
( #11255 )
...
* llama : add `llama_model_load_from_splits`
* update
2025-01-16 13:54:08 +01:00
Johannes Gäßler
432df2d5f9
RoPE: fix back, CUDA support for back + noncont. ( #11240 )
...
* RoPE: fix back, CUDA support for back + noncont.
* fix comments reg. non-cont. RoPE support [no-ci]
2025-01-15 12:51:37 +01:00
Georgi Gerganov
bbf3e55e35
vocab : add dummy tokens for "no_vocab" type ( #11231 )
...
* vocab : add dummy tokens for "no_vocab" type
ggml-ci
* vocab : minor [no ci]
2025-01-14 11:54:58 +01:00
Daniel Bevenius
8f70fc3d1b
llama : remove 'd' from bad special token log ( #11212 )
...
This commit removes the 'd' from the log message in llama-vocab.cpp
when logging a bad special token.
The motivation for this is that currently the output can look something
like the following:
```console
load: bad special token:
'tokenizer.ggml.image_token_id' = 128256d, using default id -1
```
2025-01-13 13:38:20 +01:00
Xuan Son Nguyen
9a483999a6
llama : fix chat template gguf key ( #11201 )
2025-01-12 13:45:14 +01:00
Georgi Gerganov
08f10f69c3
llama : remove notion of CLS token ( #11064 )
...
ggml-ci
2025-01-12 12:15:53 +02:00
Georgi Gerganov
afa8a9ec9b
llama : add llama_vocab
, functions -> methods, naming ( #11110 )
...
* llama : functions -> methods (#11110 )
* llama : add struct llama_vocab to the API (#11156 )
ggml-ci
* hparams : move vocab params to llama_vocab (#11159 )
ggml-ci
* vocab : more pimpl (#11165 )
ggml-ci
* vocab : minor tokenization optimizations (#11160 )
ggml-ci
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* lora : update API names (#11167 )
ggml-ci
* llama : update API names to use correct prefix (#11174 )
* llama : update API names to use correct prefix
ggml-ci
* cont
ggml-ci
* cont
ggml-ci
* minor [no ci]
* vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174 )
ggml-ci
* vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174 )
ggml-ci
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-01-12 11:32:42 +02:00
Molly Sophia
ee7136c6d1
llama: add support for QRWKV6 model architecture ( #11001 )
...
llama: add support for QRWKV6 model architecture (#11001 )
* WIP: Add support for RWKV6Qwen2
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV: Some graph simplification
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Add support for RWKV6Qwen2 with cpu and cuda GLA
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Fix some typos
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* code format changes
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Fix wkv test & add gla test
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Fix cuda warning
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Update README.md
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Update ggml/src/ggml-cuda/gla.cu
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Fix fused lerp weights loading with RWKV6
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* better sanity check skipping for QRWKV6 in llama-quant
thanks @compilade
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: compilade <git@compilade.net>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2025-01-10 09:58:08 +08:00
Pierrick Hymbert
f8feb4b01a
model: Add support for PhiMoE arch ( #11003 )
...
* model: support phimoe
* python linter
* doc: minor
Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
* doc: minor
Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
* doc: add phimoe as supported model
ggml-ci
---------
Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
2025-01-09 11:21:41 +01:00
Xuan Son Nguyen
d9feae1c06
llama-chat : add phi 4 template ( #11148 )
2025-01-09 10:07:33 +01:00
Xuan Son Nguyen
4d2b3d8804
lora : improve compat with mergekit-extract-lora
( #11131 )
...
* (wip) support mergekit-extracted lora
* support mergekit-extract-lora
* use lora->get_scale
* correct comment
* correct norm name & condition
* add some hints
2025-01-08 15:59:53 +01:00
Georgi Gerganov
c07d437bbd
llama : avoid hardcoded QK_K ( #11061 )
...
ggml-ci
2025-01-08 16:19:36 +02:00
Johannes Gäßler
53ff6b9b9f
GGUF: C++ refactor, backend support, misc fixes ( #11030 )
...
* GGUF: C++ refactor, backend support, misc fixes
remove ggml_tensor.backend
update CODEOWNERS [no ci]
remove gguf_get_data from API
revise GGUF API data types
2025-01-07 18:01:58 +01:00
Georgi Gerganov
ecebbd292d
llama : remove unused headers ( #11109 )
...
ggml-ci
2025-01-06 17:52:35 +02:00
Xuan Son Nguyen
09186fabbe
llama : remove check flash_attn with lora ( #11104 )
2025-01-06 13:41:12 +01:00
Asghar Ghorbani
96a1dc27c3
llama : prevent system info string accumulation across calls ( #11101 )
2025-01-06 13:21:46 +02:00
Daniel Bevenius
6369f867a4
llama : rename missed batch params/vars to ubatch ( #10059 )
...
This commit renames the `batch` parameter to `ubatch` in the
`llama_kv_cache_find_slot`, `llm_build_inp_embd`, and
`llm_build_mamba` functions.
The motivation for this is that this should have been done as part of
Commit 19d900a756
("llama : rename batch
to ubatch (#9950 )") but for some reason I missed these functions in
that commit and only noticed them now (sorry).
2025-01-06 11:28:17 +02:00
Georgi Gerganov
47182dd03f
llama : update llama_model API names ( #11063 )
...
* llama : deprecate llama_free_model, add llama_model_free
ggml-ci
* llama : change `llama_load_model_from_file` -> `llama_model_load_from_file`
ggml-ci
2025-01-06 10:55:18 +02:00
Georgi Gerganov
ae2f606bb5
mmap : fix fileno macro clash ( #11076 )
...
* mmap : fix fileno macro clash
ggml-ci
* cont
ggml-ci
2025-01-06 10:52:38 +02:00
Georgi Gerganov
727368c60f
llama : use LLAMA_TOKEN_NULL ( #11062 )
...
ggml-ci
2025-01-06 10:52:15 +02:00
Georgi Gerganov
5047dd3546
llama : use _impl suffix instead of _internal ( #11060 )
...
ggml-ci
2025-01-06 10:52:01 +02:00
fairydreaming
9394bbd484
llama : Add support for DeepSeek V3 ( #11049 )
...
* convert : extend DEEPSEEK2 model architecture to support DeepseekV3ForCausalLM by adding EXPERT_WEIGHTS_NORM and EXPERT_GATING_FUNC model parameters and FFN_EXP_PROBS_B tensor type
* vocab : add DeepSeek V3 pre-tokenizer regexes
* unicode : handle ACCENT_MARK and SYMBOL categories in regex
* llama : add DeepSeek V3 chat template, handle new model parameters and tensor types
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2025-01-04 21:06:11 +01:00
DAN™
46be942214
llama : add support for the cohere2 model architecture ( #10900 )
2025-01-04 16:33:31 +02:00
Georgi Gerganov
f66f582927
llama : refactor src/llama.cpp
( #10902 )
...
* llama : scatter llama.cpp into multiple modules (wip)
* llama : control-vector -> adapter
* llama : arch
* llama : mmap
ggml-ci
* ci : remove BUILD_SHARED_LIBS=OFF
ggml-ci
* llama : arch (cont)
ggml-ci
* llama : chat
ggml-ci
* llama : model
ggml-ci
* llama : hparams
ggml-ci
* llama : adapter
ggml-ci
* examples : fix
ggml-ci
* rebase
ggml-ci
* minor
* llama : kv cache
ggml-ci
* llama : impl
ggml-ci
* llama : batch
ggml-ci
* cont
ggml-ci
* llama : context
ggml-ci
* minor
* llama : context (cont)
ggml-ci
* llama : model loader
ggml-ci
* common : update lora
ggml-ci
* llama : quant
ggml-ci
* llama : quant (cont)
ggml-ci
* minor [no ci]
2025-01-03 10:18:53 +02:00
Georgi Gerganov
30caac3a68
llama : the WPM vocabs use the CLS token as BOS ( #10930 )
...
* llama : the WPM vocabs use the CLS token as BOS
ggml-ci
* llama : add comment
2024-12-24 09:44:20 +02:00
Yun Dou
b92a14a841
llama : support InfiniAI Megrez 3b ( #10893 )
...
* Support InfiniAI Megrez 3b
* Fix tokenizer_clean_spaces for megrez
2024-12-23 01:35:44 +01:00
ymcki
6f0c9e034b
llama : support for Llama-3_1-Nemotron-51B ( #10669 )
...
* conflict resolution
* move comments after bracket to its own line
2024-12-23 01:22:33 +01:00
Billel Mokeddem
7ae33a616f
llama : add Falcon3 support ( #10883 )
...
* Add Falcon3 model support
* Add fix for adding bos to added special tokens
* Add comment explaining the logic behind the if statement
* Add a log message to better track the when the following line of code is triggered
* Update log to only print when input and output characters are different
* Fix handling pre-normalized tokens
* Refactoring
2024-12-23 00:09:58 +02:00
Georgi Gerganov
5cab3e4aaa
llama : minor grammar refactor ( #10897 )
...
ggml-ci
2024-12-19 17:42:13 +02:00
Sukriti Sharma
2fffc52b50
llama : fix Roberta embeddings ( #10856 )
...
* fix: Use gpt2 tokenizer for roberta and add eos/bos tokens
Branch: RobertaTokenizer
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fixes to position embeddings
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* map roberta-bpe to gpt-2
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix linting
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
---------
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: Gabe Goodhart <ghart@us.ibm.com>
2024-12-19 15:04:51 +02:00
fairydreaming
7585edbdeb
convert : Add support for Microsoft Phi-4 model ( #10817 )
...
* convert : use GPT2 vocab for Phi-4 model
* convert : use null value of sliding_window to distinguish Phi-4 from other PHI3-based models
* llama : do not use sliding window attention mask for Phi-4 model
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2024-12-19 10:37:12 +01:00
Georgi Gerganov
0bf2d10c55
tts : add OuteTTS support ( #10784 )
...
* server : add "tokens" output
ggml-ci
* server : output embeddings for all tokens when pooling = none
ggml-ci
* server : be explicit about the pooling type in the tests
ggml-ci
* server : do not normalize embeddings when there is no pooling
ggml-ci
* llama : add OuteTTS support (wip)
* wip
* extract features
* first conv
* group norm
* resnet conv
* resnet
* attn
* pos net
* layer norm
* convnext
* head
* hann window
* fix n_embd + remove llama.cpp hacks
* compute hann window
* fft
* spectrum processing
* clean-up
* tts : receive input text and generate codes
* clip : fix new conv name
* tts : minor fix
* tts : add header + minor fixes
ggml-ci
* tts : add matchematical constant
ggml-ci
* tts : fix sampling + cut initial noise
* tts : fixes
* tts : update default samplers
ggml-ci
* tts : text pre-processing
* tts : outetts-voc -> wavtokenizer-dec
* tts : remove hardcoded constants
ggml-ci
* tts : fix tensor shapes
* llama : refactor wavtokenizer tensors
ggml-ci
* cont
ggml-ci
* cont [no ci]
* llama : update WavTokenizer to non-causal attn
* llama : handle no-vocab detokenization
* tts : add Python example for OuteTTS (wip)
* tts : extend python example to generate spectrogram
ggml-ci
* server : fix rebase artifacts
* tts : enable "return_tokens" in Python example
ggml-ci
* tts : minor fixes
* common : support HF download for vocoder
2024-12-18 19:27:21 +02:00
Diego Devesa
4da69d1abd
Revert "llama : add Falcon3 support ( #10864 )" ( #10876 )
...
This reverts commit 382bc7f2e8
.
2024-12-18 01:36:46 +01:00