ochafik
e63520f37a
Forward decl minja::chat_template to avoid eager json dep
2025-01-18 10:37:56 +00:00
ochafik
d5fa351a24
Revert LLAMA_CHATML_TEMPLATE refactor
2025-01-18 01:04:12 +00:00
ochafik
81c0d437a5
Attempt to fix linkage of LLAMA_CHATML_TEMPLATE
2025-01-18 00:56:19 +00:00
ochafik
40db78963b
Merge remote-tracking branch 'origin/master' into jinja
2025-01-18 00:44:37 +00:00
ochafik
b75d0622e4
Refactor common_chat_* functions to accept minja template + use_jinja option
2025-01-18 00:43:38 +00:00
Radoslav Gerganov
667d72846c
rpc : early register backend devices (#11262)
Early register RPC devices and do not propagate RPC specifics in the
llama model structures.
ref: #10609
2025-01-17 10:57:09 +02:00
Olivier Chafik
1b3bb7eeb9
Update arg.cpp
2025-01-14 00:07:18 +00:00
ochafik
a6afb2735f
Update common_chat_format_example to use minja template wrapper
2025-01-13 22:57:35 +00:00
ochafik
c04c50e40c
Merge remote-tracking branch 'origin/master' into jinja
2025-01-13 22:26:13 +00:00
ochafik
8dd4f334a4
Add --jinja to llama-run
2025-01-13 22:07:49 +00:00
ochafik
18f257bf1a
Fix deprecation
2025-01-13 21:30:48 +00:00
ochafik
78861a3eb2
Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template
2025-01-13 19:58:15 +00:00
ochafik
cb72cf1fc3
Merge remote-tracking branch 'origin/master' into jinja
2025-01-13 19:56:27 +00:00
Xuan Son Nguyen
84a44815f7
cli : auto activate conversation mode if chat template is available (#11214)
* cli : auto activate conversation mode if chat template is detected
* add warn on bad template
* update readme (writing with the help of chatgpt)
* update readme (2)
* do not activate -cnv for non-instruct models
2025-01-13 20:18:12 +01:00
Xuan Son Nguyen
00b4c3da62
common : support tag-based --hf-repo like on ollama (#11195)
* common : support tag-based hf_repo like on ollama
* fix build
* various fixes
* small fixes
* fix style
* fix windows build?
* move common_get_hf_file to common.cpp
* fix complain with noreturn
2025-01-13 13:56:23 +01:00
Xuan Son Nguyen
9a483999a6
llama : fix chat template gguf key (#11201)
2025-01-12 13:45:14 +01:00
Georgi Gerganov
afa8a9ec9b
llama : add llama_vocab, functions -> methods, naming (#11110)
* llama : functions -> methods (#11110)
* llama : add struct llama_vocab to the API (#11156)
ggml-ci
* hparams : move vocab params to llama_vocab (#11159)
ggml-ci
* vocab : more pimpl (#11165)
ggml-ci
* vocab : minor tokenization optimizations (#11160)
ggml-ci
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* lora : update API names (#11167)
ggml-ci
* llama : update API names to use correct prefix (#11174)
* llama : update API names to use correct prefix
ggml-ci
* cont
ggml-ci
* cont
ggml-ci
* minor [no ci]
* vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174)
ggml-ci
* vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174)
ggml-ci
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-01-12 11:32:42 +02:00
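A minimal sketch of the renamed vocab accessors listed in the entry above, assuming the post-refactor llama.h declarations (llama_model_get_vocab, llama_vocab_n_tokens, llama_vocab_get_add_bos); this is illustrative, not the verbatim code from the PR:

```cpp
#include "llama.h"
#include <cstdio>

// Vocab properties are now queried through a llama_vocab handle instead of
// the model: llama_vocab_n_vocab -> llama_vocab_n_tokens, and the add-BOS
// flag is read via llama_vocab_get_add_bos.
void print_vocab_info(const struct llama_model * model) {
    const struct llama_vocab * vocab = llama_model_get_vocab(model);
    printf("n_tokens: %d\n", llama_vocab_n_tokens(vocab));
    printf("add_bos : %d\n", llama_vocab_get_add_bos(vocab));
}
```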
Georgi Gerganov
a3c1232c3f
arg : option to exclude arguments from specific examples (#11136)
* arg : option to exclude arguments from specific examples
ggml-ci
* readme : remove old args [no ci]
2025-01-08 12:55:36 +02:00
Johannes Gäßler
53ff6b9b9f
GGUF: C++ refactor, backend support, misc fixes (#11030)
* GGUF: C++ refactor, backend support, misc fixes
remove ggml_tensor.backend
update CODEOWNERS [no ci]
remove gguf_get_data from API
revise GGUF API data types
2025-01-07 18:01:58 +01:00
Georgi Gerganov
47182dd03f
llama : update llama_model API names (#11063)
* llama : deprecate llama_free_model, add llama_model_free
ggml-ci
* llama : change `llama_load_model_from_file` -> `llama_model_load_from_file`
ggml-ci
2025-01-06 10:55:18 +02:00
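A before/after sketch of the renames listed in this entry; the model path is a placeholder and error handling is minimal:

```cpp
#include "llama.h"

int main() {
    llama_backend_init();
    llama_model_params params = llama_model_default_params();
    // old name: llama_load_model_from_file (now deprecated)
    llama_model * model = llama_model_load_from_file("model.gguf", params);
    if (model != nullptr) {
        // old name: llama_free_model (now deprecated)
        llama_model_free(model);
    }
    llama_backend_free();
    return 0;
}
```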
Georgi Gerganov
727368c60f
llama : use LLAMA_TOKEN_NULL (#11062)
ggml-ci
2025-01-06 10:52:15 +02:00
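This change replaces bare -1 sentinels with the named constant; a small illustrative sketch (the helper below is hypothetical, only LLAMA_TOKEN_NULL comes from llama.h):

```cpp
#include "llama.h"

// hypothetical helper: return the first occurrence of `target`,
// or LLAMA_TOKEN_NULL (instead of a bare -1) when it is absent
llama_token find_token(const llama_token * toks, int n, llama_token target) {
    for (int i = 0; i < n; i++) {
        if (toks[i] == target) {
            return toks[i];
        }
    }
    return LLAMA_TOKEN_NULL;
}
```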
Molly Sophia
4b0c638b9a
common : disable KV cache shifting automatically for unsupported models (#11053)
* Disable KV cache shifting automatically for unsupported models
instead of exiting directly
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-03 14:13:18 +02:00
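A sketch of the behavior change this entry describes, warning and disabling instead of exiting; it assumes a can-shift query along the lines of llama_kv_cache_can_shift and a ctx_shift flag, so treat the exact names as assumptions:

```cpp
#include "llama.h"
#include <cstdio>

// instead of aborting on unsupported models, fall back gracefully
void maybe_disable_ctx_shift(llama_context * ctx, bool & ctx_shift) {
    if (ctx_shift && !llama_kv_cache_can_shift(ctx)) {
        fprintf(stderr, "warning: KV cache shifting is not supported by this model, disabling it\n");
        ctx_shift = false;
    }
}
```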
Georgi Gerganov
f66f582927
llama : refactor src/llama.cpp (#10902)
* llama : scatter llama.cpp into multiple modules (wip)
* llama : control-vector -> adapter
* llama : arch
* llama : mmap
ggml-ci
* ci : remove BUILD_SHARED_LIBS=OFF
ggml-ci
* llama : arch (cont)
ggml-ci
* llama : chat
ggml-ci
* llama : model
ggml-ci
* llama : hparams
ggml-ci
* llama : adapter
ggml-ci
* examples : fix
ggml-ci
* rebase
ggml-ci
* minor
* llama : kv cache
ggml-ci
* llama : impl
ggml-ci
* llama : batch
ggml-ci
* cont
ggml-ci
* llama : context
ggml-ci
* minor
* llama : context (cont)
ggml-ci
* llama : model loader
ggml-ci
* common : update lora
ggml-ci
* llama : quant
ggml-ci
* llama : quant (cont)
ggml-ci
* minor [no ci]
2025-01-03 10:18:53 +02:00
Xuan Son Nguyen
45095a61bf
server : clean up built-in template detection (#11026)
* server : clean up built-in template detection
* fix compilation
* add chat template test
* fix condition
2024-12-31 15:22:01 +01:00
Peter
6e1531aca5
common, examples, ggml : fix MSYS2 GCC compiler errors and warnings when building with LLAMA_CURL=ON and GGML_OPENCL=ON (#11013)
In common/common.cpp:
* Convert the stat() call used to check file existence to the standard library function std::filesystem::exists (error: unable to match the correct function signature)
* Add conditions to check whether PATH_MAX is already defined in the WIN32 environment (warning: it is already defined in MSYS2)
In examples/run/run.cpp:
* Add io.h header inclusion (error cannot find function _get_osfhandle)
* Change initialisers for OVERLAPPED to empty struct (warning about uninitialised members)
* Add initialiser for hFile (warning it may be uninitialised)
* Add cast for curl_off_t percentage value to long int in generate_progress_prefix function (warning that curl_off_t is long long int)
In ggml/src/ggml-opencl/ggml-opencl.cpp:
* Initialise certain declared cl_mem variables to nullptr for greater safety (warning about B_d variable possibly used unassigned)
2024-12-31 01:46:06 +01:00
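A sketch of the first fix in the list above, swapping a stat()-based existence check for std::filesystem::exists; names are illustrative, not the exact diff:

```cpp
#include <filesystem>
#include <string>

bool file_exists(const std::string & path) {
    // replaces: struct stat st; return stat(path.c_str(), &st) == 0;
    std::error_code ec; // non-throwing overload, safer on odd paths
    return std::filesystem::exists(path, ec);
}
```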
ochafik
389d79b6b4
Try and work around msvc++ non-macro max resolution quirk
2024-12-30 04:50:20 +00:00
ochafik
ce48584f7d
No designated initializers yet
2024-12-30 04:19:33 +00:00
ochafik
80138d9007
Add missing <optional> include
2024-12-30 04:10:20 +00:00
ochafik
e5113e8d74
Add --jinja and --chat-template-file flags
2024-12-30 03:50:51 +00:00
ochafik
abd274a48f
Copy minja from 58f0ca6dd7
2024-12-30 03:50:51 +00:00
Molly Sophia
0a11f8b7b5
convert : fix RWKV v6 model conversion (#10913)
* Enable --no-context-shift for llama-perplexity example
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV 6: Fix error in ggml_cuda_op_bin_bcast
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-12-20 11:44:58 +02:00
Georgi Gerganov
36319dec5d
tts : small QoL for easy model fetch (#10903)
2024-12-19 17:35:15 +02:00
Georgi Gerganov
0bf2d10c55
tts : add OuteTTS support (#10784)
* server : add "tokens" output
ggml-ci
* server : output embeddings for all tokens when pooling = none
ggml-ci
* server : be explicit about the pooling type in the tests
ggml-ci
* server : do not normalize embeddings when there is no pooling
ggml-ci
* llama : add OuteTTS support (wip)
* wip
* extract features
* first conv
* group norm
* resnet conv
* resnet
* attn
* pos net
* layer norm
* convnext
* head
* hann window
* fix n_embd + remove llama.cpp hacks
* compute hann window
* fft
* spectrum processing
* clean-up
* tts : receive input text and generate codes
* clip : fix new conv name
* tts : minor fix
* tts : add header + minor fixes
ggml-ci
* tts : add mathematical constant
ggml-ci
* tts : fix sampling + cut initial noise
* tts : fixes
* tts : update default samplers
ggml-ci
* tts : text pre-processing
* tts : outetts-voc -> wavtokenizer-dec
* tts : remove hardcoded constants
ggml-ci
* tts : fix tensor shapes
* llama : refactor wavtokenizer tensors
ggml-ci
* cont
ggml-ci
* cont [no ci]
* llama : update WavTokenizer to non-causal attn
* llama : handle no-vocab detokenization
* tts : add Python example for OuteTTS (wip)
* tts : extend python example to generate spectrogram
ggml-ci
* server : fix rebase artifacts
* tts : enable "return_tokens" in Python example
ggml-ci
* tts : minor fixes
* common : support HF download for vocoder
2024-12-18 19:27:21 +02:00
Georgi Gerganov
152610eda9
server : output embeddings for all tokens when pooling = none (#10861)
* server : add "tokens" output
ggml-ci
* server : output embeddings for all tokens when pooling = none
ggml-ci
* server : update readme [no ci]
* server : fix spacing [no ci]
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* server : be explicit about the pooling type in the tests
ggml-ci
* server : update /embeddings and /v1/embeddings endpoints
ggml-ci
* server : do not normalize embeddings when there is no pooling
ggml-ci
* server : update readme
ggml-ci
* server : fixes
* tests : update server tests
ggml-ci
* server : update readme [no ci]
* server : remove rebase artifact
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-12-18 13:01:41 +02:00
Georgi Gerganov
644fd71b44
sampling : refactor + optimize penalties sampler (#10803)
* sampling : refactor + optimize penalties sampler
ggml-ci
* common : apply ignore_eos as logit bias
ggml-ci
* batched : remove penalties sampler
* params : allow penalty_last_n == -1 to be equal to context size
ggml-ci
* common : by default, move the penalties at the end of the sampling chain
ggml-ci
* common : ignore all EOG tokens
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* common : move back the penalties at the front of the sampling chain
ggml-ci
* readme : restore hint about --ignore-eos flag [no ci]
* llama : minor
ggml-ci
* webui : update
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2024-12-16 12:31:14 +02:00
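A sketch of building a sampler chain with the refactored penalties sampler; the four-parameter signature is an assumption based on the post-refactor API this entry describes:

```cpp
#include "llama.h"

llama_sampler * make_chain() {
    llama_sampler * chain = llama_sampler_chain_init(llama_sampler_chain_default_params());
    // per the commit body, penalty_last_n == -1 now means "use the whole context"
    llama_sampler_chain_add(chain, llama_sampler_init_penalties(
            /*penalty_last_n =*/ -1,
            /*penalty_repeat =*/ 1.10f,
            /*penalty_freq   =*/ 0.00f,
            /*penalty_present=*/ 0.00f));
    llama_sampler_chain_add(chain, llama_sampler_init_greedy());
    return chain;
}
```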
Eric Curtin
c27ac678dd
Opt class for positional argument handling (#10508)
Added support for positional arguments `model` and `prompt`. Added
functionality to download via strings like:
llama-run llama3
llama-run ollama://granite-code
llama-run ollama://granite-code:8b
llama-run hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
llama-run huggingface://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
llama-run https://example.com/some-file1.gguf
llama-run some-file2.gguf
llama-run file://some-file3.gguf
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2024-12-13 19:34:25 +01:00
Xuan Son Nguyen
adffa6ffd5
common : improve -ctv -ctk CLI arguments (#10806)
* common : improve ctv ctk cli argument
* regenerate docs
* even better approach
* use std::vector
2024-12-12 22:53:05 +01:00
Xuan Son Nguyen
9fdb124304
common : add missing env var for speculative (#10801)
2024-12-12 16:57:32 +01:00
Bartowski
ae4b922614
imatrix : Add imatrix to --no-context-shift (#10766)
This allows setting --no-context-shift in llama-imatrix, which is required for models like DeepSeek.
2024-12-10 18:23:50 +01:00
Yüg
a86ad841f1
server : add flag to disable the web-ui (#10762) (#10751)
Co-authored-by: eugenio.segala <esegala@deloitte.co.uk>
2024-12-10 18:22:34 +01:00
Georgi Gerganov
c2a16c0bdb
server : fix free of spec context and batch (#10651)
ggml-ci
2024-12-07 11:52:44 +02:00
Xuan Son Nguyen
f162d45a21
common : bring back --no-warmup to server (#10686)
2024-12-06 13:29:05 +01:00
Xuan Son Nguyen
6c5bc0625f
server : (refactoring) do not rely on JSON internally (#10643)
* server : (refactoring) reduce usage of json internally
* move all response types to struct
* wip [no ci]
* many fixes
* add virtual function
* fix index
* minor style fix
* add std::move
* refactor handle_completions_generic
* add virtual functions
* remove server.hpp
* clarify server_sent_event RFC specs
* apply review comments
* fix model_alias and completion_probabilities
* small clean up
* remove virtual for to_json_oai_compat()
* naming oai_compat --> oaicompat
* fix unwanted recursive call
* update docs
2024-12-06 11:14:32 +01:00
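A hypothetical sketch of the pattern this refactor describes: response types become structs that serialize to JSON only at the boundary (names here are illustrative, not the PR's; nlohmann/json is the library the server uses):

```cpp
#include <string>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

// base type with a virtual serializer, per "move all response types to struct"
struct server_response {
    virtual ~server_response() = default;
    virtual json to_json() const = 0;
};

struct error_response : server_response {
    int         code;
    std::string message;
    json to_json() const override {
        return json{{"error", {{"code", code}, {"message", message}}}};
    }
};
```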
Xuan Son Nguyen
642330ac7c
llama : add enum for built-in chat templates (#10623)
* llama : add enum for supported chat templates
* use "built-in" instead of "supported"
* arg: print list of built-in templates
* fix test
* update server README
2024-12-02 22:10:19 +01:00
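One bullet above adds an arg that prints the list of built-in templates; a sketch of what enumerating them might look like, assuming an API along the lines of llama_chat_builtin_templates (treat the name and signature as assumptions):

```cpp
#include "llama.h"
#include <cstdio>

void list_builtin_templates() {
    const char * names[64];
    // assumed API: fills `names` and returns how many entries were written
    int32_t n = llama_chat_builtin_templates(names, 64);
    for (int32_t i = 0; i < n; i++) {
        printf("%s\n", names[i]);
    }
}
```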
haopeng
64ed2091b2
server: Add "tokens per second" information in the backend (#10548)
* add cmake rvv support
* add timings
* remove space
* update readme
* fix
* fix code
* remove empty line
* add test
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2024-12-02 14:45:54 +01:00
Diego Devesa
7cc2d2c889
ggml : move AMX to the CPU backend (#10570)
* ggml : move AMX to the CPU backend
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-11-29 21:54:58 +01:00
Johannes Gäßler
890719311b
common: fix warning message when no GPU found (#10564)
2024-11-28 18:15:25 +01:00
Xuan Son Nguyen
9f912511bc
common : fix duplicated file name with hf_repo and hf_file (#10550)
2024-11-27 22:30:52 +01:00
Georgi Gerganov
ab96610b1e
cmake : enable warnings in llama (#10474)
* cmake : enable warnings in llama
ggml-ci
* cmake : add llama_get_flags and respect LLAMA_FATAL_WARNINGS
* cmake : get_flags -> ggml_get_flags
* speculative-simple : fix warnings
* cmake : reuse ggml_get_flags
ggml-ci
* speculative-simple : fix compile warning
ggml-ci
2024-11-26 14:18:08 +02:00
Georgi Gerganov
9fd8c2687f
server : add more information about error (#10455)
2024-11-25 22:28:59 +02:00