* server : replace behave with pytest
* fix test on windows
* misc
* add more tests
* more tests
* styling
* log less, fix embd test
* added all sequential tests
* fix coding style
* fix save slot test
* add parallel completion test
* fix parallel test
* remove feature files
* update test docs
* no cache_prompt for some tests
* add test_cache_vs_nocache_prompt
* llama : accept a list of devices to use to offload a model
* accept `--dev none` to completely disable offloading
* fix dev list with dl backends
* rename env parameter to LLAMA_ARG_DEVICE for consistency
* Add download chat feature to server chat
Add a download feature next to the delete chat feature in the server vue chat interface.
* code style
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
* Samplers sequence: simplified and input field.
* Removed unused function
* Modify and use `settings-modal-short-input`
* rename "name" --> "label"
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
* server : (web ui) add copy btn for code blocks
* fix problem with api key
* use settings-modal-short-input component
* always show copy btn for code snippet
* Add back samplers to server
* Added tooltips with basic information
* Fixed stretching of input fields.
* use component for settings input, move help msg to tooltips
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
* server : simple chat UI with vuejs and daisyui
* move old files to legacy folder
* embed deps into binary
* basic markdown support
* add conversation history, save to localStorage
* fix bg-base classes
* save theme preferences
* fix tests
* regenerate, edit, copy buttons
* small fixes
* docs: how to use legacy ui
* better error handling
* make CORS preflight more explicit
* add GET method for CORS
* fix tests
* clean up a bit
* better auto scroll
* small fixes
* use collapse-arrow
* fix closeAndSaveConfigDialog
* small fix
* remove console.log
* fix style for <pre> element
* lighter bubble color (less distract when reading)
* llama: Refactor string_split to use template specialization, fixes parsing strings with spaces
* llama: Add static_assert in the string_split template to ensure the correct template specialization is used for std::string
Prior to this commit, using a JSON Schema containing a string
with `pattern` regular expression that uses top-level alternation
(e.g. `"pattern": "^A|B|C|D$"`) would result in invalid JSON
output from the constrained sampling grammar, because it
ended up creating a grammar rule like this for the string:
```
thing ::= "\"" "A" | "B" | "C" | "D" "\"" space
```
Note that this rule will only match a starting quote for the "A" case,
and will only match an ending quote for the "D" case,
so this rule will always produce invalid JSON when used for sampling
(that is, the JSON will always be lacking the starting quote,
the ending quote, or both).
This was fixed in a simple way by adding parentheses to the
generated rule (for all string pattern rules, to keep it simple),
such that the new generated rule looks like this (correct):
```
thing ::= "\"" ("A" | "B" | "C" | "D") "\"" space
```
* Initial XTC commit
Adds XTC sampler, not activated by default, but recommended settings by default.
* Cleanup
* Simplified chances calculation
To be more inline with the original implementation, chance is calculated once at the beginning.
* First fixes by comments
Still need to look into sorting
* Fixed trailing backspaces
* Fixed RNG to be reproduceable
Thanks to @slaren for directions
* Fixed forgotten header
* Moved `min_keep`
Moved from conditions to a simple check at the end.
* Fixed broken randomization
Thanks to @slaren for explanation
* Swapped sorting for a custom algorithm
Shifts tokens to remove the penalized ones, then puts the penalized at the back. Should make `min_keep` still viable.
* Algorithm rework
1. Scan token from top till the first non-penalizable
2. Remove the last captured token (the least probable above threshold)
3. Shift all tokens to override the remaining penalizable
4. Penalize and put them at the the bottom.
* Added XTC to `test-sampling`
* Simplified algorithm and more tests
* Updated info in common and args
* Merged back lost commits in common and arg
* Update dump info in common
* Fixed incorrect min_keep check
* Added XTC to README
* Renamed parameters, fixed info and defaults
* probability is at 0 by default, but XTC is included in sampling queue
* threshold higher than 0.5 switches XTC off
* Initial server support
* Added XTC to server UIs
* Fixed labels in old server UI
* Made algorithm safer and more readable
* Removed xtc_threshold_max
* Fixed arg after update
* Quick fixes by comments
* Simplified algorithm since threshold_max is removed
* Renamed random distribution
* Fixed tests and outdated README
* Small fixes
* server : accept extra_context for the infill endpoint
ggml-ci
* server : update readme [no ci]
* server : use repo-level FIM pattern if possible
ggml-ci
* llama : improve infill support
ggml-ci
* llama : add more FIM token strings
ggml-ci
* server : update prompt on slot restore (#9800)
* gguf : deprecate old FIM token KVs