llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-27 12:33:06 +01:00

Author	SHA1	Message	Date
CentricStorm	4b4d92b098	docs: fix server documentation formatting (#10776 )	2024-12-11 11:47:43 +01:00
Yüg	a86ad841f1	server : add flag to disable the web-ui (#10762 ) (#10751 ) Co-authored-by: eugenio.segala <esegala@deloitte.co.uk>	2024-12-10 18:22:34 +01:00
Xuan Son Nguyen	ce8784bdb1	server : fix format_infill (#10724 ) * server : fix format_infill * fix * rename * update test * use another model * update test * update test * test_invalid_input_extra_req	2024-12-08 23:04:29 +01:00
Xuan Son Nguyen	e52522b869	server : bring back info of final chunk in stream mode (#10722 ) * server : bring back into to final chunk in stream mode * clarify a bit * traling space	2024-12-08 20:38:51 +01:00
Xuan Son Nguyen	3573fa8e7b	server : (refactor) no more json in server_task input (#10691 ) * server : (refactor) no more json in server_task input * add test for slots endpoint * add tests for /props and /slots * remove task inf_type * fix CI by adding safe_json_to_str * add "model_path" to /props * update readme	2024-12-07 20:21:09 +01:00
Georgi Gerganov	ce4a7b8493	server : various fixes (#10704 ) * server : various fixes ggml-ci * server : show curent seed in slot_params ggml-ci * fix /slots endpoint * Update examples/server/server.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * server : reflect endpoint response changes in the readme ggml-ci --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>	2024-12-07 18:02:05 +02:00
Georgi Gerganov	c2a16c0bdb	server : fix free of spec context and batch (#10651 ) ggml-ci	2024-12-07 11:52:44 +02:00
Xuan Son Nguyen	6c5bc0625f	server : (refactoring) do not rely on JSON internally (#10643 ) * server : (refactoring) reduce usage of json internally * move all response types to struct * wip [no ci] * many fixes * add virtual function * fix index * minor style fix * add std::move * refactor handle_completions_generic * add virtual functions * remove server.hpp * clarify server_sent_event RFC specs * apply review comments * fix model_alias and completion_probabilities * small clean up * remove virtual for to_json_oai_compat() * naming oai_compat --> oaicompat * fix unwanted recursive call * update docs	2024-12-06 11:14:32 +01:00
Plamen Minev	7736837d62	fix(server) : not show alert when DONE is received (#10674 )	2024-12-05 22:36:41 +01:00
Georgi Gerganov	1da7b76569	server : fix speculative decoding with context shift (#10641 ) * server : fix speculative decoding with context shift ggml-ci * server : take into account speculative limits ggml-ci * server : add tests	2024-12-04 22:38:20 +02:00
Xuan Son Nguyen	91c36c269b	server : (web ui) Various improvements, now use vite as bundler (#10599 ) * hide buttons in dropdown menu * use npm as deps manager and vite as bundler * fix build * fix build (2) * fix responsive on mobile * fix more problems on mobile * sync build * (test) add CI step for verifying build * fix ci * force rebuild .hpp files * cmake: clean up generated files pre build	2024-12-03 19:38:44 +01:00
Nikolaos Pothitos	82bca2257b	readme : add option, update default value, fix formatting (#10271 ) * readme : document --no-display-prompt * readme : update default prompt context size * readme : remove unnecessary indentation Indenting a line with four spaces makes Markdown treat that section as plain text. * readme : indent commands under bullets * readme : indent commands in lettered list	2024-12-03 12:50:08 +02:00
Georgi Gerganov	70b98fadbc	server : fix default draft model parameters (#10586 ) * server : force F16 KV cache for the draft model ggml-ci * server : fix draft params ggml-ci * server : various params fixes ggml-ci	2024-12-03 11:20:00 +02:00
Xuan Son Nguyen	642330ac7c	llama : add enum for built-in chat templates (#10623 ) * llama : add enum for supported chat templates * use "built-in" instead of "supported" * arg: print list of built-in templates * fix test * update server README	2024-12-02 22:10:19 +01:00
Georgi Gerganov	8648c52101	make : deprecate (#10514 ) * make : deprecate ggml-ci * ci : disable Makefile builds ggml-ci * docs : remove make references [no ci] * ci : disable swift build ggml-ci * docs : remove obsolete make references, scripts, examples ggml-ci * basic fix for compare-commits.sh * update build.md * more build.md updates * more build.md updates * more build.md updates * Update Makefile Co-authored-by: Diego Devesa <slarengh@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-12-02 21:22:53 +02:00
haopeng	64ed2091b2	server: Add "tokens per second" information in the backend (#10548 ) * add cmake rvv support * add timings * remove space * update readme * fix * fix code * remove empty line * add test --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2024-12-02 14:45:54 +01:00
alek3y	86dc11c5bc	server : bind to any port when specified (#10590 )	2024-12-01 13:33:12 +02:00
Diego Devesa	7cc2d2c889	ggml : move AMX to the CPU backend (#10570 ) * ggml : move AMX to the CPU backend --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-11-29 21:54:58 +01:00
Xuan Son Nguyen	b782e5c7d4	server : add more test cases (#10569 ) * server : add split model test * add test speculative * add invalid cases	2024-11-29 21:48:56 +01:00
Xuan Son Nguyen	6c59567689	server : (tests) don't use thread for capturing stdout/stderr, bump openai client library (#10568 ) * server : (tests) don't use thread for capturing stdout/stderr * test: bump openai to 1.55.2 * bump openai to 1.55.3	2024-11-28 19:17:49 +01:00
Xuan Son Nguyen	9f912511bc	common : fix duplicated file name with hf_repo and hf_file (#10550 )	2024-11-27 22:30:52 +01:00
Xuan Son Nguyen	45abe0f74e	server : replace behave with pytest (#10416 ) * server : replace behave with pytest * fix test on windows * misc * add more tests * more tests * styling * log less, fix embd test * added all sequential tests * fix coding style * fix save slot test * add parallel completion test * fix parallel test * remove feature files * update test docs * no cache_prompt for some tests * add test_cache_vs_nocache_prompt	2024-11-26 16:20:18 +01:00
Georgi Gerganov	84e1c33cde	server : fix parallel speculative decoding (#10513 ) ggml-ci	2024-11-26 13:36:40 +02:00
Georgi Gerganov	47f931c8f9	server : enable cache_prompt by default (#10501 ) ggml-ci	2024-11-25 21:50:07 +02:00
Diego Devesa	10bce0450f	llama : accept a list of devices to use to offload a model (#10497 ) * llama : accept a list of devices to use to offload a model * accept `--dev none` to completely disable offloading * fix dev list with dl backends * rename env parameter to LLAMA_ARG_DEVICE for consistency	2024-11-25 19:30:06 +01:00
brucepro	a9a678a6b2	Add download chat feature to server chat (#10481 ) * Add download chat feature to server chat Add a download feature next to the delete chat feature in the server vue chat interface. * code style --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2024-11-25 17:11:55 +01:00
Georgi Gerganov	9ca2e67762	server : add speculative decoding support (#10455 ) * server : add speculative decoding support ggml-ci * server : add helper function slot.can_speculate() ggml-ci	2024-11-25 16:31:38 +02:00
Georgi Gerganov	d9d54e498d	speculative : refactor and add a simpler example (#10362 ) * speculative : refactor and add a simpler example ggml-ci * speculative : clean-up and add comments and TODOs [no ci] * speculative : manage context in common_speculative ggml-ci * speculative : simplify ggml-ci * speculative : simplify (cont) ggml-ci * speculative : add --draft-min CLI arg * speculative : minor fixup * make : build fixes * speculative : do not redraft previous drafts ggml-ci * speculative : fix the draft sampling ggml-ci * speculative : fix compile warning * common : refactor args ggml-ci * common : change defaults [no ci] * common : final touches ggml-ci	2024-11-25 09:58:41 +02:00
Johannes Gäßler	4e54be0ec6	llama/ex: remove --logdir argument (#10339 )	2024-11-16 23:00:41 +01:00
MaggotHATE	bcdb7a2386	server: (web UI) Add samplers sequence customization (#10255 ) * Samplers sequence: simplified and input field. * Removed unused function * Modify and use `settings-modal-short-input` * rename "name" --> "label" --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2024-11-16 14:26:54 +01:00
Xuan Son Nguyen	9901068ac7	server : (web UI) add copy button for code block, fix api key (#10242 ) * server : (web ui) add copy btn for code blocks * fix problem with api key * use settings-modal-short-input component * always show copy btn for code snippet	2024-11-15 10:48:49 +01:00
Alexey Parfenov	ff7fb670d0	server : add missing docs (#10269 )	2024-11-13 13:16:30 +02:00
Jhen-Jie Hong	0e712a5acb	server : fix incorrect res in validate_model_chat_template (#10272 ) * server : fix validate_model_chat_template * server : fix chat res	2024-11-13 13:15:23 +02:00
Georgi Gerganov	b141e5f6ef	server : enable KV cache defrag by default (#10233 ) ggml-ci	2024-11-11 08:38:43 +02:00
MaggotHATE	505f33274d	server : (web UI) Add back sampler settings (#10239 ) * Add back samplers to server * Added tooltips with basic information * Fixed stretching of input fields. * use component for settings input, move help msg to tooltips --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2024-11-10 15:42:25 -04:00
Xuan Son Nguyen	76c6e7f105	server : minor UI fix (#10207 )	2024-11-07 18:44:38 -04:00
Xuan Son Nguyen	a71d81cf8c	server : revamp chat UI with vuejs and daisyui (#10175 ) * server : simple chat UI with vuejs and daisyui * move old files to legacy folder * embed deps into binary * basic markdown support * add conversation history, save to localStorage * fix bg-base classes * save theme preferences * fix tests * regenerate, edit, copy buttons * small fixes * docs: how to use legacy ui * better error handling * make CORS preflight more explicit * add GET method for CORS * fix tests * clean up a bit * better auto scroll * small fixes * use collapse-arrow * fix closeAndSaveConfigDialog * small fix * remove console.log * fix style for <pre> element * lighter bubble color (less distract when reading)	2024-11-07 17:31:10 -04:00
Georgi Gerganov	b11f9ba9b8	server : remove hack for extra parallel slot (#10187 ) ggml-ci	2024-11-06 13:29:01 +02:00
Xuan Son Nguyen	9e0ecfb697	server : clarify /slots endpoint, add is_processing (#10162 ) * server : clarify /slots endpoint, add is_processing * fix tests	2024-11-04 16:33:29 +01:00
sasha0552	42cadc74bd	server : fix slot selection by lru (#10126 ) * server : fix slot selection by lru, migrate lcs to `size_t` * minor debug log fix	2024-11-02 18:34:56 +02:00
Georgi Gerganov	45950415ed	server : fix endpoint checks (#10135 ) ggml-ci	2024-11-02 18:34:00 +02:00
sasha0552	d865d1478c	server : fix smart selection of available slot (#10120 ) * Fix smart selection of available slot * minor fix * replace vectors of tokens with shorthands	2024-11-01 14:33:14 +01:00
Kevin Gibbons	0a683e8088	server : include scheme when printing URL (#10106 )	2024-10-31 14:02:35 +01:00
Georgi Gerganov	8d8ff71536	llama : remove Tail-Free sampling (#10071 ) ggml-ci	2024-10-29 10:42:05 +02:00
Georgi Gerganov	8125e6cbfc	server : don't overfill the batch during infill (#10018 ) ggml-ci	2024-10-28 08:49:32 +02:00
wwoodsTM	ff252ea48e	llama : add DRY sampler (#9702 ) * sampling : add DRY sampler (post-refactor) * DRY: Trying to fix coauthors, removed unneeded line * DRY: Fixed redundant code * DRY: Fixed crash issue due to DRY being in chain but uninitialized --------- Co-authored-by: l3utterfly <gc.pthzfoldr@gmail.com> Co-authored-by: pi6am <34464159+pi6am@users.noreply.github.com>	2024-10-25 19:07:34 +03:00
Michael Podvitskiy	d80fb71f8b	llama: string_split fix (#10022 ) * llama: Refactor string_split to use template specialization, fixes parsing strings with spaces * llama: Add static_assert in the string_split template to ensure the correct template specialization is used for std::string	2024-10-25 17:57:54 +02:00
Georgi Gerganov	bc5ba007b2	server : check that the prompt fits in the slot's context (#10030 ) ggml-ci	2024-10-25 10:13:46 +03:00
Xuan Son Nguyen	958367bf53	server : refactor slot input data, move tokenizer to HTTP thread (#10023 ) * server : refactor slot input data, move tokenizer to HTTP thread * move prompt_tokens.empty() check * fix incorrect if branch * fix infinite generation loop * bring back infill validation * add infill test * try fixing format_infill * fix test * remove redundant code * rename completion to inference * update docs * use llama_tokens everywhere	2024-10-24 21:51:22 +02:00
wwoodsTM	0a1c750c80	server : samplers accept the prompt correctly (#10019 )	2024-10-23 22:27:51 +03:00

1 2 3 4 5 ...

467 Commits