Mirror of https://github.com/ggerganov/llama.cpp.git
Commit 958367bf53: server : refactor slot input data, move tokenizer to HTTP thread

* move prompt_tokens.empty() check
* fix incorrect if branch
* fix infinite generation loop
* bring back infill validation
* add infill test
* try fixing format_infill
* fix test
* remove redundant code
* rename completion to inference
* update docs
* use llama_tokens everywhere
* steps
* ctx_shift.feature
* embeddings.feature
* environment.py
* infill.feature
* issues.feature
* lora.feature
* parallel.feature
* passkey.feature
* rerank.feature
* results.feature
* security.feature
* server.feature
* slotsave.feature
* wrong_usages.feature
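
The `.feature` files above are Gherkin scenarios; in a behave-based suite such as this one, the plain-English steps in a feature file are bound to Python step definitions kept under `steps/`, with shared setup in `environment.py`. Below is a minimal sketch of what such a pairing typically looks like. The file name, step wording, server address, endpoint, and payload fields are illustrative assumptions, not copied from `infill.feature` or the repository's actual step definitions.

```python
# steps/example_steps.py (hypothetical file name): a minimal behave step module.
# The endpoint and JSON fields below are assumptions for illustration only.
import requests
from behave import given, when, then

BASE_URL = "http://127.0.0.1:8080"  # assumed address of a locally started server


@given("a server listening on the default port")
def step_server_ready(context):
    # Probe a health endpoint; the exact route is an assumption.
    response = requests.get(f"{BASE_URL}/health", timeout=5)
    assert response.status_code == 200


@when('I request an infill completion with prefix "{prefix}" and suffix "{suffix}"')
def step_request_infill(context, prefix, suffix):
    # Payload schema is illustrative; consult the server docs for the real fields.
    context.response = requests.post(
        f"{BASE_URL}/infill",
        json={"input_prefix": prefix, "input_suffix": suffix, "n_predict": 16},
        timeout=30,
    )


@then("the response contains generated text")
def step_check_response(context):
    assert context.response.status_code == 200
    assert context.response.json().get("content")
```

A matching scenario in a `.feature` file would read, step for step, like the decorated strings above, and running `behave` from the tests directory discovers the feature files and executes them against these step definitions.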