* server : replace behave with pytest
* fix test on windows
* misc
* add more tests
* more tests
* styling
* log less, fix embd test
* added all sequential tests
* fix coding style
* fix save slot test
* add parallel completion test
* fix parallel test
* remove feature files
* update test docs
* no cache_prompt for some tests
* add test_cache_vs_nocache_prompt
* server: tests: add models endpoint scenario
* server: /v1/models add some metadata
* server: tests: add debug field in context before scenario
* server: tests: download model from HF, add batch size
* server: tests: add passkey test
* server: tests: add group attention params
* server: do not truncate prompt tokens if self-extend through group attention is enabled
* server: logs: do not truncate log values
* server: tests - passkey - first good working value of nga
* server: tests: fix server timeout
* server: tests: fix passkey, add doc, fix regex content matching, fix timeout
* server: tests: fix regex content matching
* server: tests: schedule slow tests on master
* server: metrics: fix when no prompt processed
* server: tests: self-extend add llama-2-7B and Mixtral-8x7B-v0.1
* server: tests: increase timeout for completion
* server: tests: keep only the PHI-2 test
* server: tests: passkey add a negative test
* server: tests: init scenarios
- health and slots endpoints
- completion endpoint
- OAI compatible chat completion requests w/ and without streaming
- completion multi users scenario
- multi users scenario on OAI compatible endpoint with streaming
- multi users with total number of tokens to predict exceeds the KV Cache size
- server wrong usage scenario, like in Infinite loop of "context shift" #3969
- slots shifting
- continuous batching
- embeddings endpoint
- multi users embedding endpoint: Segmentation fault #5655
- OpenAI-compatible embeddings API
- tokenize endpoint
- CORS and api key scenario
* server: CI GitHub workflow
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>