Server tests
Python-based server test scenarios using BDD and behave:
- issues.feature: pending issue scenarios
- parallel.feature: scenarios involving multiple slots and concurrent requests
- security.feature: security, CORS and API key
- server.feature: base server scenarios (completion, embedding, tokenization, etc.)
Tests target GitHub workflow job runners with 4 vCPUs.
Requests are sent with aiohttp, an asyncio-based HTTP client.
Note: If inference on the host is faster than on the GitHub runners, the parallel scenarios may fail randomly.
To mitigate this, increase the values of n_predict and kv_size.
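The concurrent-request pattern used by the parallel scenarios can be sketched with asyncio alone. This is a minimal illustration, not the code in steps.py: `fake_completion` is a hypothetical stand-in for an aiohttp call to the server, and `run_concurrent` is an illustrative name.

```python
import asyncio


async def fake_completion(prompt: str, delay: float = 0.01) -> str:
    """Hypothetical stand-in for an HTTP completion request (assumption)."""
    # Simulate network and inference latency without contacting a server.
    await asyncio.sleep(delay)
    return f"completion for: {prompt}"


async def run_concurrent(prompts):
    """Issue all requests at once and collect the results in input order."""
    # asyncio.gather schedules the coroutines concurrently and returns
    # their results in the same order as the awaitables were passed in.
    return await asyncio.gather(*(fake_completion(p) for p in prompts))
```

Calling `asyncio.run(run_concurrent(["a", "b"]))` runs both simulated requests concurrently. In a real scenario the stand-in would be replaced by an aiohttp request against the server under test.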
Install dependencies
pip install -r requirements.txt
Run tests
- Build the server
cd ../../..
cmake -B build -DLLAMA_CURL=ON
cmake --build build --target llama-server
- Start the test:
./tests.sh
Some scenario step values can be overridden with environment variables:
variable | description |
---|---|
PORT | context.server_port, the listening port of the server during a scenario, default: 8080 |
LLAMA_SERVER_BIN_PATH | path to the server binary, default: ../../../build/bin/llama-server |
DEBUG | "ON" to enable verbose mode (--verbose) for the steps and the server |
SERVER_LOG_FORMAT_JSON | if set, switches the server logs to JSON format |
N_GPU_LAYERS | number of model layers to offload to VRAM (-ngl, --n-gpu-layers) |
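As a rough illustration of how these overrides might be read, the sketch below resolves each variable against the defaults listed in the table. It is an assumption about the pattern, not the actual test harness: `ServerTestConfig` and `load_config` are hypothetical names, and leaving N_GPU_LAYERS unset is modeled as `None` because the table does not state a default.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ServerTestConfig:
    """Hypothetical container for the values described in the table."""
    port: int
    server_bin_path: str
    debug: bool
    log_format_json: bool
    n_gpu_layers: Optional[int]


def load_config(env: Optional[Mapping[str, str]] = None) -> ServerTestConfig:
    """Build a config from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    n_gpu_layers = env.get("N_GPU_LAYERS")
    return ServerTestConfig(
        port=int(env.get("PORT", "8080")),
        server_bin_path=env.get(
            "LLAMA_SERVER_BIN_PATH", "../../../build/bin/llama-server"
        ),
        # Only the literal value "ON" enables verbose mode.
        debug=env.get("DEBUG") == "ON",
        # Presence of the variable is enough, whatever its value.
        log_format_json="SERVER_LOG_FORMAT_JSON" in env,
        # No default is documented, so an unset variable stays None.
        n_gpu_layers=int(n_gpu_layers) if n_gpu_layers is not None else None,
    )
```

For example, `load_config({"PORT": "9090", "DEBUG": "ON"})` yields a config with `port == 9090` and `debug is True`, while the binary path keeps its default.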
Run @bug, @wip or @wrong_usage annotated scenarios
A Feature or Scenario must be annotated with @llama.cpp to be included in the default scope.
- @bug: links a scenario to a GitHub issue.
- @wrong_usage: shows a user issue that is actually expected behavior.
- @wip: focuses on a scenario that is a work in progress.
- @slow: heavy test, disabled by default.
To run a scenario annotated with @bug, start:
DEBUG=ON ./tests.sh --no-skipped --tags bug --stop
After changing logic in steps.py, ensure that the @bug and @wrong_usage scenarios are updated.
./tests.sh --no-skipped --tags bug,wrong_usage || echo "should failed but compile"
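The effect of a comma-separated tag list such as `bug,wrong_usage` can be modeled with a few lines of Python. This is a simplified sketch that assumes the list is treated as a logical OR; it is not behave's implementation, it ignores negated tags and repeated `--tags` options, and `is_selected` is a hypothetical helper name.

```python
def is_selected(scenario_tags, tag_option):
    """Return True if the scenario carries at least one requested tag.

    Simplified model (assumption): a comma-separated tag list means OR.
    """
    # Normalize both sides so "@bug" and "bug" compare equal.
    wanted = {t.strip().lstrip("@") for t in tag_option.split(",") if t.strip()}
    carried = {t.strip().lstrip("@") for t in scenario_tags}
    return bool(wanted & carried)
```

Under this model, a scenario tagged only with `@wrong_usage` is selected by `bug,wrong_usage`, while a scenario tagged only with `@llama.cpp` is not.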