# run with: ./tests.sh --no-skipped --tags wrong_usage
@wrong_usage
Feature: Wrong usage of llama.cpp server
# #3969: The user must always set the --n-predict option
# to cap the number of tokens any completion request can generate,
# or pass n_predict/max_tokens in the request.
Scenario: Infinite loop
Given a server listening on localhost:8080
And a model file tinyllamas/stories260K.gguf from HF repo ggml-org/models
And 42 as server seed
And 2048 KV cache size
# Uncomment below to fix the issue
#And 64 server max tokens to predict
Then the server is starting
Then the server is healthy
Given a prompt:
"""
Go to: infinite loop
"""
# Uncomment below to fix the issue
#And 128 max tokens to predict
Given concurrent completion requests
Then the server is idle
Then all prompts are predicted