llama.cpp/examples/server/tests/features/passkey.feature

# run with: ./tests.sh --no-skipped --tags passkey
@passkey
@slow
Feature: Passkey / Self-extend with context shift

  Background: Server startup
    Given a server listening on localhost:8080

  # Generates a long text of junk and inserts a secret passkey number inside it.
  # Then we query the LLM for the secret passkey.
  # see #3856 and #4810
  Scenario Outline: Passkey
    Given a model file <hf_file> from HF repo <hf_repo>
    And   <n_batch> as batch size
    And   <n_junk> as number of junk
    And   <n_predicted> server max tokens to predict
    And   42 as seed
    And   <n_ctx> KV cache size
    And   1 slots
    And   <n_ga> group attention factor to extend context size through self-extend
    And   <n_ga_w> group attention width to extend context size through self-extend
    # Can be override with N_GPU_LAYERS
    And   <ngl> GPU offloaded layers
    Then  the server is starting
    Then  the server is healthy
    Given available models
    Then  model 0 is trained on <n_ctx_train> tokens context
    Given a prefix prompt:
    """
    here is an important info hidden inside a lot of irrelevant text. Find it and memorize them. I will quiz you about the important information there.
    """
    And a passkey prompt template:
    """
    The pass key is <passkey> Remember it. <passkey> is the pass key.
    """
    And a junk suffix prompt:
    """
    The grass is green. The sky is blue. The sun is yellow. Here we go. There and back again.
    """
    And a suffix prompt:
    """
    What is the pass key? The pass key is
    """
    Given a "<passkey>" passkey challenge prompt with the passkey inserted every <i_pos> junk
    And  a completion request with no api error
    Then <n_predicted> tokens are predicted matching <re_content>

    Examples:
      | hf_repo                         | hf_file                     | n_ctx_train | ngl | n_ctx | n_batch | n_ga | n_ga_w | n_junk | i_pos | passkey | n_predicted | re_content     |
      | TheBloke/phi-2-GGUF             | phi-2.Q4_K_M.gguf           | 2048        | 5   | 8192  | 512     | 4    | 512    | 250    | 50    | 42      | 1           | 42             |
      | TheBloke/phi-2-GGUF             | phi-2.Q4_K_M.gguf           | 2048        | 5   | 8192  | 512     | 2    | 512    | 250    | 50    | 42      | 1           | \b((?!42)\w)+\b  |
      #| TheBloke/Llama-2-7B-GGUF        | llama-2-7b.Q2_K.gguf        | 4096        | 3   | 16384 | 512     | 4    | 512    | 500    | 300   | 1234    | 5           | 1234           |
      #| TheBloke/Mixtral-8x7B-v0.1-GGUF | mixtral-8x7b-v0.1.Q2_K.gguf | 32768       | 2   | 16384 | 512     | 4    | 512    | 500    | 100   | 0987    | 5           | 0
      # 987           |
server: tests: passkey challenge / self-extend with context shift demo (#5832) * server: tests: add models endpoint scenario * server: /v1/models add some metadata * server: tests: add debug field in context before scenario * server: tests: download model from HF, add batch size * server: tests: add passkey test * server: tests: add group attention params * server: do not truncate prompt tokens if self-extend through group attention is enabled * server: logs: do not truncate log values * server: tests - passkey - first good working value of nga * server: tests: fix server timeout * server: tests: fix passkey, add doc, fix regex content matching, fix timeout * server: tests: fix regex content matching * server: tests: schedule slow tests on master * server: metrics: fix when no prompt processed * server: tests: self-extend add llama-2-7B and Mixtral-8x7B-v0.1 * server: tests: increase timeout for completion * server: tests: keep only the PHI-2 test * server: tests: passkey add a negative test 2024-03-02 22:00:14 +01:00			`# run with: ./tests.sh --no-skipped --tags passkey`
			`@passkey`
			`@slow`
			`Feature: Passkey / Self-extend with context shift`

			`Background: Server startup`
			`Given a server listening on localhost:8080`

			`# Generates a long text of junk and inserts a secret passkey number inside it.`
			`# Then we query the LLM for the secret passkey.`
			`# see #3856 and #4810`
			`Scenario Outline: Passkey`
			`Given a model file <hf_file> from HF repo <hf_repo>`
			`And <n_batch> as batch size`
			`And <n_junk> as number of junk`
			`And <n_predicted> server max tokens to predict`
			`And 42 as seed`
			`And <n_ctx> KV cache size`
			`And 1 slots`
			`And <n_ga> group attention factor to extend context size through self-extend`
			`And <n_ga_w> group attention width to extend context size through self-extend`
			`# Can be override with N_GPU_LAYERS`
			`And <ngl> GPU offloaded layers`
			`Then the server is starting`
			`Then the server is healthy`
			`Given available models`
			`Then model 0 is trained on <n_ctx_train> tokens context`
			`Given a prefix prompt:`
			`"""`
			`here is an important info hidden inside a lot of irrelevant text. Find it and memorize them. I will quiz you about the important information there.`
			`"""`
			`And a passkey prompt template:`
			`"""`
			`The pass key is <passkey> Remember it. <passkey> is the pass key.`
			`"""`
			`And a junk suffix prompt:`
			`"""`
			`The grass is green. The sky is blue. The sun is yellow. Here we go. There and back again.`
			`"""`
			`And a suffix prompt:`
			`"""`
			`What is the pass key? The pass key is`
			`"""`
			`Given a "<passkey>" passkey challenge prompt with the passkey inserted every <i_pos> junk`
			`And a completion request with no api error`
			`Then <n_predicted> tokens are predicted matching <re_content>`

			`Examples:`
			`\| hf_repo \| hf_file \| n_ctx_train \| ngl \| n_ctx \| n_batch \| n_ga \| n_ga_w \| n_junk \| i_pos \| passkey \| n_predicted \| re_content \|`
			`\| TheBloke/phi-2-GGUF \| phi-2.Q4_K_M.gguf \| 2048 \| 5 \| 8192 \| 512 \| 4 \| 512 \| 250 \| 50 \| 42 \| 1 \| 42 \|`
			`\| TheBloke/phi-2-GGUF \| phi-2.Q4_K_M.gguf \| 2048 \| 5 \| 8192 \| 512 \| 2 \| 512 \| 250 \| 50 \| 42 \| 1 \| \b((?!42)\w)+\b \|`
			`#\| TheBloke/Llama-2-7B-GGUF \| llama-2-7b.Q2_K.gguf \| 4096 \| 3 \| 16384 \| 512 \| 4 \| 512 \| 500 \| 300 \| 1234 \| 5 \| 1234 \|`
			`#\| TheBloke/Mixtral-8x7B-v0.1-GGUF \| mixtral-8x7b-v0.1.Q2_K.gguf \| 32768 \| 2 \| 16384 \| 512 \| 4 \| 512 \| 500 \| 100 \| 0987 \| 5 \| 0`
			`# 987 \|`