arg : add env variable for parallel (#9513)

* add env variable for parallel

* Update README.md with env:  LLAMA_ARG_N_PARALLEL
This commit is contained in:
Bert Wagner 2024-09-17 09:35:38 -04:00 committed by GitHub
parent 8344ef58f8
commit 8b836ae731
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
2 changed files with 2 additions and 2 deletions

View File

@ -1312,7 +1312,7 @@ gpt_params_context gpt_params_parser_init(gpt_params & params, llama_example ex,
[](gpt_params & params, int value) { [](gpt_params & params, int value) {
params.n_parallel = value; params.n_parallel = value;
} }
)); ).set_env("LLAMA_ARG_N_PARALLEL"));
add_opt(llama_arg( add_opt(llama_arg(
{"-ns", "--sequences"}, "N", {"-ns", "--sequences"}, "N",
format("number of sequences to decode (default: %d)", params.n_sequences), format("number of sequences to decode (default: %d)", params.n_sequences),

View File

@ -87,7 +87,7 @@ The project is under active development, and we are [looking for feedback and co
| `-ctk, --cache-type-k TYPE` | KV cache data type for K (default: f16) | | `-ctk, --cache-type-k TYPE` | KV cache data type for K (default: f16) |
| `-ctv, --cache-type-v TYPE` | KV cache data type for V (default: f16) | | `-ctv, --cache-type-v TYPE` | KV cache data type for V (default: f16) |
| `-dt, --defrag-thold N` | KV cache defragmentation threshold (default: -1.0, < 0 - disabled)<br/>(env: LLAMA_ARG_DEFRAG_THOLD) | | `-dt, --defrag-thold N` | KV cache defragmentation threshold (default: -1.0, < 0 - disabled)<br/>(env: LLAMA_ARG_DEFRAG_THOLD) |
| `-np, --parallel N` | number of parallel sequences to decode (default: 1) | | `-np, --parallel N` | number of parallel sequences to decode (default: 1)<br/>(env: LLAMA_ARG_N_PARALLEL) |
| `-cb, --cont-batching` | enable continuous batching (a.k.a dynamic batching) (default: enabled)<br/>(env: LLAMA_ARG_CONT_BATCHING) | | `-cb, --cont-batching` | enable continuous batching (a.k.a dynamic batching) (default: enabled)<br/>(env: LLAMA_ARG_CONT_BATCHING) |
| `-nocb, --no-cont-batching` | disable continuous batching<br/>(env: LLAMA_ARG_NO_CONT_BATCHING) | | `-nocb, --no-cont-batching` | disable continuous batching<br/>(env: LLAMA_ARG_NO_CONT_BATCHING) |
| `--mlock` | force system to keep model in RAM rather than swapping or compressing | | `--mlock` | force system to keep model in RAM rather than swapping or compressing |