- `-c N`, `--ctx-size N`: Deprecated, use `--kv-size` instead.
- `-kv N`, `--kv-size N`: Specify the total size of the KV cache for the prompt context. The default is 512, but LLaMA models were built with a context of 2048, which will provide better results for longer input/inference.
The `infill` program provides several ways to interact with the LLaMA models using input prompts:
- `--in-prefix PROMPT_BEFORE_CURSOR`: Provide the prefix directly as a command-line option.
- `--in-suffix PROMPT_AFTER_CURSOR`: Provide the suffix directly as a command-line option.
- `--interactive-first`: Run the program in interactive mode and wait for input right away. (More on this below.)
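
As a sketch of a non-interactive invocation (the binary location and model filename below are placeholders for your local setup, not files shipped with this example), a fill-in-the-middle request for the body of a function might look like:

```shell
# Ask the model to fill in the code between a prefix and a suffix.
# Adjust the model path to a GGUF model you have downloaded locally.
./infill -m ./models/codellama-7b.gguf --kv-size 2048 \
  --in-prefix "int add(int a, int b) { " \
  --in-suffix "; }"
```

The model generates only the text between the prefix and the suffix, so keeping both short and syntactically consistent tends to produce cleaner completions.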
## Interaction
The `infill` program offers a seamless way to interact with LLaMA models, allowing users to receive real-time infill suggestions. Interactive mode can be triggered with either the `--interactive` or the `--interactive-first` option.
### Interaction Options
- `-i, --interactive`: Run the program in interactive mode, allowing users to get real-time code suggestions from the model.
- `--interactive-first`: Run the program in interactive mode and immediately wait for user input before starting the text generation.
- `--color`: Enable colorized output to visually distinguish between prompts, user input, and generated text.
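
Putting these together, an interactive session might be started like this (again, the model path is a placeholder for whatever GGUF model you use locally):

```shell
# Start in interactive mode, waiting for input before generating,
# with colorized output to separate prompts from generated text.
./infill -m ./models/codellama-7b.gguf --interactive-first --color
```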