From 4b0eff3df58d8d86e47348fb73d54da3194d416d Mon Sep 17 00:00:00 2001
From: Ujjawal Panchal <31011628+Ujjawal-K-Panchal@users.noreply.github.com>
Date: Thu, 25 Jul 2024 13:43:27 +0530
Subject: [PATCH] docs : Quantum -> Quantized (#8666)

* docfix: imatrix readme: quantum models -> quantized models.
* docfix: server readme: quantum models -> quantized models.
* docfix: imatrix readme: enchance -> enhance.
---
 examples/imatrix/README.md | 2 +-
 examples/server/README.md  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/imatrix/README.md b/examples/imatrix/README.md
index 29602881a..bb5faec94 100644
--- a/examples/imatrix/README.md
+++ b/examples/imatrix/README.md
@@ -1,6 +1,6 @@
 # llama.cpp/examples/imatrix
 
-Compute an importance matrix for a model and given text dataset. Can be used during quantization to enchance the quality of the quantum models.
+Compute an importance matrix for a model and given text dataset. Can be used during quantization to enhance the quality of the quantized models.
 More information is available here: https://github.com/ggerganov/llama.cpp/pull/4861
 
 ## Usage
diff --git a/examples/server/README.md b/examples/server/README.md
index ff4074517..33a2b95cc 100644
--- a/examples/server/README.md
+++ b/examples/server/README.md
@@ -5,7 +5,7 @@ Fast, lightweight, pure C/C++ HTTP server based on [httplib](https://github.com/
 Set of LLM REST APIs and a simple web front end to interact with llama.cpp.
 
 **Features:**
- * LLM inference of F16 and quantum models on GPU and CPU
+ * LLM inference of F16 and quantized models on GPU and CPU
  * [OpenAI API](https://github.com/openai/openai-openapi) compatible chat completions and embeddings routes
  * Parallel decoding with multi-user support
  * Continuous batching
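
For context on the line being reworded: the imatrix README describes computing an importance matrix that can then be fed into quantization. A minimal command sketch of that workflow, assuming the renamed llama.cpp binaries (`llama-imatrix`, `llama-quantize`) and placeholder file names; see examples/imatrix/README.md for the authoritative flags:

```sh
# Compute an importance matrix from a full-precision model and a calibration text file
# (model and dataset paths here are placeholders).
./llama-imatrix -m ggml-model-f16.gguf -f calibration-data.txt -o imatrix.dat

# Pass the matrix to quantization so important weights are better preserved
# in the resulting quantized model.
./llama-quantize --imatrix imatrix.dat ggml-model-f16.gguf ggml-model-Q4_K_M.gguf Q4_K_M
```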