Mirror of https://github.com/ggerganov/llama.cpp.git
update straggling refs
commit 7fbe6006c9 (parent 99df4cc091)
.github/workflows/build.yml (vendored), 2 changes
@@ -240,7 +240,7 @@ jobs:
 echo "Fetch llama2c model"
 wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories260K/stories260K.bin
 ./bin/convert-llama2c-to-ggml --copy-vocab-from-model ./tok512.bin --llama2c-model stories260K.bin --llama2c-output-model stories260K.gguf
-./bin/main -m stories260K.gguf -p "One day, Lily met a Shoggoth" -n 500 -c 256
+./bin/llama -m stories260K.gguf -p "One day, Lily met a Shoggoth" -n 500 -c 256
 
 - name: Determine tag name
   id: tag
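For reference, the CI step above can be reproduced by hand; a minimal sketch, assuming the binaries have already been built into ./bin and that tok512.bin was fetched beforehand the same way as the model (the surrounding workflow steps are not shown in this hunk):

```shell
# Sketch of running the workflow commands locally from a build directory.
# Assumes ./bin/convert-llama2c-to-ggml and ./bin/llama exist and tok512.bin is present.
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories260K/stories260K.bin
./bin/convert-llama2c-to-ggml --copy-vocab-from-model ./tok512.bin --llama2c-model stories260K.bin --llama2c-output-model stories260K.gguf
./bin/llama -m stories260K.gguf -p "One day, Lily met a Shoggoth" -n 500 -c 256
```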
@@ -77,7 +77,7 @@ It has the similar design of other llama.cpp BLAS-based paths such as *OpenBLAS,
 *Notes:*
 
 - **Memory**
-  - The device memory is a limitation when running a large model. The loaded model size, *`llm_load_tensors: buffer_size`*, is displayed in the log when running `./bin/main`.
+  - The device memory is a limitation when running a large model. The loaded model size, *`llm_load_tensors: buffer_size`*, is displayed in the log when running `./bin/llama`.
 
   - Please make sure the GPU shared memory from the host is large enough to account for the model's size. For e.g. the *llama-2-7b.Q4_0* requires at least 8.0GB for integrated GPU and 4.0GB for discrete GPU.
 
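To compare the loaded model size against the limits mentioned in that note, the relevant log line can be filtered from a short run; a minimal sketch with a hypothetical model path:

```shell
# Run a short generation and show the llm_load_tensors buffer size from the log.
# The model path below is a placeholder; log output is merged into stdout for grep.
./bin/llama -m ./models/llama-2-7b.Q4_0.gguf -p "Hello" -n 8 2>&1 | grep "llm_load_tensors"
```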
@@ -27,10 +27,8 @@ To mitigate it, you can increase values in `n_predict`, `kv_size`.
 
 ```shell
 cd ../../..
-mkdir build
-cd build
-cmake -DLLAMA_CURL=ON ../
-cmake --build . --target llama-server
+cmake -B build -DLLAMA_CURL=ON
+cmake --build build --target llama-server
 ```
 
 2. Start the test: `./tests.sh`
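Taken together, the updated instructions configure an out-of-source build and compile only the server target before running the tests; a minimal end-to-end sketch, assuming the commands are issued from the server tests directory as implied by the `cd ../../..`:

```shell
# Configure and build from the repository root, then return and run the test suite.
cd ../../..                          # repository root (assumed layout)
cmake -B build -DLLAMA_CURL=ON       # out-of-source configure with curl support
cmake --build build --target llama-server
cd -                                 # back to the tests directory
./tests.sh
```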