mirror of https://github.com/ggerganov/llama.cpp.git
synced 2024-12-25 22:08:46 +01:00

fixed typo (#178)

This commit is contained in:
parent 2d15d6c9a9
commit 27944c4206
@@ -199,7 +199,7 @@ https://user-images.githubusercontent.com/271616/225014776-1d567049-ad71-4ef2-b0
 - We don't know yet how much the quantization affects the quality of the generated text
 - Probably the token sampling can be improved
 - The Accelerate framework is actually currently unused since I found that for tensor shapes typical for the Decoder,
-  there is no benefit compared to the ARM_NEON intrinsics implementation. Of course, it's possible that I simlpy don't
+  there is no benefit compared to the ARM_NEON intrinsics implementation. Of course, it's possible that I simply don't
   know how to utilize it properly. But in any case, you can even disable it with `LLAMA_NO_ACCELERATE=1 make` and the
   performance will be the same, since no BLAS calls are invoked by the current implementation
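For context, the README text in the diff above refers to the `LLAMA_NO_ACCELERATE=1 make` build switch. A minimal sketch of the two invocations it contrasts (this is not part of the commit; it assumes a checkout of the repository with its Makefile present):

```shell
# Default build: on macOS the Makefile links the Accelerate framework.
make

# Build with Accelerate disabled, as described in the README. Per the text
# above, performance is unchanged because the current implementation issues
# no BLAS calls; the ARM_NEON intrinsics path is used either way.
LLAMA_NO_ACCELERATE=1 make
```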