* vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl
Shaders are based on cpy.cu.
* vulkan: support copy from q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl to f32
* ggml: copy q->f32 assumes some contiguity in the destination
* Add SVE support for q4_K_q8_K
* Update ggml/src/ggml-cpu/ggml-cpu-quants.c
change to use K_SCALE_SIZE
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* q6_k scale caching
* 16 bit unpack
* q4_k test (slow)
* revert it
* q3_k
* q2_k
* little stuff
* try precalculating products of a and q2_k scales
* Revert "try precalculating products of a and q2_k scales"
This reverts commit 65110b81f23f66331a50c6e889a7c1ab9470a86b.
* unpack should be u16, add vim swap to gitignore (about time)
* better q4_k scales
* q5_k
* better q6_k with separate paths for all threads and partial threads in use, plus some more optimizations
* q2_k better dequant
* q3_k optimizations
* q3_k use hmask simd from cpu avx version
* make the caches happy
* q3_k separate out calculation
* q2_k separate out
* little stuff
* use calc_superblock everywhere
* q2_k optimize scale calculation
* more barriers
* fix: ggml: fix vulkan-shaders-gen build
The vulkan-shaders-gen target was not being built correctly
in case of cross-compilation.
Other outputs need to be built for the cross compile target,
but vulkan-shaders-gen needs to be built for the host.
* refactor: ggml: Improve vulkan-shaders-gen toolchain setup
- Add GGML_SHADERS_GEN_TOOLCHAIN CMake option.
- Auto-detect host toolchain if not set.
* refactor: ggml: Improve vulkan-shaders-gen toolchain setup
Use configure_file to generate host_toolchain.cmake from template
* fix: ggml: Fix compile error
Fix compile error not finding vulkan-shaders-gen
* fix: vulkan-shaders-gen build and path handling
Fix build issues with vulkan-shaders-gen:
- Add target dependency for correct build order
- Use CMAKE_HOST_SYSTEM_NAME for executable suffix
- Fix MSVC output directory in host toolchain
- Normalize path handling for cross-compilation
* fix: improve host compiler detection in vulkan shader build
Improve host compiler detection for vulkan shader generation:
- Add NO_CMAKE_FIND_ROOT_PATH to all compiler searches
- Consolidate compiler detection logic
- Fix Windows-specific MSVC detection
- Ensure correct compiler search in cross-compilation
* refactor: Simplify CMake function for detecting host compiler
Simplified the CMake function to improve the process of detecting the host compiler.
* fix: Remove unnecessary Vulkan library linkage in CMakeLists.txt
Since `vulkan-shader-gen.cpp` only requires the `glslc` executable
and not the Vulkan headers or libraries, CMakeLists.txt needs to
be corrected.
(See: ecc93d0558)
* refactor: Rename host_toolchain.cmake.in
- Rename host_toolchain.cmake.in to cmake/host-toolchain.cmake.in
* refactor: GGML_VULKAN_SHADERS_GEN_TOOLCHAIN
Rename the macro GGML_SHADERS_GEN_TOOLCHAIN to GGML_VULKAN_SHADERS_GEN_TOOLCHAIN
This commit contains a suggestion for adding the missing embd_to_audio
function from tts.cpp to tts-outetts.py. This introduces a depencency
numpy which I was not sure if that is acceptable or not (only PyTorch
was mentioned in referened PR).
Also the README has been updated with instructions to run the example
with llama-server and the python script.
Refs: https://github.com/ggerganov/llama.cpp/pull/10784#issuecomment-2548377734
* cli : auto activate conversation mode if chat template is detected
* add warn on bad template
* update readme (writing with the help of chatgpt)
* update readme (2)
* do not activate -cnv for non-instruct models
* Refactor: Moves cuda graph executable update step to separate function.
* Refactor: Moves cuda graph update check to separate function.
* Refactor: Moves cuda graph maintenance (update or adjusting copy parameters) to separate function for improved readability.
* Fix: Adds missing reference to maintain_cuda_graph() definition.
* Refactor: Improves structure and abstractions by moving CUDA graph evaluation and capture to its own function.
* Refactor: Moves node graph checks and copy ops into individual function for improved readability.
* Refactor: Removes code permanently excluded from compilation to increase readability.
* Style: Adds missing newline
* Style: Consolidates several neighboring '#ifdef USE_CUDA_GRAPH' into a single one
* Refactor: Makes 'cuda_graph_update_required' a local variable
* remove double lines between functions
---------
Co-authored-by: slaren <slarengh@gmail.com>
* common : support tag-based hf_repo like on ollama
* fix build
* various fixes
* small fixes
* fix style
* fix windows build?
* move common_get_hf_file to common.cpp
* fix complain with noreturn
This commit removes the 'd' from the log message in llama-vocab.cpp
when logging a bad special token.
The motivation for this is that currently the output can look something
like the following:
```console
load: bad special token:
'tokenizer.ggml.image_token_id' = 128256d, using default id -1
```
Build fails when using HIP and GGML_BACKEND_DL:
```
/usr/bin/ld: ../ggml/src/libggml.so: undefined reference to `ggml_backend_cuda_reg'
collect2: error: ld returned 1 exit status
```
This patch fixes this.
* examples : add README.md to tts example [no ci]
* squash! examples : add README.md to tts example [no ci]
Fix heading to be consistent with other examples, and add a quickstart
section to README.md.
* squash! examples : add README.md to tts example [no ci]
Fix spelling mistake.