* metal : add quantized FA (vec) support
ggml-ci
* metal : add quantized FA (non-vec) support
* metal : fix support check
ggml-ci
* metal : clean-up
* metal : clean-up (cont)
* metal : fix shared memory calc + reduce smem + comments
* metal : float-correctness
* metal : minor [no ci]
* q6_k instruction reordering attempt
* better subtract method
* should be theoretically faster
small improvement with shuffle lut, likely because all loads are already done at that stage
* optimize bit fiddling
* handle -32 offset separately. bsums exists for a reason!
* use shift
* Update ggml-quants.c
* have to update ci macos version to 13 as 12 doesnt work now. 13 is still x86
* convert-lora : make `--base` optional
* lint
* handle case where base_model_name_or_path is invalid
* do not include metadata from base model
* clarify unspecified --base
* add small comment [no ci]
* trigger ci
* llama : fix buffer checks for mamba and rwk
* llama : fix missing worst case flag during reserve
* cuda : fix supports_op for norm
* disable sched SET_CAUSE
* ggml : fix gguf string leak when reading kv pairs fails
* ggml : avoid crashing with GGML_ABORT when the KV has an invalid type
* ggml : avoid crashing on failed memory allocations when loading a gguf file
* ggml: Add POOL2D OP for GPU ACC to the Vulkan.
- The MobileVLM model now supports inference acceleration through GPU by utilizing the Vulkan backend.
- A GGML_OP_POOL_2D shader has been added. (Pooling)
- The encoding performance of the CLIP model improved from 2.8s on the CPU to 0.7s on the GPU.
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
* [fix] Correct the incorrect order of the parameters.
fix casting to int.
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
---------
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
* Add granite template to llama.cpp
* Add granite template to test-chat-template.cpp
* Update src/llama.cpp
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* Update tests/test-chat-template.cpp
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* Added proper template and expected output
* Small change to \n
Small change to \n
* Add code space &
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* Fix spacing
* Apply suggestions from code review
* Update src/llama.cpp
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>