Mirror of https://github.com/ggerganov/llama.cpp.git, synced 2025-01-06 02:48:57 +01:00.

Commit 97bdd26eee
* lora: load to device buft
* add patch tensor function
* correct tensor patch
* llama_lora_adapter_apply
* correct ggml_backend_tensor_copy
* add llm_build_mm
* fix auto merge
* update based on review comments
* add convert script
* no more transpose A
* add f16 convert
* add metadata check
* add sanity check
* fix ftype
* add requirements
* fix requirements
* fix outfile
* conversion: only allow selected models
* fix types
* cuda : do not use dmmv if the tensor does not have enough cols
* llama : lora fixes
* do not disable mmap with lora
Co-authored-by: slaren <slarengh@gmail.com>
* llm_build_lora_mm_id
* convert_lora : MoE LoRA conversion support
* convert_lora : prefer safetensors, similarly to convert_hf
* convert_hf : simplify modify_tensors for InternLM2
* convert_lora : lazy conversion
* llama : load and use alpha from LoRA adapters
* llama : use llm_build_lora_mm in most model graphs (see the sketch after this list)
* auto scale
* Revert "auto scale"
This reverts commit
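
The central piece introduced above is `llm_build_lora_mm`, which replaces plain `ggml_mul_mat` calls in the model graphs so that any loaded LoRA adapters are applied on top of the base weight. Below is a minimal sketch of the idea using the public ggml API; the `lora_weight` struct, the `adapters` list, and the function name are illustrative stand-ins for llama.cpp's actual bookkeeping, not its exact internals:

```cpp
#include <vector>
#include "ggml.h"

// Illustrative stand-in for llama.cpp's per-weight LoRA bookkeeping.
struct lora_weight {
    struct ggml_tensor * a;  // down-projection, ne = [n_embd, rank]
    struct ggml_tensor * b;  // up-projection,   ne = [rank, n_out]
    float alpha;             // alpha read from the adapter's metadata
    float user_scale;        // scale requested when applying the adapter
};

// y = W*x + sum_i scale_i * B_i * (A_i * x)
static struct ggml_tensor * build_lora_mm_sketch(
        struct ggml_context * ctx,
        struct ggml_tensor  * w,    // base model weight
        struct ggml_tensor  * cur,  // current activations
        const std::vector<lora_weight> & adapters) {
    struct ggml_tensor * res = ggml_mul_mat(ctx, w, cur);  // base path
    for (const auto & lw : adapters) {
        // rank is the inner dimension shared by A and B
        const float rank  = (float) lw.b->ne[0];
        // classic LoRA scaling alpha/rank, times the user's scale;
        // fall back to the user scale alone if no alpha was stored
        const float scale = lw.alpha != 0.0f
                ? lw.user_scale * lw.alpha / rank
                : lw.user_scale;
        // two thin matmuls: B * (A * x); the dense delta B*A is never formed
        struct ggml_tensor * ab = ggml_mul_mat(ctx, lw.b,
                                  ggml_mul_mat(ctx, lw.a, cur));
        res = ggml_add(ctx, res, ggml_scale(ctx, ab, scale));
    }
    return res;
}
```

The `alpha` fallback corresponds to the "load and use alpha from LoRA adapters" commit above, and evaluating B*(A*x) as two skinny matmuls keeps the per-token cost proportional to the rank rather than to a dense n_embd-by-n_out delta.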
File tree at this commit:

ggml-cuda/
ggml-sycl/
kompute @ 4565194ed7 (submodule)
kompute-shaders/
llamafile/
vulkan-shaders/
CMakeLists.txt
ggml-aarch64.c
ggml-aarch64.h
ggml-alloc.c
ggml-backend-impl.h
ggml-backend.c
ggml-blas.cpp
ggml-common.h
ggml-cuda.cu
ggml-impl.h
ggml-kompute.cpp
ggml-metal.m
ggml-metal.metal
ggml-quants.c
ggml-quants.h
ggml-rpc.cpp
ggml-sycl.cpp
ggml-vulkan.cpp
ggml.c