llama.cpp/ggml/src
Latest commit 97bdd26eee by Xuan Son Nguyen:
Refactor lora adapter support (#8332)
* lora: load to device buft

* add patch tensor function

* correct tensor patch

* llama_lora_adapter_apply (usage of the resulting public API is sketched after this log)

* correct ggml_backend_tensor_copy

* add llm_build_mm

* fix auto merge

* update based on review comments

* add convert script

* no more transpose A

* add f16 convert

* add metadata check

* add sanity check

* fix ftype

* add requirements

* fix requirements

* fix outfile

* conversion: only allow selected models

* fix types

* cuda : do not use dmmv if the tensor does not have enough cols

* llama : lora fixes

* do not disable mmap with lora

Co-authored-by: slaren <slarengh@gmail.com>

* llm_build_lora_mm_id

* convert_lora : MoE LoRA conversion support

* convert_lora : prefer safetensors, similarly to convert_hf

* convert_hf : simplify modify_tensors for InternLM2

* convert_lora : lazy conversion

* llama : load and use alpha from LoRA adapters

* llama : use llm_build_lora_mm in most model graphs (the pattern is sketched after this log)

* auto scale

* Revert "auto scale"

This reverts commit 42415a4874.

* remove redundant params

* Apply suggestions from code review

Co-authored-by: slaren <slarengh@gmail.com>

* change kv metadata

* move add_type to __init__

* convert_hf : move add_type to main()

* convert_lora : use the GGUFWriter from Model instead of overwriting it

---------

Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-07-15 20:50:47 +02:00
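
The central piece of the refactor is the llm_build_lora_mm helper: model graphs call it in place of a bare ggml_mul_mat, and it adds each active adapter's low-rank delta on top of the base product, with the scale derived from the adapter's alpha and rank ("load and use alpha from LoRA adapters"; conventionally scale = user_scale * alpha / rank). Below is a minimal sketch of that pattern; lora_weight and build_lora_mm are hypothetical stand-ins, not the actual llama.cpp types.

    #include "ggml.h"

    // Hypothetical record for one loaded adapter; just enough state
    // to show the math, not the actual llama.cpp structure.
    struct lora_weight {
        struct ggml_tensor * a;     // low-rank factor A
        struct ggml_tensor * b;     // low-rank factor B
        float                scale; // user_scale * alpha / rank
    };

    // Sketch of the llm_build_lora_mm idea:
    //   res = W @ x + sum_i scale_i * (B_i @ (A_i @ x))
    static struct ggml_tensor * build_lora_mm(
            struct ggml_context * ctx,
            struct ggml_tensor  * w,    // base model weight
            struct ggml_tensor  * cur,  // current activations
            struct lora_weight  * loras,
            int                   n_loras) {
        struct ggml_tensor * res = ggml_mul_mat(ctx, w, cur);
        for (int i = 0; i < n_loras; i++) {
            // A is stored pre-transposed in the converted GGUF adapter
            // ("no more transpose A"), so both factors go straight
            // through ggml_mul_mat.
            struct ggml_tensor * ax  = ggml_mul_mat(ctx, loras[i].a, cur);
            struct ggml_tensor * bax = ggml_mul_mat(ctx, loras[i].b, ax);
            res = ggml_add(ctx, res, ggml_scale(ctx, bax, loras[i].scale));
        }
        return res;
    }

The llm_build_lora_mm_id item is the same idea applied to ggml_mul_mat_id, the indirect matmul used by MoE expert weights.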
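The llama_lora_adapter_apply item became the public hot-swap API: as merged, llama.h exposes llama_lora_adapter_init, llama_lora_adapter_set, and llama_lora_adapter_remove, so an adapter is loaded once per model and enabled per context at a runtime scale. A hedged usage sketch follows (signatures as in the post-merge header; treat details as version-dependent, and "adapter.gguf" is a placeholder path).

    #include <stddef.h>

    #include "llama.h"

    // Load a GGUF-converted LoRA adapter and toggle it on a context.
    static void use_adapter(struct llama_model * model, struct llama_context * ctx) {
        struct llama_lora_adapter * adapter =
            llama_lora_adapter_init(model, "adapter.gguf");
        if (adapter == NULL) {
            return; // loading failed
        }

        // Enable with a runtime scale; the base weights are left
        // untouched, which is why mmap no longer has to be disabled
        // ("do not disable mmap with lora").
        llama_lora_adapter_set(ctx, adapter, 1.0f);

        // ... run inference with the adapter active ...

        // Disable without reloading the base model.
        llama_lora_adapter_remove(ctx, adapter);
    }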
Name | Last commit | Last commit date
ggml-cuda | cuda : suppress 'noreturn' warn in no_device_code (#8414) | 2024-07-11 17:53:42 +02:00
ggml-sycl | [SYCL] add concat through dim 1/2 (#8483) | 2024-07-15 19:32:15 +08:00
kompute@4565194ed7 | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00
kompute-shaders | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00
llamafile | ggml : move sgemm sources to llamafile subfolder (#8394) | 2024-07-10 15:23:29 +03:00
vulkan-shaders | Vulkan MMQ Fix (#8479) | 2024-07-15 09:38:52 +02:00
CMakeLists.txt | vulkan : cmake integration (#8119) | 2024-07-13 18:12:39 +02:00
ggml-aarch64.c | ggml : suppress unknown pragma 'GCC' on windows (#8460) | 2024-07-15 15:48:17 +03:00
ggml-aarch64.h | ggml : minor naming changes (#8433) | 2024-07-12 10:46:02 +03:00
ggml-alloc.c | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00
ggml-backend-impl.h | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00
ggml-backend.c | [SYCL] fix the mul_mat_id ut issues (#8427) | 2024-07-12 08:52:04 +08:00
ggml-blas.cpp | ggml : add NVPL BLAS support (#8329) (#8425) | 2024-07-11 18:49:15 +02:00
ggml-common.h | ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780) | 2024-07-10 15:14:51 +03:00
ggml-cuda.cu | Refactor lora adapter support (#8332) | 2024-07-15 20:50:47 +02:00
ggml-impl.h | ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780) | 2024-07-10 15:14:51 +03:00
ggml-kompute.cpp | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00
ggml-metal.m | metal : template-ify some of the kernels (#8447) | 2024-07-13 18:32:33 +03:00
ggml-metal.metal | metal : template-ify some of the kernels (#8447) | 2024-07-13 18:32:33 +03:00
ggml-quants.c | ggml : minor naming changes (#8433) | 2024-07-12 10:46:02 +03:00
ggml-quants.h | ggml : minor naming changes (#8433) | 2024-07-12 10:46:02 +03:00
ggml-rpc.cpp | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00
ggml-sycl.cpp | [SYCL] add concat through dim 1/2 (#8483) | 2024-07-15 19:32:15 +08:00
ggml-vulkan.cpp | Vulkan MMQ Fix (#8479) | 2024-07-15 09:38:52 +02:00
ggml.c | Refactor lora adapter support (#8332) | 2024-07-15 20:50:47 +02:00