llama.cpp/ggml/src
Latest commit 97bdd26eee by Xuan Son Nguyen:
Refactor lora adapter support (#8332)
* lora: load to device buft

* add patch tensor function

* correct tensor patch

* llama_lora_adapter_apply (usage of the resulting public API is sketched after this log)

* correct ggml_backend_tensor_copy

* add llm_build_mm

* fix auto merge

* update based on review comments

* add convert script

* no more transpose A

* add f16 convert

* add metadata check

* add sanity check

* fix ftype

* add requirements

* fix requirements

* fix outfile

* conversion: only allow selected models

* fix types

* cuda : do not use dmmv if the tensor does not have enough cols

* llama : lora fixes

* do not disable mmap with lora

Co-authored-by: slaren <slarengh@gmail.com>

* llm_build_lora_mm_id

* convert_lora : MoE LoRA conversion support

* convert_lora : prefer safetensors, similarly to convert_hf

* convert_hf : simplify modify_tensors for InternLM2

* convert_lora : lazy conversion

* llama : load and use alpha from LoRA adapters

* llama : use llm_build_lora_mm in most model graphs (the pattern is sketched after this log)

* auto scale

* Revert "auto scale"

This reverts commit 42415a4874.

* remove redundant params

* Apply suggestions from code review

Co-authored-by: slaren <slarengh@gmail.com>

* change kv metadata

* move add_type to __init__

* convert_hf : move add_type to main()

* convert_lora : use the GGUFWriter from Model instead of overwriting it

---------

Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-07-15 20:50:47 +02:00
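
The central piece of the refactor is the llm_build_lora_mm helper: model graphs call it in place of a bare ggml_mul_mat, and it adds each active adapter's low-rank delta on top of the base product, with the scale derived from the adapter's alpha and rank ("load and use alpha from LoRA adapters"; conventionally scale = user_scale * alpha / rank). Below is a minimal sketch of that pattern; lora_weight and build_lora_mm are hypothetical stand-ins, not the actual llama.cpp types.

    #include "ggml.h"

    // Hypothetical record for one loaded adapter; just enough state
    // to show the math, not the actual llama.cpp structure.
    struct lora_weight {
        struct ggml_tensor * a;     // low-rank factor A
        struct ggml_tensor * b;     // low-rank factor B
        float                scale; // user_scale * alpha / rank
    };

    // Sketch of the llm_build_lora_mm idea:
    //   res = W @ x + sum_i scale_i * (B_i @ (A_i @ x))
    static struct ggml_tensor * build_lora_mm(
            struct ggml_context * ctx,
            struct ggml_tensor  * w,    // base model weight
            struct ggml_tensor  * cur,  // current activations
            struct lora_weight  * loras,
            int                   n_loras) {
        struct ggml_tensor * res = ggml_mul_mat(ctx, w, cur);
        for (int i = 0; i < n_loras; i++) {
            // A is stored pre-transposed in the converted GGUF adapter
            // ("no more transpose A"), so both factors go straight
            // through ggml_mul_mat.
            struct ggml_tensor * ax  = ggml_mul_mat(ctx, loras[i].a, cur);
            struct ggml_tensor * bax = ggml_mul_mat(ctx, loras[i].b, ax);
            res = ggml_add(ctx, res, ggml_scale(ctx, bax, loras[i].scale));
        }
        return res;
    }

The llm_build_lora_mm_id item is the same idea applied to ggml_mul_mat_id, the indirect matmul used by MoE expert weights.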
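The llama_lora_adapter_apply item became the public hot-swap API: as merged, llama.h exposes llama_lora_adapter_init, llama_lora_adapter_set, and llama_lora_adapter_remove, so an adapter is loaded once per model and enabled per context at a runtime scale. A hedged usage sketch follows (signatures as in the post-merge header; treat details as version-dependent, and "adapter.gguf" is a placeholder path).

    #include <stddef.h>

    #include "llama.h"

    // Load a GGUF-converted LoRA adapter and toggle it on a context.
    static void use_adapter(struct llama_model * model, struct llama_context * ctx) {
        struct llama_lora_adapter * adapter =
            llama_lora_adapter_init(model, "adapter.gguf");
        if (adapter == NULL) {
            return; // loading failed
        }

        // Enable with a runtime scale; the base weights are left
        // untouched, which is why mmap no longer has to be disabled
        // ("do not disable mmap with lora").
        llama_lora_adapter_set(ctx, adapter, 1.0f);

        // ... run inference with the adapter active ...

        // Disable without reloading the base model.
        llama_lora_adapter_remove(ctx, adapter);
    }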
Name | Last commit | Last commit date
ggml-cuda | cuda : suppress 'noreturn' warn in no_device_code (#8414) | 2024-07-11 17:53:42 +02:00
ggml-sycl | [SYCL] add concat through dim 1/2 (#8483) | 2024-07-15 19:32:15 +08:00
kompute@4565194ed7 | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00
kompute-shaders | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00
llamafile | ggml : move sgemm sources to llamafile subfolder (#8394) | 2024-07-10 15:23:29 +03:00
vulkan-shaders | Vulkan MMQ Fix (#8479) | 2024-07-15 09:38:52 +02:00
CMakeLists.txt | vulkan : cmake integration (#8119) | 2024-07-13 18:12:39 +02:00
ggml-aarch64.c | ggml : suppress unknown pragma 'GCC' on windows (#8460) | 2024-07-15 15:48:17 +03:00
ggml-aarch64.h | ggml : minor naming changes (#8433) | 2024-07-12 10:46:02 +03:00
ggml-alloc.c | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00
ggml-backend-impl.h | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00
ggml-backend.c | [SYCL] fix the mul_mat_id ut issues (#8427) | 2024-07-12 08:52:04 +08:00
ggml-blas.cpp | ggml : add NVPL BLAS support (#8329) (#8425) | 2024-07-11 18:49:15 +02:00
ggml-common.h | ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780) | 2024-07-10 15:14:51 +03:00
ggml-cuda.cu | Refactor lora adapter support (#8332) | 2024-07-15 20:50:47 +02:00
ggml-impl.h | ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780) | 2024-07-10 15:14:51 +03:00
ggml-kompute.cpp | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00
ggml-metal.m | metal : template-ify some of the kernels (#8447) | 2024-07-13 18:32:33 +03:00
ggml-metal.metal | metal : template-ify some of the kernels (#8447) | 2024-07-13 18:32:33 +03:00
ggml-quants.c | ggml : minor naming changes (#8433) | 2024-07-12 10:46:02 +03:00
ggml-quants.h | ggml : minor naming changes (#8433) | 2024-07-12 10:46:02 +03:00
ggml-rpc.cpp | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00
ggml-sycl.cpp | [SYCL] add concat through dim 1/2 (#8483) | 2024-07-15 19:32:15 +08:00
ggml-vulkan.cpp | Vulkan MMQ Fix (#8479) | 2024-07-15 09:38:52 +02:00
ggml.c | Refactor lora adapter support (#8332) | 2024-07-15 20:50:47 +02:00