mirror of https://github.com/ggerganov/llama.cpp.git (synced 2024-12-26 06:10:29 +01:00)
97bdd26eee
* lora: load to device buft
* add patch tensor function
* correct tensor patch
* llama_lora_adapter_apply
* correct ggml_backend_tensor_copy
* add llm_build_mm
* fix auto merge
* update based on review comments
* add convert script
* no more transpose A
* add f16 convert
* add metadata check
* add sanity check
* fix ftype
* add requirements
* fix requirements
* fix outfile
* conversion: only allow selected models
* fix types
* cuda : do not use dmmv if the tensor does not have enough cols
* llama : lora fixes
* do not disable mmap with lora
Co-authored-by: slaren <slarengh@gmail.com>
* llm_build_lora_mm_id
* convert_lora : MoE LoRA conversion support
* convert_lora : prefer safetensors, similarly to convert_hf
* convert_hf : simplify modify_tensors for InternLM2
* convert_lora : lazy conversion
* llama : load and use alpha from LoRA adapters
* llama : use llm_build_lora_mm in most model graphs
* auto scale
* Revert "auto scale"
This reverts commit
| Name |
|---|
| requirements-all.txt |
| requirements-compare-llama-bench.txt |
| requirements-convert_hf_to_gguf_update.txt |
| requirements-convert_hf_to_gguf.txt |
| requirements-convert_legacy_llama.txt |
| requirements-convert_llama_ggml_to_gguf.txt |
| requirements-convert_lora_to_gguf.txt |
| requirements-pydantic.txt |
| requirements-test-tokenizer-random.txt |