llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-27 06:39:25 +01:00

Author	SHA1	Message	Date
Phillip Kravtsov	0e797c2fc5	llm : support Adept Persimmon 8B (#3410) * Produces garbage output * wip: correct tensors up to RoPE * correct tensors thru RoPE * Correct outputs through masked & softmax'd KQ * fp32 works * Rename adept->persimmon * Produces correct outputs * clean up convert scripts * remove printing logic from ggml.c * remove prints from llama.cpp & fix merge * trivial cleanups * Add offload funcs * update conversion script to directly take adept artifacts rather than .safetensors file * Fix norm eps bug * Support sqr and concat on metal, persimmon-8b-q4 runs correctly * Small changes from review * Formatting changes * Minor changes to conversion script * Remove old script * Fix editorconfig formatting * Fix build * add overlooked offload code ggml-ci	2023-10-07 10:12:43 +03:00
ds5t5	f8c90cdbaa	llm : add Refact model (#3329) * add refact model * resolve comments * rebase to the latest * solve alibi cpu error --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-04 16:23:39 +03:00
cebtenzzre	29a404a951	gguf : add BERT, MPT, and GPT-J arch info (#3408)	2023-10-02 15:20:28 -04:00
cebtenzzre	0fe321031a	gguf : general usability improvements (#3409)	2023-10-02 14:58:46 -04:00
Cebtenzzre	20c7e1e804	gguf : fix a few general keys (#3341)	2023-09-27 12:18:07 -04:00
Meng Zhang	4fe09dfe66	llama : add support for StarCoder model architectures (#3187) * add placeholder of starcoder in gguf / llama.cpp * support convert starcoder weights to gguf * convert MQA to MHA * fix ffn_down name * add LLM_ARCH_STARCODER to llama.cpp * set head_count_kv = 1 * load starcoder weight * add max_position_embeddings * set n_positions to max_position_embeddings * properly load all starcoder params * fix head count kv * fix comments * fix vram calculation for starcoder * store mqa directly * add input embeddings handling * add TBD * working in cpu, metal buggy * cleanup useless code * metal : fix out-of-bounds access in soft_max kernels * llama : make starcoder graph build more consistent with others * refactor: cleanup comments a bit * add other starcoder models: 3B, 7B, 15B * support-mqa-directly * fix: remove max_position_embeddings, use n_train_ctx * Update llama.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update llama.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * fix: switch to space from tab --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-09-15 22:02:13 +03:00
Kerfuffle	e394084166	gguf-py : support identity operation in TensorNameMap (#3095) Make try_suffixes keyword param optional.	2023-09-14 19:32:26 +03:00
jameswu2014	4c8643dd6e	feature : support Baichuan serial models (#3009)	2023-09-14 12:32:10 -04:00
Kerfuffle	6519e9c99c	gguf(python): Fix special vocab handling when id < 0 (#2984)	2023-09-03 04:38:43 -06:00
Cebtenzzre	92d0b751a7	convert : fix python 3.8 support, modernize type annotations (#2916) * convert : fix python 3.8 support * convert : sort imports * convert : fix required parameters in convert-llama-ggmlv3-to-gguf * convert : fix mypy errors in convert-llama-ggmlv3-to-gguf * convert : use PEP 585 generics and PEP 604 unions Now that we have `from __future__ import annotations`, we can use this modern syntax in Python 3.7 instead of restricting support to Python 3.9 or 3.10 respectively. * gguf.py : a tuple is already a tuple * add mypy.ini * convert : add necessary `type: ignore` comments * gguf-py: bump version	2023-08-31 08:02:23 +03:00
Kerfuffle	dc07dc492e	convert : various script cleanups/fixes + merges and special token handling (#2842) * convert: Fix permute calls and method/func definitions * Cleanups for gguf-py * Minor types cleanups. * Initial implementation of handling merges and special tokens * convert: Handle special tokens and merges in vocab only mode convert: Vocab only mode no longer requires loading model tensors * gguf: Refactor tensor name mapping * convert: Fix type hint for special_token_types in SpecialVocab * Use common special vocab handling in various conversion scripts * First pass at implementing suggested changes * Second pass * gguf: SpecialVocab: Fix issue with special token content not in a dict gguf: SpecialVocab: Allow skipping handling of merges * convert-falcon-hf-to-gguf: Support --vocab-only option, bail out if no tokenizer.json * convert-gptneox-hf-to-gguf and convert: Only handle merges for BPE tokenizer * gguf: SpecialVocab: Actually set load_merges in object * Uniform args parsing and vocab only mode for convert examples * convert.py: Set gpt2 as tokenizer model when using BPE * Squish last type warning in gguf.py - yay!	2023-08-30 11:25:50 +03:00
Georgi Gerganov	d0cee0d36d	gguf : add 64-bit support (GGUF v2) (#2821) * gguf : bump version to 2 * gguf : add support for 64-bit (no backwards comp yet) * gguf : v1 backwards comp * gguf.py : bump GGUF version * gguf.py : uint64_t on all lengths, sizes and counts, enums still uint32_t * gguf.py : string lengths uint32_t * gguf : update all counts to 64-bit * gguf.py : string len uint64_t and n_dims uint32_t * gguf : fix typo * llama.cpp : print gguf version --------- Co-authored-by: klosax <131523366+klosax@users.noreply.github.com>	2023-08-27 14:19:54 +03:00
M. Yusuf Sarıgöz	8194cd8772	gguf : export objects to user code (#2780) * gguf export more objects to user code * gguf export all objects to user code for now * gguf : bump version	2023-08-25 12:43:41 +03:00
M. Yusuf Sarıgöz	87e3733f24	gguf : make gguf pip-installable * gitignore : add dist and rm pyproject.toml * gguf: prepare as Pip package * gguf: prepare as Pip package * gguf : fix line endings * requirements : add gguf * gguf : update readme with build notes * gguf : update readme with build notes * gguf : add notes for tests	2023-08-25 09:26:05 +03:00

64 Commits