llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-27 22:59:24 +01:00

Author	SHA1	Message	Date
Georgi Gerganov	f3f65429c4	llama : reorganize source code + improve CMake (#8006 ) * scripts : update sync [no ci] * files : relocate [no ci] * ci : disable kompute build [no ci] * cmake : fixes [no ci] * server : fix mingw build ggml-ci * cmake : minor [no ci] * cmake : link math library [no ci] * cmake : build normal ggml library (not object library) [no ci] * cmake : fix kompute build ggml-ci * make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE ggml-ci * move public backend headers to the public include directory (#8122) * move public backend headers to the public include directory * nix test * spm : fix metal header --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * scripts : fix sync paths [no ci] * scripts : sync ggml-blas.h [no ci] --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-26 18:33:02 +03:00
jaime-m-p	37bef89433	tokenizer : BPE fixes (#7530 ) * Random test: add_bos_token, add_eos_token * Random test: add BPE models for testing * Custom regex split fails with codepoint 0 * Fix falcon punctuation regex * Refactor llm_tokenizer_bpe: move code to constructor * Move 'add_special_bos/eos' logic to llm_tokenizer_bpe * Move tokenizer flags to vocab structure. * Default values for special_add_bos/eos * Build vocab.special_tokens_cache using vocab token types * Generalize 'jina-v2' per token attributes * Fix unicode whitespaces (deepseek-coder, deepseek-llm) * Skip missing byte tokens (falcon) * Better unicode data generation * Replace char32_t with uint32_t	2024-06-18 18:40:52 +02:00
Georgi Gerganov	5326bcceeb	ggml : sync	2024-06-18 09:50:45 +03:00
Olivier Chafik	1c641e6aac	`build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809 ) * `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew * server: update refs -> llama-server gitignore llama-server * server: simplify nix package * main: update refs -> llama fix examples/main ref * main/server: fix targets * update more names * Update build.yml * rm accidentally checked in bins * update straggling refs * Update .gitignore * Update server-llm.sh * main: target name -> llama-cli * Prefix all example bins w/ llama- * fix main refs * rename {main->llama}-cmake-pkg binary * prefix more cmake targets w/ llama- * add/fix gbnf-validator subfolder to cmake * sort cmake example subdirs * rm bin files * fix llama-lookup-* Makefile rules * gitignore /llama-* * rename Dockerfiles * rename llama\|main -> llama-cli; consistent RPM bin prefixes * fix some missing -cli suffixes * rename dockerfile w/ llama-cli * rename(make): llama-baby-llama * update dockerfile refs * more llama-cli(.exe) * fix test-eval-callback * rename: llama-cli-cmake-pkg(.exe) * address gbnf-validator unused fread warning (switched to C++ / ifstream) * add two missing llama- prefixes * Updating docs for eval-callback binary to use new `llama-` prefix. * Updating a few lingering doc references for rename of main to llama-cli * Updating `run-with-preset.py` to use new binary names. Updating docs around `perplexity` binary rename. * Updating documentation references for lookup-merge and export-lora * Updating two small `main` references missed earlier in the finetune docs. * Update apps.nix * update grammar/README.md w/ new llama-* names * update llama-rpc-server bin name + doc * Revert "update llama-rpc-server bin name + doc" This reverts commit `e474ef1df4`. * add hot topic notice to README.md * Update README.md * Update README.md * rename gguf-split & quantize bins refs in **/tests.sh --------- Co-authored-by: HanClinto <hanclinto@gmail.com>	2024-06-13 00:41:52 +01:00
Georgi Gerganov	1442677f92	common : refactor cli arg parsing (#7675 ) * common : gpt_params_parse do not print usage * common : rework usage print (wip) * common : valign * common : rework print_usage * infill : remove cfg support * common : reorder args * server : deduplicate parameters ggml-ci * common : add missing header ggml-ci * common : remote --random-prompt usages ggml-ci * examples : migrate to gpt_params ggml-ci * batched-bench : migrate to gpt_params * retrieval : migrate to gpt_params * common : change defaults for escape and n_ctx * common : remove chatml and instruct params ggml-ci * common : passkey use gpt_params	2024-06-04 21:23:39 +03:00
Georgi Gerganov	554c247caf	ggml : remove OpenCL (#7735 ) ggml-ci	2024-06-04 21:23:20 +03:00
slaren	adc9ff3841	llama-bench : allow using a different printer for stderr with -oe (#7722 ) compare-commits.sh : hide stdout, use -oe to print markdown	2024-06-04 14:32:42 +02:00
Johannes Gäßler	c8047d538f	scripts: update compare_llama_bench.py [no ci] (#7673 )	2024-05-31 16:26:21 +02:00
Galunid	9c4c9cc83f	Move convert.py to examples/convert-legacy-llama.py (#7430 ) * Move convert.py to examples/convert-no-torch.py * Fix CI, scripts, readme files * convert-no-torch -> convert-legacy-llama * Move vocab thing to vocab.py * Fix convert-no-torch -> convert-legacy-llama * Fix lost convert.py in ci/run.sh * Fix imports * Fix gguf not imported correctly * Fix flake8 complaints * Fix check-requirements.sh * Get rid of ADDED_TOKENS_FILE, FAST_TOKENIZER_FILE * Review fixes	2024-05-30 21:40:00 +10:00
Georgi Gerganov	00281b7be3	scripts : remove mpi remnants	2024-05-29 14:31:18 +03:00
Georgi Gerganov	2ab977282b	sync : ggml	2024-05-29 14:29:52 +03:00
slaren	d359f30921	llama : remove MPI backend (#7395 )	2024-05-20 01:17:03 +02:00
jaime-m-p	b43272afa2	Unicode codepoint flags for custom regexs (#7245 ) * Replace CODEPOINT_TYPE_* with codepoint_flags * Update and bugfix brute force random test * Deterministic brute force random test * Unicode normalization NFD * Get rid of BOM	2024-05-18 01:09:13 +02:00
Brian	51e9d02599	Added a single test function script and fix debug-test.sh to be more robust (#7279 ) * run-single-test.sh: added a single test function script and fix debug-test.sh to be more robust * debug-test.sh: combined execute and gdb test mode via -g flag * debug-test.sh: refactor * debug-test: refactor for clarity * debug-test.sh: comment style changes * debug-test.sh: fix gdb	2024-05-17 22:40:14 +10:00
Georgi Gerganov	29499bb593	sync : ggml	2024-05-15 13:23:41 +03:00
Georgi Gerganov	9f773486ab	script : sync ggml-rpc	2024-05-14 19:14:38 +03:00
Georgi Gerganov	a5e3fde857	sync : ggml ggml-ci	2024-05-14 19:08:09 +03:00
Georgi Gerganov	7bd4ffb780	metal : fix warnings (skipme) (#0 )	2024-05-11 21:38:13 +03:00
Georgi Gerganov	1622ac023f	sync : ggml	2024-05-11 21:35:05 +03:00
Josh Ramer	fed0108491	Scripting & documenting debugging one test without anything else in the loop. (#7096 ) * A little documentation that shares my quick tips for working in the repository. * Update startup-testing-debugging.md * script that shows a menu of tests to pick from & run the debugger on * debug-test.sh: Refactor CLI help message * debug-test.sh: documentation update * debug-test.sh: CLI Help output corrections * debug-test.sh: minor doc fix --------- authored-by: Josh Ramer <ubuntu@ip-172-31-32-53.ec2.internal> Assisted-by: brian khuu <mofosyne@gmail.com>	2024-05-12 03:26:35 +10:00
Georgi Gerganov	fae9d234b6	sync : ggml ggml-ci	2024-05-11 15:38:34 +03:00
slaren	e849648888	llama-bench : add pp+tg test type (#7199 )	2024-05-10 18:03:54 +02:00
jaime-m-p	43248e5594	llama3 custom regex split (#6965 ) * merged the changes from deepseeker models to main branch * Moved regex patterns to unicode.cpp and updated unicode.h * Moved header files * Resolved issues * added and refactored unicode_regex_split and related functions * Updated/merged the deepseek coder pr * Refactored code * Adding unicode regex mappings * Adding unicode regex function * Added needed functionality, testing remains * Fixed issues * Fixed issue with gpt2 regex custom preprocessor * unicode : fix? unicode_wstring_to_utf8 * lint : fix whitespaces * tests : add tokenizer tests for numbers * unicode : remove redundant headers * tests : remove and rename tokenizer test scripts * tests : add sample usage * gguf-py : reader prints warnings on duplicate keys * llama : towards llama3 tokenization support (wip) * unicode : shot in the dark to fix tests on Windows * unicode : first try custom implementations * convert : add "tokenizer.ggml.pre" GGUF KV (wip) * llama : use new pre-tokenizer type * convert : fix pre-tokenizer type writing * lint : fix * make : add test-tokenizer-0-llama-v3 * wip * models : add llama v3 vocab file * llama : adapt punctuation regex + add llama 3 regex * minor * unicode : set bomb * unicode : set bomb * unicode : always use std::wregex * unicode : support \p{N}, \p{L} and \p{P} natively * unicode : try fix windows * unicode : category support via std::regex * unicode : clean-up * unicode : simplify * llama3 custom regex split * convert : add convert-hf-to-gguf-update.py ggml-ci * lint : update * convert : add falcon ggml-ci * unicode : normalize signatures * lint : fix * lint : fix * convert : remove unused functions * convert : add comments * convert : exercise contractions ggml-ci * Using char32_t for codepoints * lint : fix * already exists unicode_tolower() * Typing * Restore BOM * cmake : refactor test targets * tests : refactor vocab tests ggml-ci * tests : add more vocabs and tests ggml-ci * unicode : cleanup * scripts : ignore new update script in check-requirements.sh * Fix merge * models : add phi-3, mpt, gpt-2, starcoder * tests : disable obsolete ggml-ci * tests : use faster bpe test ggml-ci * llama : more prominent warning for old BPE models * tests : disable test-tokenizer-1-bpe due to slowness ggml-ci * Move unused variable value * GPT2 custom regex split * Add alternative regex for custom aplit llama3 Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Style * Add bruteforce random tests for token encoding * wip: fixing unicode codepoint ranges * Fix merge * Unicode tables: separator, lowercase, uppercase and whitespace * llama3 custom regex split: fix \s * Restore BOM * Style * wip: generate NDF table * Ignore special tokens for testing * Clean gen-unicode-data.py * Refactor random tokenizer test * lint : fix * tests : add fail test for llama-bpe --------- Co-authored-by: Jaggzh <jaggz.h@gmail.com> Co-authored-by: Kazim Abrar Mahi <kazimabrarmahi135@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: jaime-m-p <>	2024-05-09 23:30:44 +10:00
Brian	acdce3cdef	compare-llama-bench.py: add missing basicConfig (#7138 ) * compare-llama-bench.py: add missing basicConfig * compare-llama-bench.py: Add line break between error message and print_help() * Add regular print() markdown table	2024-05-08 10:54:39 +02:00
Brian	6fbd432211	py : logging and flake8 suppression refactoring (#7081 ) Set one as executable and add basicConfig() to another. Also added noqa tag to test scripts.	2024-05-05 08:07:48 +03:00
Georgi Gerganov	92139b90af	tests : add test-tokenizer-0.sh + fix some tokenizers (#7036 ) * tests : add test-tokenizer-0.sh * unicode : add all unicode number ranges * starcoder : fix pre-tokenizer * tests : add test that fails with DeepSeek tokenizers * falcon : fix regex * unicode : regenerate unicode tables * refact : add tokenizer model * lint : fix * tests : disable failing tests ggml-ci * refact : add tests files ggml-ci * convert : print -> logging ggml-ci * lint : fix * unicode : digit -> number * phi-3 : update	2024-05-04 08:32:32 +03:00
Brian	a2ac89d6ef	convert.py : add python logging instead of print() (#6511 ) * convert.py: add python logging instead of print() * convert.py: verbose flag takes priority over dump flag log suppression * convert.py: named instance logging * convert.py: use explicit logger id string * convert.py: convert extra print() to named logger * convert.py: sys.stderr.write --> logger.error * .py: Convert all python scripts to use logging module requirements.txt: remove extra line * flake8: update flake8 ignore and exclude to match ci settings * gh-actions: add flake8-no-print to flake8 lint step * pre-commit: add flake8-no-print to flake8 and also update pre-commit version * convert-hf-to-gguf.py: print() to logger conversion * .py: logging basiconfig refactor to use conditional expression .py: removed commented out logging fixup! .py: logging basiconfig refactor to use conditional expression constant.py: logger.error then exit should be a raise exception instead * .py: Convert logger error and sys.exit() into a raise exception (for atypical error) gguf-convert-endian.py: refactor convert_byteorder() to use tqdm progressbar * verify-checksum-model.py: This is the result of the program, it should be printed to stdout. * compare-llama-bench.py: add blank line for readability during missing repo response * reader.py: read_gguf_file() use print() over logging * convert.py: warning goes to stderr and won't hurt the dump output * gguf-dump.py: dump_metadata() should print to stdout * convert-hf-to-gguf.py: print --> logger.debug or ValueError() * verify-checksum-models.py: use print() for printing table * .py: refactor logging.basicConfig() gguf-py/gguf/.py: use __name__ as logger name Since they will be imported and not run directly. python-lint.yml: use .flake8 file instead * constants.py: logger no longer required * convert-hf-to-gguf.py: add additional logging * convert-hf-to-gguf.py: print() --> logger * .py: fix flake8 warnings revert changes to convert-hf-to-gguf.py for get_name() * convert-hf-to-gguf-update.py: use triple quoted f-string instead * .py: accidentally corrected the wrong line *.py: add compilade warning suggestions and style fixes	2024-05-03 22:36:41 +03:00
Georgi Gerganov	f4ab2a4147	llama : fix BPE pre-tokenization (#6920 ) * merged the changes from deepseeker models to main branch * Moved regex patterns to unicode.cpp and updated unicode.h * Moved header files * Resolved issues * added and refactored unicode_regex_split and related functions * Updated/merged the deepseek coder pr * Refactored code * Adding unicode regex mappings * Adding unicode regex function * Added needed functionality, testing remains * Fixed issues * Fixed issue with gpt2 regex custom preprocessor * unicode : fix? unicode_wstring_to_utf8 * lint : fix whitespaces * tests : add tokenizer tests for numbers * unicode : remove redundant headers * tests : remove and rename tokenizer test scripts * tests : add sample usage * gguf-py : reader prints warnings on duplicate keys * llama : towards llama3 tokenization support (wip) * unicode : shot in the dark to fix tests on Windows * unicode : first try custom implementations * convert : add "tokenizer.ggml.pre" GGUF KV (wip) * llama : use new pre-tokenizer type * convert : fix pre-tokenizer type writing * lint : fix * make : add test-tokenizer-0-llama-v3 * wip * models : add llama v3 vocab file * llama : adapt punctuation regex + add llama 3 regex * minor * unicode : set bomb * unicode : set bomb * unicode : always use std::wregex * unicode : support \p{N}, \p{L} and \p{P} natively * unicode : try fix windows * unicode : category support via std::regex * unicode : clean-up * unicode : simplify * convert : add convert-hf-to-gguf-update.py ggml-ci * lint : update * convert : add falcon ggml-ci * unicode : normalize signatures * lint : fix * lint : fix * convert : remove unused functions * convert : add comments * convert : exercise contractions ggml-ci * lint : fix * cmake : refactor test targets * tests : refactor vocab tests ggml-ci * tests : add more vocabs and tests ggml-ci * unicode : cleanup * scripts : ignore new update script in check-requirements.sh * models : add phi-3, mpt, gpt-2, starcoder * tests : disable obsolete ggml-ci * tests : use faster bpe test ggml-ci * llama : more prominent warning for old BPE models * tests : disable test-tokenizer-1-bpe due to slowness ggml-ci --------- Co-authored-by: Jaggzh <jaggz.h@gmail.com> Co-authored-by: Kazim Abrar Mahi <kazimabrarmahi135@gmail.com>	2024-04-29 16:58:41 +03:00
Olivier Chafik	5cf5e7d490	`build`: generate hex dump of server assets during build (#6661 ) * `build`: generate hex dumps of server assets on the fly * build: workaround lack of -n on gnu xxd * build: don't use xxd in cmake * build: don't call xxd from build.zig * build: more idiomatic hexing * build: don't use xxd in Makefile (od hackery instead) * build: avoid exceeding max cmd line limit in makefile hex dump * build: hex dump assets at cmake build time (not config time)	2024-04-21 18:48:53 +01:00
slaren	0d56246f4b	ggml : group all experts in a single ggml_mul_mat_id (#6505 ) * ggml : group all experts in a single ggml_mul_mat_id cuda : improve mmid row copy * cuda : fix bin bcast with non-cont src0 * test-backend-ops : only run all mul mat tests for base types * llama : disable moe offloading with SYCL --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-18 15:18:48 +02:00
Pierrick Hymbert	4bd0f93e4a	model: support arch `DbrxForCausalLM` (#6515 ) * model: dbrx convert to gguf #6344 * llama: support dbrx #6344 * doc: dbrx: add the model as supported * scripts: get-wikitext-2 add unzip * llama: increase maximum experts allowed * llama: factorize moe graph implementation between grok, mixtral and dbrx --------- Co-authored-by: Megha Agarwal <16129366+megha95@users.noreply.github.com>	2024-04-13 11:33:52 +02:00
Daniel Bevenius	f4183afe6a	scripts : add --outdir option to hf.sh (#6600 ) * scripts : add --outdir option to hf.sh This commit adds an option to the hf.sh script that allows the user to specify an output directory for the downloaded file. The motivation for this changes is that examples that use the hf.sh script to download models from huggingface can now specify the output directory, perhaps to the `models` directory to keep them in one place and not clutter the root directory. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> * squash! scripts : add --outdir option to hf.sh Fix format of the --outdir option in the usage message. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> --------- Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-04-11 16:22:47 +03:00
Georgi Gerganov	c4a3a4ff47	sync : ggml	2024-04-09 20:29:06 +03:00
Georgi Gerganov	e11a8999b5	license : update copyright notice + add AUTHORS (#6405 ) * license : add AUTHORS * authors : update * scipts : add LICENSE and gen-authors.sh to sync	2024-04-09 09:23:19 +03:00
Georgi Gerganov	c37247796b	sync : ggml	2024-04-07 17:05:51 +03:00
Georgi Gerganov	43e8995e75	scripts : sync ggml-cuda folder	2024-04-07 16:08:12 +03:00
Georgi Gerganov	54ea0698fb	sync : ggml	2024-04-06 18:27:46 +03:00
Johannes Gäßler	33a5244806	compare-llama-bench.py: fix long hexsha args (#6424 )	2024-04-01 13:30:43 +02:00
Georgi Gerganov	d48ccf3ad4	sync : ggml (#6351 ) * sync : ggml ggml-ci * cuda : move GGML_CUDA_DMMV constants to dmmv.cuh --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-03-29 17:45:46 +02:00
slaren	280345968d	cuda : rename build flag to LLAMA_CUDA (#6299 )	2024-03-26 01:16:01 +01:00
Johannes Gäßler	50ccaf5eac	lookup: complement data from context with general text statistics (#5479 ) * lookup: evaluation tools, use corpus/previous gens * fixup! lookup: evaluation tools, use corpus/previous gens * fixup! lookup: evaluation tools, use corpus/previous gens * fixup! lookup: evaluation tools, use corpus/previous gens * fixup! lookup: evaluation tools, use corpus/previous gens	2024-03-23 01:24:36 +01:00
Georgi Gerganov	b838b53ad6	sync : ggml	2024-03-10 20:10:46 +02:00
Georgi Gerganov	8a3012a4ad	ggml : add ggml-common.h to deduplicate shared code (#5940 ) * ggml : add ggml-common.h to shared code ggml-ci * scripts : update sync scripts * sycl : reuse quantum tables ggml-ci * ggml : minor * ggml : minor * sycl : try to fix build	2024-03-09 12:47:57 +02:00
slaren	652ca2bded	compare-llama-bench.py : remove mul_mat_q (#5892 )	2024-03-05 22:27:29 +01:00
Georgi Gerganov	efd8533ef8	sync : ggml ggml-ci	2024-03-04 20:54:23 +02:00
Georgi Gerganov	a0fc62661f	sync : ggml	2024-03-04 10:40:04 +02:00
Georgi Gerganov	ef2cd694c4	scripts : add pod-llama.sh	2024-03-02 16:54:20 +02:00
Pierrick Hymbert	3ab8b3a92e	llama : cleanup unused mmq flags (#5772 ) * cleanup unused --no-mul-mat-q,-nommq, -mmq, --mul-mat-q, mul_mat_q * remove: mul_mat_q in compare llama bench and usage * update llama-bench --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-03-01 13:39:06 +02:00
Georgi Gerganov	8c0e8f4e73	sync : ggml	2024-02-28 11:17:32 +02:00
Georgi Gerganov	334f76fa38	sync : ggml	2024-02-22 23:21:05 +02:00
Georgi Gerganov	5022cf242d	sync : ggml	2024-02-21 16:52:52 +02:00
Georgi Gerganov	eccd7a26dd	sync : ggml (#5633 ) * ggml : fix conv_2d batch mode (ggml/737) Co-authored-by: bssrdf <bssrdf@gmail.com> * ggml : compute forward no longer pass src tensors (ggml/729) * sync : ggml ggml-ci --------- Co-authored-by: bssrdf <merlintiger@hotmail.com> Co-authored-by: bssrdf <bssrdf@gmail.com>	2024-02-21 16:17:10 +02:00
Georgi Gerganov	337c9cbd52	sync : ggml ggml-ci	2024-02-19 15:09:43 +02:00
Jared Van Bortel	a0c2dad9d4	build : pass all warning flags to nvcc via -Xcompiler (#5570 ) * build : pass all warning flags to nvcc via -Xcompiler * make : fix apparent mis-merge from #3952 * make : fix incorrect GF_CC_VER for CUDA host compiler	2024-02-18 16:21:52 -05:00
Georgi Gerganov	b1de96824b	ci : fix wikitext url + compile warnings (#5569 ) ggml-ci	2024-02-18 22:39:30 +02:00
Georgi Gerganov	d2819d5577	scripts : add helpers script for bench comparing commits (#5521 ) * scripts : add helpers script for bench comparing commits * scripts : detect CUDA * set flags after checking the command line * fix make flags --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-02-16 15:14:40 +02:00
Georgi Gerganov	9350a1cf21	scripts : add hf.sh helper script (#5501 ) * scripts : add hf.sh helper scripts * hf : add error logs * hf : add support for --repo and --file	2024-02-15 15:41:15 +02:00
Georgi Gerganov	3b169441df	sync : ggml (#5452 ) * ggml-alloc : v3 (ggml/727) * ggml-alloc v3 ggml-ci * fix ci ggml-ci * whisper : check for backend buffer allocation failures * whisper : avoid leaks when initialization fails * cleanup ggml-ci * style fixes ggml-ci * sync : ggml * update llama.cpp, clip.cpp, export-lora.cpp * update finetune.cpp, train-text-from-scratch.cpp ggml-ci * ggml-backend : reduce alignment to 32 to match gguf and fix mmap --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-02-12 09:16:06 +02:00
Georgi Gerganov	cd9aea63b5	scripts : update sync scripts with new backends	2024-02-10 09:53:05 +02:00
Georgi Gerganov	43b65f5eb8	sync : ggml	2024-02-10 09:30:36 +02:00
Georgi Gerganov	30679d438d	scripts : fix typos, cleanup (#5303 )	2024-02-05 09:48:03 +02:00
Нияз Гарифзянов	4be04c8965	scripts : add non-interactive server-llm.sh (#5303 ) * Update server-llm.sh Add flag --non-interactive that allows run script without asking a permission * Update scripts/server-llm.sh --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-02-05 09:43:57 +02:00
Georgi Gerganov	e437b37fd0	scripts : parse wtype in server-llm.sh (#5167 ) * scripts : parse wtype in server-llm.sh * scripts : fix check for wfile	2024-02-02 14:23:40 +02:00
Neo Zhang Jianyu	01684139c3	support SYCL backend windows build (#5208 ) * support SYCL backend windows build * add windows build in CI * add for win build CI * correct install oneMKL * fix install issue * fix ci * fix install cmd * fix install cmd * fix install cmd * fix install cmd * fix install cmd * fix win build * fix win build * fix win build * restore other CI part * restore as base * rm no new line * fix no new line issue, add -j * fix grammer issue * allow to trigger manually, fix format issue * fix format * add newline * fix format * fix format * fix format issuse --------- Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>	2024-01-31 08:08:07 +05:30
Georgi Gerganov	8f8ddfcfad	sync : ggml (#0 )	2024-01-30 16:21:57 +02:00
Georgi Gerganov	35dec26cc2	sync : ggml	2024-01-28 19:48:05 +02:00
Georgi Gerganov	753eafed0e	sync : ggml	2024-01-27 17:00:24 +02:00
Georgi Gerganov	5f1925a8ce	scripts : move run-with-preset.py from root to scripts folder	2024-01-26 17:09:44 +02:00
crasm	413e7b0559	ci : add model tests + script wrapper (#4586 ) * scripts : add lib.sh and lib_test.sh * scripts : stub out new ci-run.sh script * scripts : switch to PascalCase for functions This looks a little odd at first, but I find it very useful as a convention to know if a command is part of our code vs a builtin. * scripts : add some fancy conversion from snake_case to PascalCase * Add venv to ci/run.sh * Revert scripts work * scripts : add wrapper script for local use of ci/run.sh * Simplify .gitignore for tests, clang-tidy fixes * Label all ctest tests * ci : ctest uses -L main * Attempt at writing ctest_with_model * Update test-model-load-cancel * ci : add ctest_with_model for debug and release ggml-ci * Fix gg_get_model function ggml-ci * got stuck on CMake * Add get_model.cpp to tests/CMakeLists.txt ggml-ci * Fix README.md output for ctest_with_model ggml-ci * workflows : use `-L main` for all ctest ggml-ci * Fixes * GG_RUN_CTEST_MODELFILE => LLAMACPP_TESTMODELFILE * Always show warning rather than failing if model file variable is not set * scripts : update usage text for ci-run.sh	2024-01-26 14:18:00 +02:00
Georgi Gerganov	e9240cdfa0	scripts : add get-winogrande.sh	2024-01-18 20:45:39 +02:00
Georgi Gerganov	dcad445d0c	scritps : add helper script to get hellaswag data in txt format	2024-01-18 11:44:49 +02:00
Georgi Gerganov	6b6916b215	sync : ggml	2024-01-17 20:54:50 +02:00
Georgi Gerganov	9408cfdad6	scripts : sync-ggml-am.sh option to skip commits	2024-01-14 11:08:41 +02:00
Georgi Gerganov	76484fbfd3	sync : ggml	2024-01-14 00:14:46 +02:00
Johannes Gäßler	7dc78764e2	compare-llama-bench: tweak output format (#4910 )	2024-01-13 15:52:53 +01:00
Georgi Gerganov	de473f5f8e	sync : ggml	2024-01-12 22:02:43 +02:00
Georgi Gerganov	64802ec00d	sync : ggml	2024-01-11 09:39:08 +02:00
Johannes Gäßler	4f56458d34	Python script to compare commits with llama-bench (#4844 )	2024-01-10 01:04:33 +01:00
Georgi Gerganov	9a818f7c42	scripts : improve get-pg.sh (#4838 )	2024-01-09 19:21:13 +02:00
Georgi Gerganov	d9653894df	scripts : script to get Paul Graham essays in txt format (#4838 )	2024-01-09 16:23:05 +02:00
Georgi Gerganov	91d38876df	metal : switch back to default.metallib (ggml/681) ggml-ci	2024-01-05 18:02:06 +02:00
Georgi Gerganov	7bed7eba35	cuda : simplify expression Co-authored-by: slaren <slarengh@gmail.com>	2024-01-03 14:38:38 +02:00
Georgi Gerganov	75e3fd8581	sync : ggml ggml-ci	2024-01-03 14:38:38 +02:00
Georgi Gerganov	ab62fc3e55	scripts : fix sync order + metal sed	2024-01-03 14:38:38 +02:00
crasm	04ac0607e9	python : add check-requirements.sh and GitHub workflow (#4585 ) * python: add check-requirements.sh and GitHub workflow This script and workflow forces package versions to remain compatible across all convert.py scripts, while allowing secondary convert scripts to import dependencies not wanted in convert.py. Move requirements into ./requirements * Fail on "==" being used for package requirements (but can be suppressed) * Enforce "compatible release" syntax instead of == * Update workflow * Add upper version bound for transformers and protobuf * improve check-requirements.sh * small syntax change * don't remove venvs if nocleanup is passed * See if this fixes docker workflow * Move check-requirements.sh into ./scripts/ --------- Co-authored-by: Jared Van Bortel <jared@nomic.ai>	2023-12-29 16:50:29 +02:00
Georgi Gerganov	c8255f8a6b	scripts : print list of sync commits	2023-12-29 15:12:35 +02:00
Georgi Gerganov	38b3de4658	sync : ggml	2023-12-29 14:56:41 +02:00
Georgi Gerganov	ca38b8d334	scripts : do not sync commits from this repo	2023-12-29 14:54:05 +02:00
Georgi Gerganov	b47879b0dd	scripts : add sync-ggml-am.sh	2023-12-27 11:44:22 +02:00
Jared Van Bortel	70f806b821	build : detect host compiler and cuda compiler separately (#4414 )	2023-12-13 12:10:10 -05:00
Georgi Gerganov	fe680e3d10	sync : ggml (new ops, tests, backend, etc.) (#4359 ) * sync : ggml (part 1) * sync : ggml (part 2, CUDA) * sync : ggml (part 3, Metal) * ggml : build fixes ggml-ci * cuda : restore lost changes * cuda : restore lost changes (StableLM rope) * cmake : enable separable compilation for CUDA ggml-ci * ggml-cuda : remove device side dequantize * Revert "cmake : enable separable compilation for CUDA" This reverts commit `09e35d04b1`. * cuda : remove assert for rope * tests : add test-backend-ops * ggml : fix bug in ggml_concat * ggml : restore `ggml_get_n_tasks()` logic in `ggml_graph_plan()` * ci : try to fix macOS * ggml-backend : remove backend self-registration * ci : disable Metal for macOS cmake build ggml-ci * metal : fix "supports family" call * metal : fix assert * metal : print resource path ggml-ci --------- Co-authored-by: slaren <slarengh@gmail.com>	2023-12-07 22:26:54 +02:00
bandoti	b38a16dfcf	cmake : fix issue with version info not getting baked into LlamaConfig.cmake (#3970 ) * Split CPP generation from build-info query * Remove blank lines * Add BUILD_SHARED_LIBS option	2023-11-27 21:25:42 +02:00
Georgi Gerganov	4760e7cc0b	sync : ggml (backend v2) (#3912 ) * sync : ggml (backend v2) (wip) * sync : migrate examples and llama.cpp to dynamic graphs (wip) * sync : update tests + fix max op params to 64 ggml-ci * sync : ggml-cuda ggml-ci * llama : fix save/load state context size ggml-ci * sync : try to fix build on tvOS * sync : pass custom graph sizes in training examples * sync : update graph copies to new ggml API * sync : update sync-ggml.sh with new files * scripts : fix header in sync script * train : fix context size calculations * llama : increase inference graph size up to 4096 nodes * train : allocate grads for backward graphs * train : allocate grads for gb_tmp	2023-11-13 14:16:23 +02:00
cebtenzzre	b12fa0d1c1	build : link against build info instead of compiling against it (#3879 ) * cmake : fix build when .git does not exist * cmake : simplify BUILD_INFO target * cmake : add missing dependencies on BUILD_INFO * build : link against build info instead of compiling against it * zig : make build info a .cpp source instead of a header Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com> * cmake : revert change to CMP0115 --------- Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com>	2023-11-02 08:50:16 +02:00
Georgi Gerganov	f0e209324a	scripts : add server-llm.sh (#3868 ) * scripts : add deploy-server.sh * scripts : rename to server-llm.sh * scripts : working curl pipe	2023-11-01 11:29:07 +02:00
Georgi Gerganov	db3abcc114	sync : ggml (ggml-backend) (#3548 ) * sync : ggml (ggml-backend) ggml-ci * zig : add ggml-backend to the build	2023-10-08 20:19:14 +03:00
bandoti	095231dfd3	cmake : fix transient definitions in find pkg (#3411 )	2023-10-02 12:51:49 +03:00
DAN™	99115f3fa6	cmake : fix build-info.h on MSVC (#3309 )	2023-09-25 18:45:33 -04:00
Kevin Ji	bedb92b603	scripts : use `/usr/bin/env` in shebang (#3313 )	2023-09-22 23:52:23 -04:00
Cebtenzzre	e6616cf0db	examples : add compiler version and target to build info (#2998 )	2023-09-15 16:59:49 -04:00


172 Commits