Francis Couture-Harpin
10c3c419e9
Merge branch 'master' into compilade/refactor-kv-cache
2024-06-30 16:04:57 -04:00
Douglas Hanley
80ea089d77
llama : allow pooled embeddings on any model (#7477)
* create append_pooling operation; allow to specify attention_type; add last token pooling; update examples
* find result_norm/result_embd tensors properly; update output allocation logic
* only use embd output for pooling_type NONE
* get rid of old causal_attn accessor
* take out attention_type; add in llama_set_embeddings
* bypass logits when doing non-NONE pooling
2024-06-21 08:38:22 +03:00
Olivier Chafik
1c641e6aac
build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)
* `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew
* server: update refs -> llama-server
gitignore llama-server
* server: simplify nix package
* main: update refs -> llama
fix examples/main ref
* main/server: fix targets
* update more names
* Update build.yml
* rm accidentally checked in bins
* update straggling refs
* Update .gitignore
* Update server-llm.sh
* main: target name -> llama-cli
* Prefix all example bins w/ llama-
* fix main refs
* rename {main->llama}-cmake-pkg binary
* prefix more cmake targets w/ llama-
* add/fix gbnf-validator subfolder to cmake
* sort cmake example subdirs
* rm bin files
* fix llama-lookup-* Makefile rules
* gitignore /llama-*
* rename Dockerfiles
* rename llama|main -> llama-cli; consistent RPM bin prefixes
* fix some missing -cli suffixes
* rename dockerfile w/ llama-cli
* rename(make): llama-baby-llama
* update dockerfile refs
* more llama-cli(.exe)
* fix test-eval-callback
* rename: llama-cli-cmake-pkg(.exe)
* address gbnf-validator unused fread warning (switched to C++ / ifstream)
* add two missing llama- prefixes
* Updating docs for eval-callback binary to use new `llama-` prefix.
* Updating a few lingering doc references for rename of main to llama-cli
* Updating `run-with-preset.py` to use new binary names.
Updating docs around `perplexity` binary rename.
* Updating documentation references for lookup-merge and export-lora
* Updating two small `main` references missed earlier in the finetune docs.
* Update apps.nix
* update grammar/README.md w/ new llama-* names
* update llama-rpc-server bin name + doc
* Revert "update llama-rpc-server bin name + doc"
This reverts commit e474ef1df4.
* add hot topic notice to README.md
* Update README.md
* Update README.md
* rename gguf-split & quantize bins refs in **/tests.sh
---------
Co-authored-by: HanClinto <hanclinto@gmail.com>
2024-06-13 00:41:52 +01:00
Francis Couture-Harpin
43d8d4bf9e
examples : replace llama_kv_cache_seq_* with llama_past_seq_*
2024-06-11 23:27:04 -04:00
Georgi Gerganov
1442677f92
common : refactor cli arg parsing (#7675)
* common : gpt_params_parse do not print usage
* common : rework usage print (wip)
* common : valign
* common : rework print_usage
* infill : remove cfg support
* common : reorder args
* server : deduplicate parameters
ggml-ci
* common : add missing header
ggml-ci
* common : remove --random-prompt usages
ggml-ci
* examples : migrate to gpt_params
ggml-ci
* batched-bench : migrate to gpt_params
* retrieval : migrate to gpt_params
* common : change defaults for escape and n_ctx
* common : remove chatml and instruct params
ggml-ci
* common : passkey use gpt_params
2024-06-04 21:23:39 +03:00
Georgi Gerganov
6ff13987ad
common : normalize naming style (#7462)
* common : normalize naming style
ggml-ci
* common : match declaration / definition order
* zig : try to fix build
2024-05-22 20:04:20 +03:00
Minsoo Cheong
64e7b47c69
examples : add "retrieval" (#6193)
* add `retrieval` example
* add README
* minor fixes
* cast filepos on print
* remove use of variable sized array
* store similarities in separate vector
* print error on insufficient batch size
* fix error message printing
* assign n_batch value to n_ubatch
* fix param definitions
* define retrieval-only parameters in retrieval.cpp
* fix `--context-file` option to be provided multiple times for multiple files
* use vector for `query_emb`
* add usage description in README
* fix merge conflict
* fix usage printing
* remove seed setting
* fix lint
* increase file read buffer size
* retrieval : minor
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-25 09:38:22 +02:00