Meng, Hengyu
73ef3f769c
Update llama-server-intel.Dockerfile
2024-09-15 23:21:46 +08:00
Meng, Hengyu
3956cf92a9
Update llama-cli-intel.Dockerfile
2024-09-15 23:21:21 +08:00
Meng, Hengyu
af95b1424f
[SYCL] fix cmake broken
2024-09-15 22:57:56 +08:00
Csaba Kecskemeti
3c7989fd29
py : add "LLaMAForCausalLM" conversion support ( #9485 )
...
Co-authored-by: Csaba Kecskemeti <csabakecskemeti@Csabas-Mac-Pro.local>
b3758
2024-09-15 10:48:25 +03:00
OSecret
d6b37c881f
readme : update tools list ( #9475 )
...
* Added link to proprietary wrapper for Unity3d into README.md
Wrapper has prebuild library and was tested on iOS, Android, WebGL, PC, Mac platforms, has online demos like [this](https://d23myu0xfn2ttc.cloudfront.net/rich/index.html) and [that](https://d23myu0xfn2ttc.cloudfront.net/).
* Update README.md
Fixes upon review
b3757
2024-09-15 10:36:53 +03:00
Michael Podvitskiy
7596487beb
cmake : try to fix sycl+intel build ( #9487 )
b3756
2024-09-15 10:06:38 +03:00
Yuri Khrustalev
822b6322de
ggml : ggml_type_name return "NONE" for invalid values ( #9458 )
...
When running on Windows, the quantization utility attempts to print the types that are not set which leads to a crash.
b3755
2024-09-14 12:54:37 +03:00
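The fix described in the entry above amounts to a bounds check before a table lookup. A minimal Python sketch of that idea follows. The table contents and the function name `type_name` are illustrative assumptions; the actual change is in ggml's C code and is not reproduced here.

```python
# Hypothetical, abbreviated type table; the real ggml table has many more entries.
TYPE_NAMES = ["f32", "f16", "q4_0", "q4_1"]


def type_name(type_id: int) -> str:
    """Return the name for type_id, or "NONE" when the id is outside the table."""
    if 0 <= type_id < len(TYPE_NAMES):
        return TYPE_NAMES[type_id]
    # An unset or invalid type no longer indexes past the table.
    return "NONE"
```

A caller that prints unset types gets the placeholder string instead of reading out of range.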
VoidIsVoid
dcdcee3a74
server: add data: [DONE] to /chat/completions stream response ( #9459 )
b3754
2024-09-14 11:36:44 +02:00
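A client can use the `data: [DONE]` line from the entry above as an end-of-stream sentinel. The sketch below assumes the stream arrives as text lines in server-sent-event form, with each event carrying one JSON object. The helper name `iter_stream_chunks` is hypothetical and is not part of the server.

```python
import json


def iter_stream_chunks(lines):
    """Yield decoded JSON chunks from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # the server signalled the end of the stream
        yield json.loads(payload)
```

Anything that follows the sentinel is ignored, so the consumer does not need to wait for the connection to close.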
Georgi Gerganov
1f4111e540
cmake : use list(APPEND ...) instead of set() + dedup linker ( #9463 )
...
* cmake : use list(APPEND ...) instead of set() + dedup linker
ggml-ci
* cmake : try fix sycl
* cmake : try to fix sycl 2
* cmake : fix sycl build (#9469 )
* try fix sycl build
* use CMAKE_CXX_FLAGS as a string variable
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* one more CMAKE_CXX_FLAGS fix (#9471 )
---------
Co-authored-by: Michael Podvitskiy <podvitskiymichael@gmail.com>
b3753
2024-09-14 10:55:05 +03:00
Daniel Bevenius
befaf1197f
llama : make cell_id const in inp_s_mask block ( #9470 )
...
This commit makes the cell_id variable const in the inp_s_mask block.
The motivation for this change is consistency with the code in the
inp_s_copy block.
b3752
2024-09-14 10:50:12 +03:00
Xuan Son Nguyen
feff4aa846
server : add loading html page while model is loading ( #9468 )
...
* Adding loading page for '/' server requests
* set content when model is loading
* removed loading html file
* updated cmakelist
* updated makefile
* cleaned up whitespace
* cleanup for PR removed error
* updated server test to handle 503 HTML
* updated server test to handle 503 HTML
* catch 503 before parsing json
* revert test
* account for both api and web browser requests
* precommit corrections
* eol fix
* revert changes to pre-commit
* removed print statement
* made loading message more descriptive
* also support .html files
---------
Co-authored-by: VJHack <flymyplane21@gmail.com>
Co-authored-by: Vinesh Janarthanan <36610342+VJHack@users.noreply.github.com>
b3751
2024-09-13 14:23:11 +02:00
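The "catch 503 before parsing json" step in the entry above suggests that a client should inspect the status code first, since a loading server may answer with an HTML page rather than JSON. The sketch below shows that ordering only. The function name, the return shape, and the assumption that a ready server answers with a JSON body are illustrative and are not taken from the server tests.

```python
import json


def parse_server_response(status: int, body: str) -> dict:
    """Check for 503 before attempting to parse the body as JSON."""
    if status == 503:
        # The body may be an HTML loading page, so it is not parsed.
        return {"ready": False, "data": None}
    return {"ready": True, "data": json.loads(body)}
```

Checking the status first avoids a JSON decode error on the HTML loading page.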
Georgi Gerganov
0abc6a2c25
llama : llama_perf + option to disable timings during decode ( #9355 )
...
* llama : llama_perf + option to disable timings during decode
ggml-ci
* common : add llama_arg
* Update src/llama.cpp
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* perf : separate functions in the API
ggml-ci
* perf : safer pointer handling + naming update
ggml-ci
* minor : better local var name
* perf : abort on invalid sampler pointer
ggml-ci
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
b3750
2024-09-13 09:53:38 +03:00
Gilad S.
bd35cb0ae3
feat: remove a sampler from a chain ( #9445 )
...
* feat: remove a sampler from a chain
* fix: return removed sampler
* fix: safer casting
b3749
2024-09-13 03:54:49 +02:00
Mathijs Henquet
78203641fe
server : Add option to return token pieces in /tokenize endpoint ( #9108 )
...
* server : added with_pieces functionality to /tokenize endpoint
* server : Add tokenize with pieces tests to server.feature
* Handle case if tokenizer splits along utf8 continuation bytes
* Add example of token splitting
* Remove trailing ws
* Fix trailing ws
* Maybe fix ci
* maybe this fix windows ci?
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
b3748
2024-09-12 22:30:11 +02:00
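The "tokenizer splits along utf8 continuation bytes" case in the entry above arises when a multi-byte character is divided between two tokens, so neither piece is valid UTF-8 by itself. One way to keep such pieces representable in JSON is to fall back to a list of byte values. The sketch below assumes that representation, and the helper name `piece_to_json_value` is hypothetical.

```python
def piece_to_json_value(piece: bytes):
    """Return the piece as str when it is valid UTF-8, else as a list of byte values."""
    try:
        return piece.decode("utf-8")
    except UnicodeDecodeError:
        # An incomplete or stray byte sequence cannot be a JSON string.
        return list(piece)
```

For example, "á" is encoded as the bytes C3 A1. If a tokenizer splits it into two one-byte pieces, each piece falls back to the byte-list form.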
Dou Xinpeng
e6b7801bd1
cann: Add host buffer type for Ascend NPU ( #9406 )
...
* feat: Add host buffer type for Ascend NPU(CANN backend)
* fix some checking errors
* Add a few comments
b3747
2024-09-12 19:46:43 +08:00
fengerhu1
e665744317
llava : fix the script error in MobileVLM README ( #9054 )
...
Signed-off-by: Erhu Feng <2748250768@qq.com>
b3746
2024-09-12 14:34:22 +03:00
Xuan Son Nguyen
d4c3c10fad
lora : raise error if lm_head is ignored ( #9103 )
...
* lora : raise error if lm_head is ignored
* fix style
* clarify comment
2024-09-12 14:33:57 +03:00
Michael Podvitskiy
2a825116b6
cmake : fix for builds without GGML_CDEF_PUBLIC ( #9338 )
...
* `GGML_TARGET_DEFINES-NOTFOUND` fix for builds without `GGML_CDEF_PUBLIC`
* Update CMakeLists.txt, spaces fix
b3744
2024-09-12 14:30:01 +03:00
Huang Qi
4dc4f5f14a
ci : update HIP SDK to 24.Q3 (ROCm 6.1) ( #9329 )
b3743
2024-09-12 14:28:43 +03:00
daminho
c837981bba
py : add Phi-1.5/Phi-2 tokenizer ( #9361 )
...
* add phi2 tokenizer
* add phi name to convert_hf_to_gguf_update.py
* make tokenizer_pre consistent; llama.cpp work
2024-09-12 14:28:20 +03:00
Trivikram Kamat
3c26a1644d
ci : bump actions/checkout to v4 ( #9377 )
2024-09-12 14:27:45 +03:00
Michael Podvitskiy
ff76e18516
cmake : fixed the order of linking libraries for llama-quantize ( #9450 )
b3740
2024-09-12 14:27:14 +03:00
Molly Sophia
39f852f440
py : add special tokens in hf_converter for RWKV v6 ( #9428 )
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-09-12 14:25:16 +03:00
Ahmad Tameem
2b00fa7997
riscv : modify Makefile and add a RISCV_VECT to print log info ( #9442 )
...
- Added ggml_cpu_has_riscv_v() in GGML to print system info in log
- Modified Makefile to only use flag when cross compiling for RISC-V
b3738
2024-09-12 14:24:31 +03:00
Georgi Gerganov
d6a04f872d
ggml : hide ggml_object, ggml_cgraph, ggml_hash_set ( #9408 )
...
* ggml : hide ggml_object, ggml_cgraph, ggml_hash_set
ggml-ci
* ggml : add ggml-impl.h to backends
* ggml : fix compiler warnings
ggml-ci
* ggml : add assert upon adding nodes
b3737
2024-09-12 14:23:49 +03:00
Neo Zhang Jianyu
c9c8575a1a
enhance run script to be easy to change the parameters ( #9448 )
...
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
b3736
2024-09-12 17:44:17 +08:00
Xinpeng Dou
df4b7945ae
cann: Fix error when running a non-exist op ( #9424 )
b3735
2024-09-12 09:02:35 +08:00
Faisal Zaghloul
449ccfb6f5
Add Jais to list of supported models ( #9439 )
...
Co-authored-by: fmz <quic_fzaghlou@quic.com>
2024-09-12 02:29:53 +02:00
slaren
1b28061400
llama : skip token bounds check when evaluating embeddings ( #9437 )
b3733
2024-09-11 17:52:13 +02:00
Pavel Zloi
8db003a19d
py : support converting local models ( #7547 )
...
* Support of converting local models added to convert-hf-to-gguf-update.py
* Description fixed
* shutil added to imports
2024-09-11 15:29:51 +03:00
Xuan Son Nguyen
0996c5597f
llava : correct args for minicpmv-cli ( #9429 )
b3731
2024-09-11 12:59:13 +02:00
Xuan Son Nguyen
5bb2c5dbd2
files : remove accidentally added lora_test submodule ( #9430 )
2024-09-11 13:02:09 +03:00
Farbod Bijary
67155ab7f5
feat: Implements retrying logic for downloading models using --model-url flag ( #9255 )
...
* feat: Implements retrying logic for downloading models using --model-url flag
* Update common/common.cpp
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* Update common/common.cpp
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* apply comments
* implements a retry function to avoid duplication
* fix editorconfig
* change function name
---------
Co-authored-by: farbod <farbod.bjary82@gmail.com>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
b3729
2024-09-11 11:22:37 +02:00
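The "implements a retry function to avoid duplication" step in the entry above can be illustrated with a generic helper that wraps any operation reporting success or failure. This is a sketch only. The actual change is in `common/common.cpp`, and the attempt count, delay values, and exponential backoff used here are assumptions rather than details from the commit.

```python
import time


def retry(operation, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it returns True or the attempts run out."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        if operation():
            return True
        if attempt < max_attempts:
            sleep(delay)  # wait before the next attempt
            delay *= 2    # exponential backoff (assumed, not confirmed by the log)
    return False
```

Passing `sleep` as a parameter lets the helper be tested without real waiting.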
Johannes Gäßler
5af118efda
CUDA: fix --split-mode row race condition ( #9413 )
b3728
2024-09-11 10:22:40 +02:00
Georgi Gerganov
d2b496bff4
batched-bench : remove unused code ( #9305 )
b3727
2024-09-11 10:03:54 +03:00
R0CKSTAR
b34e023480
musa: remove Clang builtins mapping ( #9421 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
b3726
2024-09-11 03:46:55 +02:00
Alberto Cabrera Pérez
51b6038636
sycl : update support conditions ( #9394 )
...
* sycl : update support condition to im2col
Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>
* Added TODO to remind supporting FP32 im2col
---------
Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>
b3725
2024-09-11 08:53:42 +08:00
Georgi Gerganov
cb9c933eb2
flake.lock: Update ( #9360 )
...
Flake lock file updates:
• Updated input 'flake-parts':
'github:hercules-ci/flake-parts/af510d4a62d071ea13925ce41c95e3dec816c01d?narHash=sha256-ODYRm8zHfLTH3soTFWE452ydPYz2iTvr9T8ftDMUQ3E%3D' (2024-08-30)
→ 'github:hercules-ci/flake-parts/567b938d64d4b4112ee253b9274472dc3a346eb6?narHash=sha256-%2Bebgonl3NbiKD2UD0x4BszCZQ6sTfL4xioaM49o5B3Y%3D' (2024-09-01)
• Updated input 'flake-parts/nixpkgs-lib':
'a5d394176e.tar.gz?narHash=sha256-uFf2QeW7eAHlYXuDktm9c25OxOyCoUOQmh5SZ9amE5Q%3D' (2024-08-01)
→ '356624c120.tar.gz?narHash=sha256-Ss8QWLXdr2JCBPcYChJhz4xJm%2Bh/xjl4G0c0XlP6a74%3D' (2024-09-01)
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/71e91c409d1e654808b2621f28a327acfdad8dc2?narHash=sha256-GnR7/ibgIH1vhoy8cYdmXE6iyZqKqFxQSVkFgosBh6w%3D' (2024-08-28)
→ 'github:NixOS/nixpkgs/574d1eac1c200690e27b8eb4e24887f8df7ac27c?narHash=sha256-v3rIhsJBOMLR8e/RNWxr828tB%2BWywYIoajrZKFM%2B0Gg%3D' (2024-09-06)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-09-10 15:46:59 -07:00
Xuan Son Nguyen
6cd4e03444
arg : bring back missing ifdef ( #9411 )
...
* arg : bring back missing ifdef
* replace with llama_supports_gpu_offload
b3723
2024-09-10 22:41:29 +02:00
matteo
8d300bd35f
enable --special arg for llama-server ( #9419 )
...
Co-authored-by: matteo serva <matteo.serva@gmail.com>
b3722
2024-09-10 22:40:59 +02:00
slaren
49006c67b4
llama : move random seed generation to the samplers ( #9398 )
...
* llama_sampler_penalties : clamp penalty_last_n to zero
b3721
2024-09-10 18:04:25 +02:00
Georgi Gerganov
00ba2ff781
metal : fix compile warning with GGML_METAL_NDEBUG ( #0 )
b3720
2024-09-10 10:17:43 +03:00
Daniel Bevenius
83008b7cfe
llama : update llm_build_copy_mask_state comment [no ci] ( #9385 )
...
This commit updates the comment, which seems to contain a typo or be an
outdated comment, in the copy_mask_state function changing the variable
n_rs to n_kv.
I believe this change is correct and what the comment wants to
convey is to copy the states that are not going to be used in the
upcoming processing, which are the tokens states from n_seqs up to
the number of possible token states n_kv.
2024-09-10 10:03:21 +03:00
Molly Sophia
0b4ac75772
RWKV v6: Add time_mix_decay_w1/w2 in quant exclusion list ( #9387 )
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
b3718
2024-09-10 10:02:30 +03:00
slaren
fb3f249815
make : do not run llama-gen-docs when building ( #9399 )
b3717
2024-09-10 09:23:33 +03:00
Xuan Son Nguyen
bfe76d4a17
common : move arg parser code to arg.cpp ( #9388 )
...
* common : move arg parser to arg.cpp
* better categorize args
* add cmake
* missing climits
* missing cstdarg
* common : more explicit includes
* fix build
* refactor gpt_params_parse
* update server readme
* fix test
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b3716
2024-09-09 23:36:09 +02:00
Radoslav Gerganov
293bebe077
rpc : fix segfault with nkvo ( #9389 )
...
* rpc : fix nkvo
* rpc : buf_size must not be static
ref: #9337
---------
Co-authored-by: slaren <slarengh@gmail.com>
b3715
2024-09-09 18:40:10 +03:00
Prashant Vithule
5fac4d5764
ggml : vector length agnostic SVE support ( #9290 )
...
* Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths
* Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths
* Removed WhiteSpaces
* ggml : style changes + fix 512-bit nb loop check
- fix local scope in switch cases
- consistent predicate names
- empty lines when necessary
- opening braces, spaces
- const-correctness
- add asserts
* Update ggml/src/ggml-quants.c
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
b3714
2024-09-09 18:37:18 +03:00
slaren
5fb5e24811
llama : minor sampling refactor (2) ( #9386 )
b3713
2024-09-09 17:10:46 +02:00
Georgi Gerganov
38ca6f644b
readme : update hot topics
2024-09-09 15:51:37 +03:00