3265 Commits

Author SHA1 Message Date
Georgi Gerganov
6e299132e7
clip : style changes 2024-08-06 11:44:29 +03:00
caitianchi
65f7455cea Modify 2 notes 2024-07-26 21:49:23 +08:00
caitianchi
f3d400dac0 remove uhd_image_embed 2024-07-26 21:15:03 +08:00
caitianchi
72b962925b delete minicpmv-wrapper in pr 2024-07-25 16:01:26 +08:00
caitianchi
107e1edb20 fix uhd code for review comment 2024-07-25 15:22:11 +08:00
caitianchi
6fd0937e9f remove the extern "C", MINICPMV_API 2024-07-23 15:25:32 +08:00
caitianchi
fcde997126 remove load_image_size into clip_ctx 2024-07-23 15:24:43 +08:00
caitianchi
3642be9937 fix KEY_HAS_MINICPMV_PROJ 2024-07-23 14:55:55 +08:00
caitianchi
dad4abe1bc add warn 2024-07-23 11:57:42 +08:00
caitianchi
62fa15bcd2 fix cmakefile 2024-07-23 11:52:34 +08:00
caitianchi
4c755832fe remove the directory in line 33 of the /cmakelists.txt (not in example, in the main dir) 2024-07-22 21:44:56 +08:00
caitianchi
be8b5b2f8d fix code review 2024-07-22 21:34:21 +08:00
caitianchi
292a46906d change pr readme 2024-07-20 14:45:19 +08:00
caitianchi
5959b14b06 fix llama-minicpmv-cli in cmake file 2024-07-19 11:29:17 +08:00
caitianchi
c5b68515f0 fix issues for merging 2024-07-17 15:04:25 +08:00
caitianchi
3e6348b8dc fix bug in clip 2024-07-07 13:12:46 +08:00
caitianchi
977941d9fe imitate reshape bug of python code 2024-07-04 17:25:02 +08:00
caitianchi
4c67d7cef5 add space in "-1" 2024-06-25 20:06:55 +08:00
caitianchi
e68c8bc1e3 change n_layer 2024-06-25 20:05:52 +08:00
caitianchi
8f0350578d fix quality problem in pr code 2024-06-25 18:51:06 +08:00
tc-mb
cb8cfb9d4d
Merge pull request #15 from OpenBMB/master
sync master
2024-06-24 11:29:30 +08:00
tc-mb
77beb4d153
Merge branch 'prepare-PR-of-minicpm-v2.5' into master 2024-06-24 11:29:17 +08:00
slaren
95f57bb5d5
ggml : remove ggml_task_type and GGML_PERF (#8017)
* ggml : remove ggml_task_type and GGML_PERF

* check abort_callback on main thread only

* vulkan : remove usage of ggml_compute_params

* remove LLAMA_PERF
b3209
2024-06-24 03:07:59 +02:00
Eddie-Wang
e112b610a1
llama : add support for BitnetForCausalLM (#7931)
* hf bitnet v1

* hf bitnet e2e v2

* finish bitnet e2e

* finish f16 hf bitnet e2e

* remove unused

* finish bitnet i2 e2e

* move i2s to quantize v1

* move i2 to quantize

* clean code

* clean code 2

* fix codestyle

* fix code

* fix

* fix code

* fix merge

* remove unused

* change table name

* fix whitespace

* delete redundant

* i2_s to absmax

* finish i2_s/i8_s vec_dot x86 simd

* i2s->q22

* fix code

* remove block scale

* add dequantize

* fix seq

* update avx2

* remove q2_2

* remove q22_grid

* fix whitespace

* reuse llm_build_kv

* fix bo

---------

Co-authored-by: root <root@wangjinheng>
b3208
2024-06-23 21:27:57 +03:00
Aarni Koskela
6a2f298bd7
server : fix JSON-Scheme typo (#7975) 2024-06-23 11:03:08 -04:00
Daniel Bevenius
11318d9aa1
Fix typo in llama_set_embeddings comment (#8077) b3206 2024-06-23 15:39:45 +02:00
slaren
b6b9a8e606
fix CI failures (#8066)
* test-backend-ops : increase cpy max nmse

* server ci : disable thread sanitizer
b3205
2024-06-23 13:14:45 +02:00
0cc4m
45c0e2e4c1
Refactor Vulkan backend to allow multiple contexts (#7961)
* Refactor Vulkan backend to allow multiple contexts

* Fix too many shader groups called validation error in llama3 on AMD and Intel GPUs

* Fix Vulkan debug build error
b3204
2024-06-23 10:21:25 +02:00
Clint Herron
b5a5f34efa
Removing extra blank lines that were breaking Lint. (#8067) b3203 2024-06-22 14:28:18 -04:00
Xuan Son Nguyen
3e58b0ee35
cvector: fix CI + correct help message (#8064)
* cvector: fix CI + correct help message

* also correct --pca-iter
b3202
2024-06-22 18:11:30 +02:00
HatsuneMikuUwU33
adf480c3ab
cvector-generator: Moe Moe Fixie-Fixie for Lots of Formats~! ♡(ᐢ ᴥ ᐢ)♡ (#8052)
* Update negative.txt

* Update positive.txt

* Update cvector-generator.cpp

* Update cvector-generator.cpp
b3201
2024-06-22 17:19:37 +02:00
0xspringtime
3aa184a8c7
convert-hf : change assert to exception (#8015) 2024-06-22 15:37:41 +02:00
ddh0
5b48cd53a8
Update llama-quantize ppl/file size output from LLaMA-v1 to Llama-3 values (#8058)
Uses the values computed by @JohannesGaessler in PR #7413
b3199
2024-06-22 15:16:10 +02:00
Clint Herron
c5a8d4b749
JSON Schema to GBNF integration tests (#7790)
* Adding simple bare-bones test for end-to-end integration test for json validation against auto-generated JSON-schema grammars.

* Adding additional examples as documented in #7789. Also adding the ability to automatically output improperly failing grammars to debug output files so they can more easily be examined in the gbnf-validator program.

* Uncommenting formerly commented tests so that they fail for others who are attempting to reproduce the bugs.

* Merging improved schema test methods added by @ochafik in #7797

* Adding #define to temporarily remove failing tests so that this PR can pass CI, but still be useful for other PRs that want to leverage the framework.

* Fixing nits from ochafik. Removing escape slashes, adding additional failing cases, fixing some other strings.

* Fixing grammar indentation to be consistent throughout file.
2024-06-21 23:18:36 -04:00
k.h.lai
557b653dc9
vulkan: detect multiple devices by deviceUUID instead of deviceID (#8022)
* vulkan: detect multiple devices by deviceUUID instead of deviceID

* vulkan: remove unneeded variables

* vulkan: fix id query
b3197
2024-06-21 10:28:20 +02:00
Eve
7d5e8777ae
ggml : AVX IQ quants (#7845)
* initial iq4_xs

* fix ci

* iq4_nl

* iq1_m

* iq1_s

* iq2_xxs

* iq3_xxs

* iq2_s

* iq2_xs

* iq3_s before sllv

* iq3_s

* iq3_s small fix

* iq3_s sllv can be safely replaced with sse multiply
b3196
2024-06-21 08:57:36 +03:00
Georgi Gerganov
a927b0f3dd
llama : optimize long word tokenization with WPM (#8034)
ggml-ci
b3195
2024-06-21 08:51:28 +03:00
Douglas Hanley
80ea089d77
llama : allow pooled embeddings on any model (#7477)
* create append_pooling operation; allow to specify attention_type; add last token pooling; update examples

* find result_norm/result_embd tensors properly; update output allocation logic

* only use embd output for pooling_type NONE

* get rid of old causal_attn accessor

* take out attention_type; add in llama_set_embeddings

* bypass logits when doing non-NONE pooling
b3194
2024-06-21 08:38:22 +03:00
Shuichi Tsutsumi
0e64591e82
swiftui : enable stream updating (#7754) b3193 2024-06-21 08:30:58 +03:00
Hamdoud Hakem
b1ef562bc1
requirements : Bump torch and numpy for python3.12 (#8041) 2024-06-20 22:01:15 +02:00
Hamdoud Hakem
17b291a6a5
convert-hf : Fix the encoding in the convert-hf-to-gguf-update.py (#8040) 2024-06-20 21:59:59 +02:00
Johannes Gäßler
abd894ad96
common: fix warning (#8036)
* common: fix warning

* Update common/common.cpp

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
b3190
2024-06-20 16:40:13 +02:00
luoyu-intel
de391e4c80
[SYCL] Fix windows build and inference (#8003)
* add sycl preset

* fix debug link error. fix windows crash

* update README
b3189
2024-06-20 21:19:05 +08:00
Johannes Gäßler
d50f8897a7
CUDA: stream-k decomposition for MMQ (#8018)
* CUDA: stream-k decomposition for MMQ

* fix undefined memory reads for small matrices
b3188
2024-06-20 14:39:21 +02:00
Michael de Gans
2075a66a96
metal : fix ggml_metal_supports_op for BF16 (#8021)
Currently the Metal backend does not support BF16. `ggml_metal_supports_op` was returning true in these cases, leading to a crash with models converted with `--leave-output-tensor`. This commit checks whether the first few source types are BF16 and returns false if that's the case.
b3187
2024-06-20 08:32:01 +03:00
sasha0552
ba58993152
server : fix smart slot selection (#8020) b3186 2024-06-20 09:57:10 +10:00
Michael de Gans
a7854743c5
un-ignore build-info.cmake and build-info.sh (#7996)
* un-ignore `build-info.cmake` and `build-info.sh`

I am assuming that ignoring them was unintentional. If they are ignored, some tools, like cargo, will consider the files nonexistent, even if they're committed, for the purpose of publishing. This leads to the build failing in such cases.

* un-ignore `build-info.cpp.in`

For the same reason as the previous two files.

* Reorganize `.gitignore`

* Add exceptions for files mentioned by @slaren

I did leave .clang-tidy since it was explicitly ignored before.

* Add comments for organization
* Sort some lines for pretty
* Test with `make` and `cmake` builds to ensure no build artifacts might be committed

* Remove `.clang-tidy` from `.gitignore`

Per comment by @ggerganov

* Remove `IDEWorkspaceChecks.plist` from root-level `.gitignore`
2024-06-19 22:10:42 +02:00
slaren
9c77ec1d74
ggml : synchronize threads using barriers (#7993) b3184 2024-06-19 15:04:15 +02:00
Georgi Gerganov
a04a953cab
codecov : remove (#8004) b3183 2024-06-19 13:04:36 +03:00
Meng, Hengyu
623494a478
[SYCL] refactor (#6408)
* separate lower precision GEMM from the main files

* fix workgroup size hardcode
b3182
2024-06-19 09:11:51 +08:00