3265 Commits

Georgi Gerganov
cce3dcffc5
cuda : non-cont concat support (#7610)
* tests : add non-cont concat tests

* cuda : non-cont concat support

ggml-ci
b3037
2024-05-29 15:38:26 +03:00
Radoslav Gerganov
210d99173d
llama-bench : add support for the RPC backend (#7435) b3036 2024-05-29 14:45:44 +03:00
slaren
87bdf2a199
ggml : use atomic_flag for critical section (#7598)
* ggml : use atomic_flag for critical section

* add windows shims
b3035
2024-05-29 13:36:39 +02:00
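
For context, the atomic_flag change above maps to a standard spinlock pattern. Below is a minimal C++ sketch with illustrative function names; ggml's real implementation is in C (C11 stdatomic) plus the Windows shims the commit mentions.

```cpp
#include <atomic>

// A spinlock built on std::atomic_flag: test_and_set returns the previous
// value, so a thread spins until it is the one that flipped the flag from
// clear to set. Names here are illustrative, not ggml's actual symbols.
static std::atomic_flag g_critical = ATOMIC_FLAG_INIT;

static void critical_section_start() {
    while (g_critical.test_and_set(std::memory_order_acquire)) {
        // busy-wait until the holder calls clear()
    }
}

static void critical_section_end() {
    g_critical.clear(std::memory_order_release);
}
```
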
Georgi Gerganov
00281b7be3
scripts : remove mpi remnants 2024-05-29 14:31:18 +03:00
Georgi Gerganov
2ab977282b
sync : ggml b3033 2024-05-29 14:29:52 +03:00
Georgi Gerganov
72de268bec
ggml : restore ggml_rope_xpos_inplace (ggml/0)
ggml-ci
2024-05-29 14:29:33 +03:00
Akarshan Biswas
0e8d8bfd6c
Add Arc A750 and Arch Linux to readme-sycl.md as verified GPU model and Linux distro (#7605) 2024-05-29 16:53:47 +10:00
zhouwg
504f0c340f
ggml : fix typo in ggml.c (#7603) b3030 2024-05-29 04:09:31 +02:00
Meng, Hengyu
b864b50ce5
[SYCL] Align GEMM dispatch (#7566)
* align GEMM dispatch
b3029
2024-05-29 07:00:24 +08:00
caitianchi
c38d152d7d fix warnings 2024-05-29 04:35:08 +08:00
caitianchi
07f48f9669 fix warnings 2024-05-29 04:09:44 +08:00
jaime-m-p
02c1ecad07
Tokenizer WPM fixes (#7500)
* Update random test: add_bos_token.
* Update random test: add WPM models for testing.
* Build vocab.special_tokens_cache using vocab token types.
* Fix and improve WPM preprocessing.
  - Fix unicode edge case combinations.
  - Split by whitespace in the same pass.
* Discard all tokens when no match is found.
b3028
2024-05-28 21:46:34 +02:00
Georgi Gerganov
6bd12ce409
sycl : fix assert (#7563) b3027 2024-05-28 22:22:50 +03:00
caitianchi
02eb445d73 sync master 2024-05-29 03:06:58 +08:00
tc-mb
28d4a7f9cc
Merge pull request #8 from OpenBMB/master
sync master
2024-05-29 03:03:26 +08:00
tc-mb
8bd47ce5d6
Merge pull request #7 from OpenBMB/prepare-PR
sync master
2024-05-29 02:50:30 +08:00
tc-mb
8767ce29cf
Merge branch 'prepare-PR-of-minicpm-v2.5' into prepare-PR 2024-05-29 02:49:59 +08:00
Giuseppe Scrivano
5442939fcc
llama : support small Granite models (#7481)
* Add optional MLP bias for Granite models

Add optional MLP bias for ARCH_LLAMA to support Granite models.
Partially addresses ggerganov/llama.cpp/issues/7116
Still needs some more changes to properly support Granite.

* llama: honor add_space_prefix from the model configuration

propagate the add_space_prefix configuration from the HF model
configuration to the gguf file and honor it with the gpt2 tokenizer.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

* llama: add support for small granite models

it works only for the small models 3b and 8b.

The convert-hf-to-gguf.py script uses the vocabulary size of the
granite models to detect granite and set the correct configuration.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

---------

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Co-authored-by: Steffen Roecker <sroecker@redhat.com>
b3026
2024-05-28 21:49:49 +03:00
caitianchi
b37ab0b1e5 add link 2024-05-29 02:21:41 +08:00
caitianchi
9495504e7b replace and organize code 2024-05-29 01:52:26 +08:00
caitianchi
3c306f18c8 clear code 2024-05-29 01:50:59 +08:00
k.h.lai
56411a950f
vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE (#7552) b3025 2024-05-28 19:25:08 +02:00
caitianchi
056d178160 rename wrapper 2024-05-29 00:18:17 +08:00
Radoslav Gerganov
2b737caae1
rpc : resource management rework (#7562)
* rpc : resource management rework

* address review comments
b3024
2024-05-28 18:13:36 +03:00
fairydreaming
ee3dff6b8e
Add support for DeepseekV2ForCausalLM (#7519)
* common : increase max number of experts to 160

* common : add tensors ATTN_Q_A, ATTN_Q_A_NORM, ATTN_Q_B, ATTN_KV_A_MQA, ATTN_KV_A_NORM, ATTN_KV_B needed by DeepSeek-V2 MLA (multi-head latent attention) architecture

* common : add model header parameters: leading_dense_block_count, expert_feed_forward_length, expert_shared_count, expert_weights_scale, attention.q_lora_rank, attention.kv_lora_rank, rope.scaling.yarn_log_multiplier

* convert-hf : add model conversion support for DeepseekV2ForCausalLM

* llama : add model types for DeepSeek-V2 and DeepSeek-V2-Lite models

* llama : add two new llm_build_moe_ffn() arguments: scale_w (whether to scale weights of selected MoE experts) and w_scale (numerical value of the scaling factor)

* llama : add inference support for LLM_ARCH_DEEPSEEK2

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
b3023
2024-05-28 17:07:05 +02:00
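
The llm_build_moe_ffn() bullet above only names the two new arguments; here is a minimal sketch of the behavior it describes, with plain floats standing in for ggml tensors and all names hypothetical.

```cpp
#include <vector>

// scale_w: whether to scale the routing weights of the selected experts;
// w_scale: the numerical scaling factor (per the commit message above).
static void maybe_scale_expert_weights(std::vector<float> & weights,
                                       bool scale_w, float w_scale) {
    if (!scale_w) {
        return;
    }
    for (float & w : weights) {
        w *= w_scale;
    }
}
```
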
Georgi Gerganov
edc29433fa
tests : fix test-tokenizer-0.sh 2024-05-28 15:04:09 +03:00
Georgi Gerganov
8b99e2aa66
llama : handle unknown utf8 bytes (#7588) b3021 2024-05-28 13:55:35 +03:00
Brian
271ff3fc44
github: add refactor to issue template (#7561)
* github: add refactor issue template [no ci]

* Update 07-refactor.yml
2024-05-28 20:27:27 +10:00
Neo Zhang
e2b065071c
[SYCL] fix ggml_sycl_mul_mat_id() to match the change of API (#7436)
* fix mul_mat_id to match the change of API

* rm comment

* rm unused or duplicated code, rename per review comments
b3019
2024-05-28 10:53:37 +01:00
caitianchi
6366d62d6b update cmakelist 2024-05-28 16:35:13 +08:00
Georgi Gerganov
0548a4187f
ggml : generalize GGML_OP_CONCAT (#7563)
* ggml : generalize GGML_OP_CONCAT (WIP)

ggml-ci

* tests : add dim != 2 tests

* metal : generalize concat kernel

* tests : naming

* cuda : generalize concat kernel

ggml-ci

* sycl : add warning and assert

* ggml : fix op params handling

* metal : bugfix kernel

ggml-ci

* ggml : reimplement CPU and Metal

* cuda : add asserts

ggml-ci

* ggml : fix ptrs

ggml-ci
b3018
2024-05-28 11:04:19 +03:00
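
The changelog above implies concat previously supported only one axis (hence the new "dim != 2" tests). A minimal sketch of the generalized shape rule follows; it is illustrative only, not the actual ggml kernel.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Concatenating two 4-D tensors along dimension `dim`: the concat axis
// adds up, every other axis must match.
static std::array<int64_t, 4> concat_shape(const std::array<int64_t, 4> & a,
                                           const std::array<int64_t, 4> & b,
                                           int dim) {
    std::array<int64_t, 4> out = a;
    for (int i = 0; i < 4; ++i) {
        if (i == dim) {
            out[i] = a[i] + b[i];
        } else {
            assert(a[i] == b[i]); // non-concat dims must agree
        }
    }
    return out;
}
```
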
caitianchi
e73a0c7c2f update cmakelist 2024-05-28 15:26:09 +08:00
mgroeber9110
9335b969e8
server: do not remove whitespace at the start of a completion chunk (#7524) 2024-05-28 14:55:51 +10:00
Nathan Epstein
c41767154e
Markdownish code block fix (#7571)
* markdownish codeblock fix

* updating regexes
2024-05-28 14:41:14 +10:00
Ikko Eltociear Ashimine
74b239b3d5
llava : update clip.h (#7580)
overriden -> overridden
b3015
2024-05-28 12:48:16 +10:00
Djip007
852aafb163
update HIP_UMA #7399 (#7414)
* update HIP_UMA #7399

add use of hipMemAdviseSetCoarseGrain when LLAMA_HIP_UMA is enabled.
- get x2 on prompt eval and x1.5 on token gen with rocm6.0 on ryzen 7940HX iGPU (780M/gfx1103)

* simplify code, more consistent style

---------

Co-authored-by: slaren <slarengh@gmail.com>
b3014
2024-05-28 01:40:47 +02:00
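
A minimal sketch of the allocation path the commit describes, assuming a hypothetical helper name; the real change lives inside ggml's HIP backend and carries error handling this sketch omits.

```cpp
#include <hip/hip_runtime.h>

// Under LLAMA_HIP_UMA: allocate unified memory, then advise coarse-grain
// coherence, which is what the commit credits for the iGPU speedup.
static void * alloc_hip_uma(size_t size, int device) {
    void * ptr = nullptr;
    if (hipMallocManaged(&ptr, size) != hipSuccess) {
        return nullptr;
    }
    hipMemAdvise(ptr, size, hipMemAdviseSetCoarseGrain, device);
    return ptr;
}
```
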
kunnis
0136966daf
adding in x64 targets to cmake presets (#7574) 2024-05-28 01:40:12 +02:00
Johannes Gäßler
10b1e45876
make: add --device-debug to NVCC debug flags (#7542) b3012 2024-05-27 19:34:40 +02:00
agray3
197c00681b
Allow multiple copy function pointers for CUDA graph kernel param updates (#7565)
CUDA graphs require parameter updates to kernels associated with
GGML_OP_CPY nodes. Previously the implementation only checked for a
single CUDA kernel in such nodes, but this caused a bug in cases where
2 such kernels exist. This fixes the issue by using a vector to allow
multiple function pointers to be stored and checked against.

Fixes #7942
b3011
2024-05-27 19:33:42 +02:00
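
The paragraph above fully states the fix; a minimal sketch of the single-pointer-to-vector change follows, with hypothetical names rather than the actual ggml-cuda symbols.

```cpp
#include <algorithm>
#include <vector>

// Before: one remembered copy-kernel pointer per graph. After: every kernel
// pointer seen for GGML_OP_CPY nodes is recorded and checked against when
// updating CUDA graph kernel parameters.
struct cuda_graph_cpy_state {
    std::vector<void *> kernels;

    void record(void * k) {
        if (std::find(kernels.begin(), kernels.end(), k) == kernels.end()) {
            kernels.push_back(k);
        }
    }

    bool contains(void * k) const {
        return std::find(kernels.begin(), kernels.end(), k) != kernels.end();
    }
};
```
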
caitianchi
d8974b8ea6 support ollama 2024-05-28 01:13:57 +08:00
AidanBeltonS
95f84d5ce8
Fix q_xxs using mul_mat_q (#7459) b3010 2024-05-27 22:04:51 +05:30
AidanBeltonS
5487593bc7
Add freq factors (#7495) 2024-05-27 18:04:09 +05:30
Georgi Gerganov
1d8fca72ae
metal : add GGML_OP_REPEAT kernels (#7557)
ggml-ci
b3008
2024-05-27 12:10:19 +03:00
Georgi Gerganov
62bfef5194
metal : disable FA kernel for HS=256 (#7556)
ggml-ci
b3007
2024-05-27 10:38:39 +03:00
Georgi Gerganov
eaf6e03174
llama : add comments about experimental flags (#7544) b3006 2024-05-27 09:24:13 +03:00
Brian
d6ef0e77dd
github: add self-sorted issue ticket forms (#7543)
* github: add self-sorted issue ticket forms [no ci]

* github: consolidate BSD in bug issue ticket

* github: remove contact from bug ticket template [no ci]

* github: remove bios from os dropdown in bug report [no ci]
2024-05-27 10:54:30 +10:00
caitianchi
8541e99629 better pos_embed in clip 2024-05-27 04:27:54 +08:00
caitianchi
2997a680d2 change for ollama 2024-05-27 03:42:56 +08:00
caitianchi
18fe620976 change for ollama 2024-05-27 03:29:55 +08:00
caitianchi
d9fbc1d1c5 add positions index 2024-05-27 03:18:35 +08:00