Robert Collins
3a8e9af402
imatrix : support combine-only ( #10492 )
...
* imatrix-combine-only idea
* ensured that behavior is consistent with log
2024-11-29 19:21:37 +02:00
Diego Devesa
a3a3048e7a
cleanup UI link list ( #10577 )
...
* cleanup UI link list
* sort list alphabetically
* add missing licenses
2024-11-29 17:45:08 +01:00
Georgi Gerganov
f0678c5ff4
ggml : fix I8MM Q4_1 scaling factor conversion ( #10562 )
...
ggml-ci
2024-11-29 16:25:39 +02:00
Shupei Fan
4b3242bbea
ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 ( #10580 )
2024-11-29 14:49:02 +01:00
Alberto Cabrera Pérez
0f77aae560
sycl : offload of get_rows set to 0 ( #10432 )
2024-11-29 20:38:45 +08:00
Alberto Cabrera Pérez
266b8519ee
sycl : Reroute permuted mul_mats through oneMKL ( #10408 )
...
This PR fixes the failing MUL_MAT tests for the sycl backend.
2024-11-29 09:49:43 +00:00
Chenguang Li
938f608742
CANN: RoPE operator optimization ( #10563 )
...
* [cann] RoPE operator optimization
* [CANN] Code Formatting
---------
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2024-11-29 14:46:55 +08:00
Jeff Bolz
f095a649ec
vulkan: get the first command buffer submitted sooner ( #10499 )
...
This is an incremental improvement over #9118 to get work to the GPU a bit
sooner. The first part is to start with a smaller number of nodes before
the first submit, and ramp it up to the current 100 nodes/submit. The
second part is to reduce the dryrun overhead for all the nodes that just
need to request descriptor space.
With these changes I get around 1-2% speedup on RTX 4070 combined with my
old Haswell-era CPU.
2024-11-29 07:18:02 +01:00
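The ramp-up described in the commit above can be pictured with a rough sketch. This is not the code from the PR: `plan_submits`, the initial batch size of 20, and the doubling factor are hypothetical choices made only for illustration. The cap of 100 nodes per submit is the one figure taken from the commit message.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the Vulkan backend): split `n_nodes` graph
// nodes into submit batches. The first batch is small so the GPU receives
// work early; each later batch doubles until it reaches `max_per_submit`.
// `first_batch` and the doubling factor are illustrative assumptions.
std::vector<std::size_t> plan_submits(std::size_t n_nodes,
                                      std::size_t first_batch = 20,
                                      std::size_t max_per_submit = 100) {
    std::vector<std::size_t> batches;
    // Guard against zero so the loop below always makes progress.
    const std::size_t cap = std::max<std::size_t>(1, max_per_submit);
    std::size_t batch = std::max<std::size_t>(1, std::min(first_batch, cap));
    std::size_t remaining = n_nodes;
    while (remaining > 0) {
        const std::size_t take = std::min(batch, remaining);
        batches.push_back(take);
        remaining -= take;
        batch = std::min(batch * 2, cap);  // ramp up toward the cap
    }
    return batches;
}
```

With these assumed defaults, a 300-node graph would be submitted as batches of 20, 40, 80, 100 and 60 nodes, so the first submit goes out after 20 nodes instead of 100.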
Ting Lou
678d7994f4
llava: return false instead of exit ( #10546 )
2024-11-29 01:09:46 +01:00
Georgi Gerganov
dc22344088
ggml : remove redundant copyright notice + update authors
2024-11-28 20:46:40 +02:00
Georgi Gerganov
4c0a95b107
llama : add missing model types
2024-11-28 20:45:07 +02:00
Xuan Son Nguyen
6c59567689
server : (tests) don't use thread for capturing stdout/stderr, bump openai client library ( #10568 )
...
* server : (tests) don't use thread for capturing stdout/stderr
* test: bump openai to 1.55.2
* bump openai to 1.55.3
2024-11-28 19:17:49 +01:00
Johannes Gäßler
890719311b
common: fix warning message when no GPU found ( #10564 )
2024-11-28 18:15:25 +01:00
Random Fly
7281cf13ad
docs: fix outdated usage of llama-simple ( #10565 )
2024-11-28 16:03:11 +01:00
Diego Devesa
e90688edd0
ci : fix tag name in cuda and hip releases ( #10566 )
2024-11-28 15:58:54 +01:00
Georgi Gerganov
76b27d29c2
ggml : fix row condition for i8mm kernels ( #10561 )
...
ggml-ci
2024-11-28 14:56:37 +02:00
Georgi Gerganov
eea986f215
cmake : fix ARM feature detection ( #10543 )
...
ggml-ci
2024-11-28 14:56:23 +02:00
Shupei Fan
c202cef168
ggml-cpu: support IQ4_NL_4_4 by runtime repack ( #10541 )
...
* ggml-cpu: support IQ4_NL_4_4 by runtime repack
* ggml-cpu: add __ARM_FEATURE_DOTPROD guard
2024-11-28 13:52:03 +01:00
Sergio López
2025fa67e9
kompute : improve backend to pass test_backend_ops ( #10542 )
...
* kompute: op_unary: reject unsupported parameters
Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: softmax: implement ALiBi support
Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: rope: implement neox and phi3 support
Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: op_mul_mat_q4_k permuted support
Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: op_mul_mat_[q4_0|q4_1|q8_0] permuted support
Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: op_mul_mat_f16 permuted support
Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: op_mul_mat_q6_k permuted support
Signed-off-by: Sergio Lopez <slp@redhat.com>
---------
Signed-off-by: Sergio Lopez <slp@redhat.com>
2024-11-28 12:51:38 +01:00
Ruixin Huang
c6bc73951e
CANN: Update cann.md to display correctly in CLion ( #10538 )
2024-11-28 15:27:11 +08:00
leo-pony
605fa66c50
CANN: Fix SOC_TYPE compile bug ( #10519 )
...
* CANN: Fix build failure on Ascend310P in two cases:
1) SOC_TYPE is specified manually
2) Some unusual compile environments
* Update the CANN backend News content: support F16 and F32 data type models for Ascend 310P NPU.
* fix CANN compile failure: the assert in Ascend kernel functions isn't supported on some CANN versions
2024-11-28 15:25:24 +08:00
Chenguang Li
b7420131bf
CANN: ROPE operator optimization ( #10540 )
...
* [cann] ROPE operator optimization
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2024-11-28 14:24:46 +08:00
Xuan Son Nguyen
9f912511bc
common : fix duplicated file name with hf_repo and hf_file ( #10550 )
2024-11-27 22:30:52 +01:00
uvos
3ad5451f3b
Add some minimal optimizations for CDNA ( #10498 )
...
* Add some minimal optimizations for CDNA
* ggml_cuda: set launch bounds also for GCN as it helps there too
2024-11-27 17:10:08 +01:00
Diego Devesa
46c69e0e75
ci : faster CUDA toolkit installation method and use ccache ( #10537 )
...
* ci : faster CUDA toolkit installation method and use ccache
* remove fetch-depth
* only pack CUDA runtime on master
2024-11-27 11:03:25 +01:00
Georgi Gerganov
9e2301f4a4
metal : fix group_norm support condition ( #0 )
2024-11-27 11:22:14 +02:00
Georgi Gerganov
fee824a1a1
sync : ggml
2024-11-27 11:10:42 +02:00
Frankie Robertson
9150f8fef9
Do not include arm_neon.h when compiling CUDA code (ggml/1028)
2024-11-27 11:10:27 +02:00
Jeff Bolz
c31ed2abfc
vulkan: define all quant data structures in types.comp ( #10440 )
2024-11-27 08:32:54 +01:00
Jeff Bolz
5b3466bedf
vulkan: Handle GPUs with less shared memory ( #10468 )
...
There have been reports of failure to compile on systems with <= 32KB
of shared memory (e.g. #10037 ). This change makes the large tile size
fall back to a smaller size if necessary, and makes mul_mat_id fall
back to CPU if there's only 16KB of shared memory.
2024-11-27 08:30:27 +01:00
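The fallback logic described in the commit above might look roughly like the following sketch. The names `TileSize`, `MatmulPlan`, `choose_plan` and the `large_tile_bytes` parameter are hypothetical and do not come from the Vulkan backend. Treating "only 16KB" as "16KB or less" is also an assumption.

```cpp
#include <cstddef>

// Hypothetical types for illustration only.
enum class TileSize { Large, Small };

struct MatmulPlan {
    TileSize tile;           // tile size chosen for mul_mat
    bool mul_mat_id_on_gpu;  // false means mul_mat_id falls back to the CPU
};

// `shared_mem_bytes` is the shared memory the device reports;
// `large_tile_bytes` is what the large tile would need (assumed parameter).
MatmulPlan choose_plan(std::size_t shared_mem_bytes, std::size_t large_tile_bytes) {
    MatmulPlan plan;
    // Use the large tile only if it fits; otherwise fall back to the small one.
    plan.tile = shared_mem_bytes >= large_tile_bytes ? TileSize::Large : TileSize::Small;
    // Assumption: 16KB or less of shared memory sends mul_mat_id to the CPU.
    plan.mul_mat_id_on_gpu = shared_mem_bytes > 16 * 1024;
    return plan;
}
```

The point of the sketch is that both decisions are made from the reported shared-memory size, so a shader that cannot fit is never selected in the first place.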
Jeff Bolz
249a7902ec
vulkan: further optimize q5_k mul_mat_vec ( #10479 )
2024-11-27 08:21:59 +01:00
Jeff Bolz
71a64989a5
vulkan: skip integer div/mod in get_offsets for batch_idx==0 ( #10506 )
2024-11-27 08:08:54 +01:00
Jeff Bolz
4a57d362e1
vulkan: optimize Q2_K and Q3_K mul_mat_vec ( #10459 )
2024-11-27 08:00:50 +01:00
Diego Devesa
c9b00a70b0
ci : fix cuda releases ( #10532 )
2024-11-26 22:12:10 +01:00
Shane A
de5097351c
Add OLMo 2 model in docs ( #10530 )
...
* Add link to OLMo 2 model in docs
* Change link to landing page
2024-11-26 21:55:29 +01:00
Diego Devesa
5a349f2809
ci : remove nix workflows ( #10526 )
2024-11-26 21:13:54 +01:00
Diego Devesa
30ec398321
llama : disable warnings for 3rd party sha1 dependency ( #10527 )
2024-11-26 21:01:47 +01:00
Tristan Druyen
be0e350c8b
Fix HIP flag inconsistency & build docs ( #10524 )
...
* Fix inconsistency of HIP flags in cmake & make
* Fix docs regarding GGML_HIP
2024-11-26 19:27:28 +01:00
R0CKSTAR
249cd93da3
mtgpu: Add MUSA_DOCKER_ARCH in Dockerfiles && update cmake and make ( #10516 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-11-26 17:00:41 +01:00
Jeff Bolz
904109ed0d
vulkan: fix group_norm ( #10496 )
...
Fix bad calculation of the end of the range. Add a backend test that
covers the bad case (taken from stable diffusion).
Fixes https://github.com/leejet/stable-diffusion.cpp/issues/439 .
2024-11-26 16:45:05 +01:00
Xuan Son Nguyen
45abe0f74e
server : replace behave with pytest ( #10416 )
...
* server : replace behave with pytest
* fix test on windows
* misc
* add more tests
* more tests
* styling
* log less, fix embd test
* added all sequential tests
* fix coding style
* fix save slot test
* add parallel completion test
* fix parallel test
* remove feature files
* update test docs
* no cache_prompt for some tests
* add test_cache_vs_nocache_prompt
2024-11-26 16:20:18 +01:00
Neo Zhang Jianyu
0bbd2262a3
restore the condition to build & update package when merging ( #10507 )
...
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-11-26 21:43:47 +08:00
Georgi Gerganov
ab96610b1e
cmake : enable warnings in llama ( #10474 )
...
* cmake : enable warnings in llama
ggml-ci
* cmake : add llama_get_flags and respect LLAMA_FATAL_WARNINGS
* cmake : get_flags -> ggml_get_flags
* speculative-simple : fix warnings
* cmake : reuse ggml_get_flags
ggml-ci
* speculative-simple : fix compile warning
ggml-ci
2024-11-26 14:18:08 +02:00
Diego Devesa
7db3846a94
ci : publish the docker images created during scheduled runs ( #10515 )
2024-11-26 13:05:20 +01:00
Diego Devesa
c6807b3f28
ci : add ubuntu cuda build, build with one arch on windows ( #10456 )
2024-11-26 13:05:07 +01:00
Charles Xu
25669aa92c
ggml-cpu: cmake add arm64 cpu feature check for macos ( #10487 )
...
* ggml-cpu: cmake add arm64 cpu feature check for macos
* use vmmlaq_s32 for compile option i8mm check
2024-11-26 13:37:05 +02:00
Georgi Gerganov
84e1c33cde
server : fix parallel speculative decoding ( #10513 )
...
ggml-ci
2024-11-26 13:36:40 +02:00
Georgi Gerganov
811872a59d
speculative : simplify the implementation ( #10504 )
...
ggml-ci
2024-11-26 12:29:38 +02:00
Shanshan Shen
9a4b79bcfa
CANN: Improve the Inferencing Performance for Ascend NPU Device ( #10454 )
...
* improve inferencing performance for ascend npu.
Co-authored-by: Frank Mai <thxCode@thxcode0824@gmail.com>
* some modification after review
* some modifications after review
* restore some modifications
* restore some modifications
---------
Co-authored-by: shanshan shen <shanshanshen333@gmail.com>
Co-authored-by: Frank Mai <thxCode@thxcode0824@gmail.com>
2024-11-26 18:08:37 +08:00
Chenguang Li
7066b4cce2
CANN: RoPE and CONCAT operator optimization ( #10488 )
...
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2024-11-26 17:31:05 +08:00