Commit Graph

203 Commits

Author SHA1 Message Date
Eve
3407364776
Q6_K AVX improvements (#10118)
* q6_k instruction reordering attempt

* better subtract method

* should be theoretically faster

small improvement with shuffle lut, likely because all loads are already done at that stage

* optimize bit fiddling

* handle -32 offset separately. bsums exists for a reason!

* use shift

* Update ggml-quants.c

* have to update ci macos version to 13 as 12 doesnt work now. 13 is still x86
2024-11-04 23:06:31 +01:00
R0CKSTAR
cf8e0a3bb9
musa: add docker image support (#9685)
* mtgpu: add docker image support

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* mtgpu: enable docker workflow

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-10-10 20:10:37 +02:00
Xuan Son Nguyen
f3fdcfaa79
ci : fine-grant permission (#9710) 2024-10-04 11:47:19 +02:00
Diego Devesa
c83ad6d01e
ggml-backend : add device and backend reg interfaces (#9707)
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-10-03 01:49:47 +02:00
serhii-nakon
6f1d9d71f4
Fix Docker ROCM builds, use AMDGPU_TARGETS instead of GPU_TARGETS (#9641)
* Fix Docker ROCM builds, use AMDGPU_TARGETS instead of GPU_TARGETS

* Set ROCM_DOCKER_ARCH as string due it incorrectly build and cause OOM exit code
2024-09-30 20:57:12 +02:00
compilade
511636df0c
ci : reduce severity of unused Pyright ignore comments (#9697) 2024-09-30 14:13:16 -04:00
Neo Zhang Jianyu
95bc82fbc0
[SYCL] add missed dll file in package (#9577)
* update oneapi to 2024.2

* use 2024.1

---------

Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-09-26 17:38:31 +08:00
Xuan Son Nguyen
ea9c32be71
ci : fix docker build number and tag name (#9638)
* ci : fix docker build number and tag name

* fine-grant permissions
2024-09-25 17:26:01 +02:00
Huang Qi
e948a7da7a
CI: Provide prebuilt windows binary for hip (#9467) 2024-09-21 02:39:41 +02:00
Georgi Gerganov
6262d13e0b
common : reimplement logging (#9418)
https://github.com/ggerganov/llama.cpp/pull/9418
2024-09-15 20:46:12 +03:00
Mathijs Henquet
78203641fe
server : Add option to return token pieces in /tokenize endpoint (#9108)
* server : added with_pieces functionality to /tokenize endpoint

* server : Add tokenize with pieces tests to server.feature

* Handle case if tokenizer splits along utf8 continuation bytes

* Add example of token splitting

* Remove trailing ws

* Fix trailing ws

* Maybe fix ci

* maybe this fix windows ci?

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2024-09-12 22:30:11 +02:00
Huang Qi
4dc4f5f14a
ci : update HIP SDK to 24.Q3 (ROCm 6.1) (#9329) 2024-09-12 14:28:43 +03:00
Trivikram Kamat
3c26a1644d
ci : bump actions/checkout to v4 (#9377) 2024-09-12 14:27:45 +03:00
slaren
6c89eb0b47
ci : disable rocm image creation (#9340) 2024-09-07 10:48:54 +03:00
awatuna
32b2ec88bc
Update build.yml (#9184)
build rpc-server for windows cuda
2024-09-06 00:34:36 +02:00
slaren
9fe94ccac9
docker : build images only once (#9225) 2024-08-28 17:28:00 +02:00
Georgi Gerganov
d5492f0525
ci : disable bench workflow (#9010) 2024-08-15 10:11:11 +03:00
Diogo Teles Sant'Anna
fc4ca27b25
ci : fix github workflow vulnerable to script injection (#9008)
Signed-off-by: Diogo Teles Sant'Anna <diogoteles@google.com>
2024-08-12 19:28:23 +03:00
Radoslav Gerganov
1f67436c5e
ci : enable RPC in all of the released builds (#9006)
ref: #8912
2024-08-12 19:17:03 +03:00
Georgi Gerganov
d3ae0ee8d7
py : fix requirements check '==' -> '~=' (#8982)
* py : fix requirements check '==' -> '~='

* cont : fix the fix

* ci : run on all requirements.txt
2024-08-12 11:02:01 +03:00
Johannes Gäßler
6eeaeba126
cmake: use 1 more thread for non-ggml in CI (#8740) 2024-07-28 22:32:44 +02:00
Johannes Gäßler
69c487f4ed
CUDA: MMQ code deduplication + iquant support (#8495)
* CUDA: MMQ code deduplication + iquant support

* 1 less parallel job for CI build
2024-07-20 22:25:26 +02:00
bandoti
17eb6aa8a9
vulkan : cmake integration (#8119)
* Add Vulkan to CMake pkg

* Add Sycl to CMake pkg

* Add OpenMP to CMake pkg

* Split generated shader file into separate translation unit

* Add CMake target for Vulkan shaders

* Update README.md

* Add make target for Vulkan shaders

* Use pkg-config to locate vulkan library

* Add vulkan SDK dep to ubuntu-22-cmake-vulkan workflow

* Clean up tabs

* Move sudo to apt-key invocation

* Forward GGML_EXTRA_LIBS to CMake config pkg

* Update vulkan obj file paths

* Add shaderc to nix pkg

* Add python3 to Vulkan nix build

* Link against ggml in cmake pkg

* Remove Python dependency from Vulkan build

* code review changes

* Remove trailing newline

* Add cflags from pkg-config to fix w64devkit build

* Update README.md

* Remove trailing whitespace

* Update README.md

* Remove trailing whitespace

* Fix doc heading

* Make glslc required Vulkan component

* remove clblast from nix pkg
2024-07-13 18:12:39 +02:00
compilade
3fd62a6b1c
py : type-check all Python scripts with Pyright (#8341)
* py : type-check all Python scripts with Pyright

* server-tests : use trailing slash in openai base_url

* server-tests : add more type annotations

* server-tests : strip "chat" from base_url in oai_chat_completions

* server-tests : model metadata is a dict

* ci : disable pip cache in type-check workflow

The cache is not shared between branches, and it's 250MB in size,
so it would become quite a big part of the 10GB cache limit of the repo.

* py : fix new type errors from master branch

* tests : fix test-tokenizer-random.py

Apparently, gcc applies optimisations even when pre-processing,
which confuses pycparser.

* ci : only show warnings and errors in python type-check

The "information" level otherwise has entries
from 'examples/pydantic_models_to_grammar.py',
which could be confusing for someone trying to figure out what failed,
considering that these messages can safely be ignored
even though they look like errors.
2024-07-07 15:04:39 -04:00
Olivier Chafik
8748d8ac6f
json: attempt to skip slow tests when running under emulator (#8189) 2024-06-28 18:02:05 +01:00
loonerin
558f44bf83
CI: fix release build (Ubuntu+Mac) (#8170)
* CI: fix release build (Ubuntu)

PR #8006 changes defaults to build shared libs. However, CI for releases
expects static builds.

* CI: fix release build (Mac)

---------

Co-authored-by: loonerin <loonerin@users.noreply.github.com>
2024-06-27 21:01:23 +02:00
slaren
ae5d0f4b89
ci : publish new docker images only when the files change (#8142) 2024-06-26 21:59:28 +02:00
Georgi Gerganov
f3f65429c4
llama : reorganize source code + improve CMake (#8006)
* scripts : update sync [no ci]

* files : relocate [no ci]

* ci : disable kompute build [no ci]

* cmake : fixes [no ci]

* server : fix mingw build

ggml-ci

* cmake : minor [no ci]

* cmake : link math library [no ci]

* cmake : build normal ggml library (not object library) [no ci]

* cmake : fix kompute build

ggml-ci

* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE

ggml-ci

* move public backend headers to the public include directory (#8122)

* move public backend headers to the public include directory

* nix test

* spm : fix metal header

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* scripts : fix sync paths [no ci]

* scripts : sync ggml-blas.h [no ci]

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-06-26 18:33:02 +03:00
slaren
dd047b476c
disable docker CI on pull requests (#8110) 2024-06-25 19:20:06 +02:00
slaren
8cb508d0d5
disable publishing the full-rocm docker image (#8083) 2024-06-24 08:36:11 +03:00
slaren
b6b9a8e606
fix CI failures (#8066)
* test-backend-ops : increase cpy max nmse

* server ci : disable thread sanitizer
2024-06-23 13:14:45 +02:00
slaren
9c77ec1d74
ggml : synchronize threads using barriers (#7993) 2024-06-19 15:04:15 +02:00
Georgi Gerganov
a04a953cab
codecov : remove (#8004) 2024-06-19 13:04:36 +03:00
olexiyb
f8ec8877b7
ci : fix macos x86 build (#7940)
In order to use old `macos-latest` we should use `macos-12`

Potentially will fix: https://github.com/ggerganov/llama.cpp/issues/6975
2024-06-14 20:28:34 +03:00
Olivier Chafik
1c641e6aac
build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)
* `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew

* server: update refs -> llama-server

gitignore llama-server

* server: simplify nix package

* main: update refs -> llama

fix examples/main ref

* main/server: fix targets

* update more names

* Update build.yml

* rm accidentally checked in bins

* update straggling refs

* Update .gitignore

* Update server-llm.sh

* main: target name -> llama-cli

* Prefix all example bins w/ llama-

* fix main refs

* rename {main->llama}-cmake-pkg binary

* prefix more cmake targets w/ llama-

* add/fix gbnf-validator subfolder to cmake

* sort cmake example subdirs

* rm bin files

* fix llama-lookup-* Makefile rules

* gitignore /llama-*

* rename Dockerfiles

* rename llama|main -> llama-cli; consistent RPM bin prefixes

* fix some missing -cli suffixes

* rename dockerfile w/ llama-cli

* rename(make): llama-baby-llama

* update dockerfile refs

* more llama-cli(.exe)

* fix test-eval-callback

* rename: llama-cli-cmake-pkg(.exe)

* address gbnf-validator unused fread warning (switched to C++ / ifstream)

* add two missing llama- prefixes

* Updating docs for eval-callback binary to use new `llama-` prefix.

* Updating a few lingering doc references for rename of main to llama-cli

* Updating `run-with-preset.py` to use new binary names.
Updating docs around `perplexity` binary rename.

* Updating documentation references for lookup-merge and export-lora

* Updating two small `main` references missed earlier in the finetune docs.

* Update apps.nix

* update grammar/README.md w/ new llama-* names

* update llama-rpc-server bin name + doc

* Revert "update llama-rpc-server bin name + doc"

This reverts commit e474ef1df4.

* add hot topic notice to README.md

* Update README.md

* Update README.md

* rename gguf-split & quantize bins refs in **/tests.sh

---------

Co-authored-by: HanClinto <hanclinto@gmail.com>
2024-06-13 00:41:52 +01:00
slaren
c2ce6c47e4
fix CUDA CI by using a windows-2019 image (#7861)
* try to fix CUDA ci with --allow-unsupported-compiler

* trigger when build.yml changes

* another test

* try exllama/bdashore3 method

* install vs build tools before cuda toolkit

* try win-2019
2024-06-11 08:59:20 +03:00
slaren
fd5ea0f897
ci : try win-2019 on server windows test (#7854) 2024-06-10 15:18:41 +03:00
Georgi Gerganov
554c247caf
ggml : remove OpenCL (#7735)
ggml-ci
2024-06-04 21:23:20 +03:00
Masaya, Kato
a5735e4426
ggml : use OpenMP as a thread pool (#7606)
* ggml: Added OpenMP for multi-threads processing

* ggml : Limit the number of threads used to avoid deadlock

* update shared state n_threads in parallel region

* clear numa affinity for main thread even with openmp

* enable openmp by default

* fix msvc build

* disable openmp on macos

* ci : disable openmp with thread sanitizer

* Update ggml.c

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-03 17:14:15 +02:00
Meng, Hengyu
3854c9d07f
[SYCL] fix intel docker (#7630)
* Update main-intel.Dockerfile

* workaround for https://github.com/intel/oneapi-containers/issues/70

* reset intel docker in CI

* add missed in server
2024-05-30 16:19:08 +10:00
Brian
27891f6db0
docker.yml: disable light-intel and server-intel test (#7515)
* docker.yml: disable light-intel test

* docker.yml: disable server-intel test
2024-05-24 23:47:56 +10:00
Georgi Gerganov
197ff91462
build : remove zig (#7471) 2024-05-22 20:05:38 +03:00
Georgi Gerganov
3bc10cb485
server : fix temperature + disable some tests (#7409)
* server : fix temperature

* server : disable tests relying on parallel determinism

* ci : change server Debug -> RelWithDebInfo
2024-05-20 22:10:03 +10:00
slaren
d359f30921
llama : remove MPI backend (#7395) 2024-05-20 01:17:03 +02:00
Brian
e23b974f4c
labeler.yml: Use settings from ggerganov/llama.cpp [no ci] (#7363)
https://github.com/actions/labeler#using-configuration-path-input-together-with-the-actionscheckout-action
Recommends the use of checkout action to use the correct repo context
when applying settings for PR labels

e.g.

    steps:
    - uses: actions/checkout@v4 # Uploads repository content to the runner
      with:
        repository: "owner/repositoryName" # The one of the available inputs, visit https://github.com/actions/checkout#readme to find more
    - uses: actions/labeler@v5
      with:
        configuration-path: 'path/to/the/uploaded/configuration/file'
2024-05-19 20:51:03 +10:00
Georgi Gerganov
059031b8c4
ci : re-enable sanitizer runs (#7358)
* Revert "ci : temporary disable sanitizer builds (#6128)"

This reverts commit 4f6d1337ca.

* ci : trigger
2024-05-18 18:55:54 +03:00
Brian
de73196344
github-actions-labeler: initial commit (#7330)
* github-actions-labeler: initial commit [no ci]

* github actions: remove priority auto labeling [no ci]
2024-05-18 16:04:23 +10:00
Gavin Zhao
82ca83db3c
ROCm: use native CMake HIP support (#5966)
Supercedes #4024 and #4813.

CMake's native HIP support has become the
recommended way to add HIP code into a project (see
[here](https://rocm.docs.amd.com/en/docs-6.0.0/conceptual/cmake-packages.html#using-hip-in-cmake)).
This PR makes the following changes:

1. The environment variable `HIPCXX` or CMake option
`CMAKE_HIP_COMPILER` should be used to specify the HIP
compiler. Notably this shouldn't be `hipcc`, but ROCm's clang,
which usually resides in `$ROCM_PATH/llvm/bin/clang`. Previously
this was control by `CMAKE_C_COMPILER` and `CMAKE_CXX_COMPILER`.
Note that since native CMake HIP support is not yet available on
Windows, on Windows we fall back to the old behavior.

2. CMake option `CMAKE_HIP_ARCHITECTURES` is used to control the
GPU architectures to build for. Previously this was controled by
`GPU_TARGETS`.

3. Updated the Nix recipe to account for these new changes.

4. The GPU targets to build against in the Nix recipe is now
consistent with the supported GPU targets in nixpkgs.

5. Added CI checks for HIP on both Linux and Windows. On Linux, we test
both the new and old behavior.

The most important part about this PR is the separation of the
HIP compiler and the C/C++ compiler. This allows users to choose
a different C/C++ compiler if desired, compared to the current
situation where when building for ROCm support, everything must be
compiled with ROCm's clang.

~~Makefile is unchanged. Please let me know if we want to be
consistent on variables' naming because Makefile still uses
`GPU_TARGETS` to control architectures to build for, but I feel
like setting `CMAKE_HIP_ARCHITECTURES` is a bit awkward when you're
calling `make`.~~ Makefile used `GPU_TARGETS` but the README says
to use `AMDGPU_TARGETS`. For consistency with CMake, all usage of
`GPU_TARGETS` in Makefile has been updated to `AMDGPU_TARGETS`.

Thanks to the suggestion of @jin-eld, to maintain backwards
compatibility (and not break too many downstream users' builds), if
`CMAKE_CXX_COMPILER` ends with `hipcc`, then we still compile using
the original behavior and emit a warning that recommends switching
to the new HIP support. Similarly, if `AMDGPU_TARGETS` is set but
`CMAKE_HIP_ARCHITECTURES` is not, then we forward `AMDGPU_TARGETS`
to `CMAKE_HIP_ARCHITECTURES` to ease the transition to the new
HIP support.

Signed-off-by: Gavin Zhao <git@gzgz.dev>
2024-05-17 17:03:03 +02:00
Max Krasnyansky
172b78210a
ci: fix bin/Release path for windows-arm64 builds (#7317)
Switch to Ninja Multi-Config CMake generator to resurect bin/Release path
that broke artifact packaging in CI.
2024-05-16 15:36:43 +10:00
Max Krasnyansky
13ad16af12
Add support for properly optimized Windows ARM64 builds with LLVM and MSVC (#7191)
* logging: add proper checks for clang to avoid errors and warnings with VA_ARGS

* build: add CMake Presets and toolchian files for Windows ARM64

* matmul-int8: enable matmul-int8 with MSVC and fix Clang warnings

* ci: add support for optimized Windows ARM64 builds with MSVC and LLVM

* matmul-int8: fixed typos in q8_0_q8_0 matmuls

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* matmul-int8: remove unnecessary casts in q8_0_q8_0

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-16 12:47:36 +10:00