Commit Graph

  • 576c8e6186 gitignore: add several entries specific to Visual Studio Borislav Stanimirov 2023-06-16 09:07:51 +0300
  • 4eff089421 examples : add JSON schema grammars Evan Jones 2023-06-10 00:08:05 -0400
  • 8cc5bed078 fix: add auto detection on the BLAS_INCLUDE_DIRS zenix 2023-06-16 13:19:56 +0900
  • 488c62acf9 Merge remote-tracking branch 'upstream/master' Randall Fitzgerald 2023-06-15 17:12:29 -0400
  • a09f9195be Fixed CUDA runtime version check (#1879) master-a09f919 Johannes Gäßler 2023-06-15 21:49:08 +0200
  • bed9275617 cmake : remove whitespaces master-bed9275 Georgi Gerganov 2023-06-15 21:56:50 +0300
  • c36e81da62 examples : add chat-vicuna.sh (#1854) master-c36e81d yangli2 2023-06-15 11:05:53 -0700
  • dd13ec96ed Fixed CUDA runtime version check JohannesGaessler 2023-06-15 19:55:36 +0200
  • 3559433fec cmake : set include path for OpenBlas (#1830) master-3559433 Igor Okulist 2023-06-15 12:51:26 -0500
  • 69b34a0e80 swift : Package compile breaks due to ggml-metal.metal (#1831) master-69b34a0 Frederik Vogel 2023-06-16 02:47:04 +0900
  • cf267d1c71 make : add train-text-from-scratch (#1850) master-cf267d1 daboe01 2023-06-15 19:42:48 +0200
  • 4af9a7d6d9 Merge branch 'master' into finetuning-acessability Georgi Gerganov 2023-06-15 20:42:26 +0300
  • 9a60bbe8de Update examples/train-text-from-scratch/README.md Georgi Gerganov 2023-06-15 20:41:58 +0300
  • 9dda13e5e1 readme : server compile flag (#1874) Srinivas Billa 2023-06-15 18:36:38 +0100
  • 37e257c48e make : clean *.so files (#1857) master-37e257c sandyiscool 2023-06-15 23:06:06 +0530
  • 64cc19b4fe Fix the validation of main device (#1872) master-64cc19b Howard Su 2023-06-16 01:29:59 +0800
  • 4bfcc855ab metal : parallel command buffer encoding (#1860) master-4bfcc85 Georgi Gerganov 2023-06-15 20:29:48 +0300
  • ef43a62289 metal : determine number of command buffers based on gf->n_threads Georgi Gerganov 2023-06-15 20:26:56 +0300
  • 6b8312e797 Better error when using both LoRA + GPU layers (#1861) master-6b8312e Johannes Gäßler 2023-06-15 19:06:46 +0200
  • 0971f83bca added eos token id handling for starcoder models, as they use a different EOS ID Concedo 2023-06-15 22:57:14 +0800
  • 2d5e8b2ca0 Add newline at end of file Howard Su 2023-06-15 21:39:21 +0800
  • 013992e280 Update README.md Srinivas Billa 2023-06-15 14:38:20 +0100
  • 77ab0c0f3d Fix embedding when embedding layer on GPU Howard Su 2023-06-15 21:33:58 +0800
  • 4e2e286cca Fix the validation of main device Howard Su 2023-06-15 21:31:12 +0800
  • 8a2a73102c Fixes CMake style to use lowercase like everywhere else Jeremy Dunn 2023-06-15 08:28:58 -0500
  • 6b764ee72f Merge pull request #2 from ggerganov/master l3utterfly 2023-06-15 21:22:51 +0800
  • eff0834249 Merge branch 'ggerganov:master' into master yangli2 2023-06-15 06:19:39 -0700
  • 3649d35cca Merge branch 'master' into concedo_experimental Concedo 2023-06-15 18:24:31 +0800
  • aee859519e Update README.md Randall Fitzgerald 2023-06-15 01:50:54 -0700
  • 6a113eeec8 Merge branch 'concedo' into concedo_experimental Concedo 2023-06-15 14:47:32 +0800
  • b1b8dc32c9 Fix Makefile for CUBLAS. (#241) Ycros 2023-06-15 16:46:47 +1000
  • 23710144dc fixed clean target daboe01 2023-06-15 07:02:00 +0200
  • 8ff35ef944 updated lite Concedo 2023-06-15 12:13:55 +0800
  • 414f25104b remove swp file Evan Jones 2023-06-15 00:13:13 -0400
  • 58ca9bc6c0 adjust JSON grammar Evan Jones 2023-06-15 00:06:54 -0400
  • b876d19cff fix bugs with empty token and EOS Evan Jones 2023-06-14 23:53:55 -0400
  • 421c6e1ca1 support alternates in root rule Evan Jones 2023-06-14 23:53:12 -0400
  • 50537b471d exposed modules so that they can be invoked by nix run github:ggerganov/llama.cpp#server etc Faez Shakil 2023-06-15 01:49:03 +0500
  • f858cd64d4 Merge remote-tracking branch 'upstream/master' Randall Fitzgerald 2023-06-14 16:48:01 -0400
  • 5e107c2aac Merge pull request #24 from anon998/logit-bias Randall Fitzgerald 2023-06-14 16:27:43 -0400
  • 61df8e9217 add cudaMemset Henri Vasserman 2023-06-14 22:46:10 +0300
  • a836529996 Merge 'origin/master' into hipblas Henri Vasserman 2023-06-14 22:41:55 +0300
  • fea717efa8 Better error when using both LoRA + GPU layers JohannesGaessler 2023-06-14 21:21:28 +0200
  • dc67f1a06e cuda : faster k-quant dot kernels Iwan Kawrakow 2023-06-14 22:14:21 +0300
  • 1556bbb6a3 block_q5_k const hoist Steven Roussey 2023-06-14 12:02:40 -0700
  • 69bae5d277 metal : parallel command buffer encoding Georgi Gerganov 2023-06-14 21:01:48 +0300
  • 254a7a7a5f CUDA full GPU acceleration, KV cache in VRAM (#1827) master-254a7a7 Johannes Gäßler 2023-06-14 19:47:19 +0200
  • bd81096927 fix typo in readme + don't ignore integers anon 2023-06-14 13:29:05 -0300
  • 2048c061d5 Update Makefile to clean *.so files too. Sandeep 2023-06-14 21:42:09 +0530
  • b783da97a6 Fixed LLAMA_CUDA_DMMV_Y > 1 for WizardLM JohannesGaessler 2023-06-14 16:36:46 +0200
  • 546f850796 Update examples/server/server.cpp Henri Vasserman 2023-06-14 17:41:58 +0300
  • 6f54ad042b fixed: model path was wrong daboe01 2023-06-14 14:26:07 +0200
  • 3ed3e7b7e2 reverted sequence mode for rwkv due to multiple issues with speed loss with bigger quantized models Concedo 2023-06-14 20:03:14 +0800
  • addd592828 fixed: naming of binary daboe01 2023-06-14 12:42:12 +0200
  • 1e9980e20a fixed: name of executable was wrong daboe01 2023-06-14 12:30:07 +0200
  • 7fff8782e3 fixed: targed was in wrong line daboe01 2023-06-14 12:27:35 +0200
  • 09f0a94519 make finetuning example accessible daboe01 2023-06-14 12:25:49 +0200
  • 9af5ab5c1e Add an example script that works with the Vicuna model. Adding a small bit of documenting comment in llama.h. Yang Li 2023-06-14 01:09:21 -0700
  • 8f65eecf20 typo and comments simple.cpp SuperUserNameMan 2023-06-14 09:33:31 +0200
  • 9d2f4a8000 Used local copy of CLBlast instead of installed one l3utterfly 2023-06-14 15:09:05 +0800
  • 7a4f712a29 removed trailing white spaces simple.cpp SuperUserNameMan 2023-06-14 08:58:18 +0200
  • f83b66606b Merge branch 'concedo' into concedo_experimental Concedo 2023-06-14 11:50:24 +0800
  • 443903fa0f up ver with these minor improvements Concedo 2023-06-14 11:50:13 +0800
  • ce36167976 fix Fix the link on the Mac platform OpenCL method (#227) tqcq 2023-06-14 11:41:39 +0800
  • f5247be0d7 Merge branch 'master' into concedo_experimental Concedo 2023-06-14 11:35:43 +0800
  • 2b4a286e56 Merge remote-tracking branch 'occam/kquant-opencl' into concedo_experimental Concedo 2023-06-14 11:34:53 +0800
  • e4265198ed added cublas back into the makefile as some people requested Concedo 2023-06-14 11:34:40 +0800
  • a47072b85d Cleaned up code, added comments JohannesGaessler 2023-06-14 00:00:53 +0200
  • 51830ee5e6 Fixed Windows performance JohannesGaessler 2023-06-13 22:37:17 +0200
  • 222c679842 Move a repeated calc to const Steven Roussey 2023-06-13 12:51:08 -0700
  • d9f38465b7 ci: add linux binaries to release build ci_cublas_linux-d9f3846 Green Sky 2023-05-05 00:01:30 +0200
  • dba14529de Added a --low-vram option JohannesGaessler 2023-06-13 21:40:33 +0200
  • 9254920265 baby-llama : fix operator!= (#1821) master-9254920 0xspringtime 2023-06-13 15:37:54 -0400
  • e32089b2c2 train : improved training-from-scratch example (#1652) master-e32089b xaedes 2023-06-13 21:04:40 +0200
  • d4b6438708 ci : re-enable workflows + add README for training Georgi Gerganov 2023-06-13 21:38:00 +0300
  • 6075d7862d Merge pull request #23 from anon998/fix-linter-warnings Randall Fitzgerald 2023-06-13 14:32:19 -0400
  • 7a48ade7ef fix comment indentation anon 2023-06-13 14:46:40 -0300
  • c369d11905 remove 273: Trailing whitespace SuperUserNameMan 2023-06-13 19:36:27 +0200
  • 7df316b728 fix linter warnings + make variables const anon 2023-06-13 14:28:52 -0300
  • 575cf23862 remove json_indent variable anon 2023-06-13 14:21:40 -0300
  • 2347e45e7b llama : do a warm-up eval at start for better timings (#1824) master-2347e45 Georgi Gerganov 2023-06-13 20:20:07 +0300
  • 99ef967d42 add static prefix to the other functions too anon 2023-06-13 14:17:22 -0300
  • 1f3945236a remove old verbose variable anon 2023-06-13 14:14:29 -0300
  • bbe9c59618 Update Makefile for minimalist example SuperUserNameMan 2023-06-13 19:12:45 +0200
  • ba636acb1f minimalist example CMakeLists.txt SuperUserNameMan 2023-06-13 19:09:44 +0200
  • 1659d77515 Create simple.cpp SuperUserNameMan 2023-06-13 19:08:37 +0200
  • cc60183c5f VRAM KV cache based on -ngl, fixed info prints JohannesGaessler 2023-06-13 17:39:32 +0200
  • 15de626b3a double max nodes again Concedo 2023-06-13 23:51:10 +0800
  • 82cf97ce92 hotfix for rwkv Concedo 2023-06-13 23:38:41 +0800
  • e8528d4d2f Fixed incorrect index when going out of context JohannesGaessler 2023-06-13 16:15:11 +0200
  • 20e76a0764 Free CUDA scratch buffer upon llama_model deletion JohannesGaessler 2023-06-13 13:03:25 +0200
  • ed6587491c Free KV cache CUDA buffers upon deletion JohannesGaessler 2023-06-13 11:15:30 +0200
  • 8e3057b24b Removed obsolete code, fixed multi GPU JohannesGaessler 2023-06-12 20:23:16 +0200
  • 95120f1365 flatten rows for ggml_cuda_op JohannesGaessler 2023-06-12 19:52:38 +0200
  • 3b6a2ee414 ggml_cuda_cpy for f32 -> f32 JohannesGaessler 2023-06-12 17:33:48 +0200
  • cf5ae8635a KV cache v works, perf. bad, # after 64 tokens JohannesGaessler 2023-06-12 10:07:27 +0200
  • 9a85d913ee Refactored ggml_cuda_cpy JohannesGaessler 2023-06-12 09:19:11 +0200
  • 19c0bf5c86 ggml_is_permuted JohannesGaessler 2023-06-11 19:53:43 +0200
  • 8d648a34d8 ggml_cuda_diag_mask_inf JohannesGaessler 2023-06-10 22:09:56 +0200
  • 6b46870fea ggml_cuda_scale JohannesGaessler 2023-06-10 20:56:40 +0200