Commit Graph

  • 0b08ec7c5d forgot to remove this Concedo 2023-04-20 16:28:47 +0800
  • 346cd68903 make linux and OSX build process equal to windows. Now it will build all applicable libraries, for a full build do make LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 Concedo 2023-04-20 15:53:55 +0800
  • c8c2c52482 AVX2 optimization for vec_dot_q4_2_q8_0 (#1068) master-c8c2c52 Stephan Walter 2023-04-20 06:45:41 +0000
  • ce05fc0a67 Multi-threading for quantize-stats Iwan Kawrakow 2023-04-20 07:25:13 +0200
  • 2732a6b84a Merge remote-tracking branch 'upstream/master' into eval-thread-count ml6 2023-04-19 21:43:40 -0700
  • 93761e7baf slightly clarified the library replacement steps - replacing the dll is necessary in addition to replacing the library imports Concedo 2023-04-20 12:23:54 +0800
  • 5ca2d774cc doc - explanation of how to use a custom version of the windows libraries at the lib folder. (#92) Gustavo Rocha Dias 2023-04-20 01:20:11 -0300
  • e488db9fd9 Remove the LLAMA_ACCELERATE matrix dimension from Ubuntu builds in the CI Ivan Komarov 2023-04-20 04:23:19 +0200
  • 02d6988121 Improve cuBLAS performance by dequantizing on the GPU (#1065) master-02d6988 slaren 2023-04-20 03:14:14 +0200
  • 18337719e0 Fix windows build Slaren 2023-04-20 01:03:44 +0200
  • 95cf9597aa Fix possible synchronization issue Slaren 2023-04-19 23:01:53 +0200
  • 834695fe3a Minor: Readme fixed grammar, spelling, and misc updates (#1071) CRD716 2023-04-19 14:52:14 -0500
  • 5b7ff8234f editorconfig check CRD716 2023-04-19 14:35:58 -0500
  • 48f6664589 trailing CRD716 2023-04-19 14:31:01 -0500
  • 0731d4147e Update README.md CRD716 2023-04-19 14:29:40 -0500
  • 72028641ca AVX2 optimization for vec_dot_q4_2_q8_0 Stephan Walter 2023-04-19 20:41:55 +0200
  • d2f9266200 Multi-threading quantization. Iwan Kawrakow 2023-04-19 20:20:44 +0200
  • f7d05095b4 Q4_2 quantization with rmse-optimized scale and quants (#1062) master-f7d0509 Kawrakow 2023-04-19 20:20:14 +0200
  • fe14e7c522 Re-add dropped Darwin-only flag. Corbin 2023-04-19 10:53:42 -0700
  • 35b0bf0585 Merge remote-tracking branch 'upstream/master' into more_responsive Jeffersoncgo 2023-04-19 13:44:25 -0400
  • 14a4fc874b Nix flake: Use Makefile instead of CMake Corbin 2023-04-19 10:23:47 -0700
  • 884e7d7a2b ggml : use 8-bit precision for Q4_1 intermediate results (#1047) master-884e7d7 Georgi Gerganov 2023-04-19 20:10:08 +0300
  • 96d84438bc Fixed type as per reviewer comment Iwan Kawrakow 2023-04-19 18:57:07 +0200
  • 49beb2cdb8 Better follow ggml conventions for function names Iwan Kawrakow 2023-04-19 18:46:44 +0200
  • e582f2ad60 gitignore : ignore ppl-*.txt files Georgi Gerganov 2023-04-19 19:31:44 +0300
  • ad7007aa21 ggml : AVX2 implementation of ggml_vec_dot_q4_1_q8_0 (#1051) slaren 2023-04-19 18:29:02 +0200
  • 426230525c ggml : optimize ggml_vec_dot_q4_1_q8_0() via vmalq_n_f32 Georgi Gerganov 2023-04-18 23:33:03 +0300
  • e9c07f72cb ggml : use 8-bit precision for Q4_1 intermediate results (ARM) Georgi Gerganov 2023-04-18 22:12:19 +0300
  • 6d36a51fa5 ggml : satisfy the sanitizer builds Georgi Gerganov 2023-04-19 19:18:28 +0300
  • 891af05e7d Remove unused parameters Slaren 2023-04-19 18:11:54 +0200
  • 7cd5c4a3e9 readme : add warning about Q4_2 and Q4_3 Georgi Gerganov 2023-04-19 19:07:54 +0300
  • f3d4edf504 ggml : Q4 cleanup - remove 4-bit dot product code (#1061) master-f3d4edf Stephan Walter 2023-04-19 16:06:37 +0000
  • 359b056034 Improve cuBLAS performance with quantized models by dequantizing on the GPU Slaren 2023-04-19 18:01:39 +0200
  • e9657b20e8 Remove unused AVX512 Q4_0 code Stephan Walter 2023-04-19 17:31:02 +0200
  • 6eec06081b Q4_2 quantization with rmse-optimized scale and quants Iwan Kawrakow 2023-04-19 17:10:58 +0200
  • 21ee6d97cc Q4 cleanup Stephan Walter 2023-04-19 16:15:24 +0200
  • 275f1bdf13 Added tokens to identify if is loading or ready Jeffersoncgo 2023-04-19 09:08:32 -0400
  • be1222c36e Merged the upstream cublas feature, Concedo 2023-04-19 20:45:37 +0800
  • cc407f283a messing around with memory allocation to bandaid the random ooms with various gpt2 and gptj models Concedo 2023-04-19 20:18:55 +0800
  • 99eafe908f more_responsive Jeffersoncgo 2023-04-19 08:01:35 -0400
  • 8944a13296 Add NVIDIA cuBLAS support (#1044) master-8944a13 slaren 2023-04-19 11:22:45 +0200
  • f662a9a230 Merge branch 'master' into concedo Concedo 2023-04-19 16:34:51 +0800
  • 65bfcdb1cc Merge branch 'concedo_experimental' into concedo Concedo 2023-04-19 15:35:48 +0800
  • 45ec09d31b fast forwarding for rwkv for unmodified contexts Concedo 2023-04-19 15:09:35 +0800
  • 116488af66 Create make_pyinstaller.sh (#89) AlpinDale 2023-04-19 07:27:07 +0430
  • 142c38a4f3 AVX2 implementation of ggml_vec_dot_q4_1_q8_0 Slaren 2023-04-19 03:13:20 +0200
  • 6667401238 Multi-threaded ggml_cpy (#1035) master-6667401 slaren 2023-04-19 00:53:24 +0200
  • b9e99cd1fd Also fix wdata offset in ggml_compute_forward_add_q_f32 Slaren 2023-04-18 22:27:50 +0200
  • 8bd47a8bda Update ggml.c slaren 2023-04-18 19:19:01 +0200
  • 0f8b1df18f Multi-threaded ggml_cpy Slaren 2023-04-18 01:47:34 +0200
  • 40846bd28d Cleanup cublas comments Slaren 2023-04-19 00:37:33 +0200
  • 5fc6799f05 Add support to cmake Slaren 2023-04-18 23:20:11 +0200
  • 77a73403ca ggml : add new Q4_2 quantization (ARM only) (#1046) master-77a7340 Georgi Gerganov 2023-04-18 23:54:57 +0300
  • ed24225917 ggml : optimize ggml_vec_dot_q4_1_q8_0() via vmalq_n_f32 Georgi Gerganov 2023-04-18 23:33:03 +0300
  • 3ceb0733a6 Merge branch 'master' into q4_1xq8_0 Georgi Gerganov 2023-04-18 23:13:21 +0300
  • 50a8a2af97 ggml : scratch that - vmlaq_n_f32 is always better master-50a8a2a Georgi Gerganov 2023-04-18 23:11:23 +0300
  • 5843b45b6b ggml : optimize q4_2 using vmlaq_n_f32 + vmulq_n_f32 Georgi Gerganov 2023-04-18 23:09:18 +0300
  • 3a7908940f ggml : speed-up q4_2 Georgi Gerganov 2023-04-18 22:39:35 +0300
  • 5e6b62ce77 llama : update llama_type_name() with Q4_2 entry Georgi Gerganov 2023-04-18 22:21:20 +0300
  • fe859297f3 ggml : add ggml_is_quantized() Georgi Gerganov 2023-04-18 21:32:27 +0300
  • e435b81454 ggml : Q4_2 ARM Georgi Gerganov 2023-04-18 21:11:56 +0300
  • 4caebf6d40 gitignore : vdot Georgi Gerganov 2023-04-18 23:00:08 +0300
  • dcdd65e296 ggml : optimize ggml_vec_dot_q4_0_q8_0() using vectorized accumulators master-dcdd65e Georgi Gerganov 2023-04-18 22:59:17 +0300
  • 7840f6637c ggml : use 8-bit precision for Q4_1 intermediate results (ARM) Georgi Gerganov 2023-04-18 22:12:19 +0300
  • 5ecff35151 Adding a simple program to measure speed of dot products (#1041) master-5ecff35 Kawrakow 2023-04-18 21:00:14 +0200
  • e8061e6990 Merge 3dc5243b1b into 7faa7460f0 Jan Bielak 2023-04-18 20:35:14 +0200
  • 5725eec429 Update CMakeLists.txt 源文雨 2023-04-19 01:10:49 +0800
  • 7faa7460f0 readme : update hot topics about new LoRA functionality Georgi Gerganov 2023-04-18 20:10:26 +0300
  • 5af8e32238 ci : do not run on drafts master-5af8e32 Georgi Gerganov 2023-04-17 18:00:10 +0300
  • d7c53a084e Update CMakeLists.txt 源文雨 2023-04-19 00:45:50 +0800
  • fdb55c9a01 Update CMakeLists.txt 源文雨 2023-04-19 00:45:29 +0800
  • 72cd433066 ggml : test dot product q4_0 x f32 Georgi Gerganov 2023-04-18 19:20:37 +0300
  • 4440d198c0 Add NVIDIA cuBLAS support Slaren 2023-04-18 18:16:27 +0200
  • d4dd743d6f fix: ld link test-tokenizer-0 error 源文雨 2023-04-19 00:13:59 +0800
  • baee7684df Adding a POC dot product for Q4_1 quantization Iwan Kawrakow 2023-04-18 17:24:45 +0200
  • 42031dac73 Adding a simple program to measure speed of dot products Iwan Kawrakow 2023-04-18 16:14:01 +0200
  • f39def81d4 Update readme with more info Concedo 2023-04-18 21:44:26 +0800
  • 3614956bc7 update readme Concedo 2023-04-18 21:39:05 +0800
  • ea01771dd5 rwkv is done Concedo 2023-04-18 20:55:01 +0800
  • 391c5a247a remove redundant free memory call. wbpxre150 2023-04-18 19:11:59 +0800
  • a76b15b581 Merge branch 'concedo' into concedo_experimental Concedo 2023-04-18 17:42:43 +0800
  • ed5b5c45a9 doc - enhanced readme explaing how to compile at Windows. (#80) Gustavo Rocha Dias 2023-04-18 06:40:04 -0300
  • a9253cdfba fix - at some OSs the PyInstaller command is case sensitive, at lowercase it doen't work. (#81) Gustavo Rocha Dias 2023-04-18 06:39:06 -0300
  • ac61e34d5f Merge branch 'master' into concedo_experimental Concedo 2023-04-18 17:38:10 +0800
  • c200b674f4 updated kobold lite, work on rwkv, added exe path to model load params, added launch parameter Concedo 2023-04-18 17:36:44 +0800
  • 42747220b4 Do not close file after mmap (Windows version) (#1034) master-4274722 Ivan Komarov 2023-04-18 03:15:50 +0200
  • bf1a24aceb Do not close file after mmap (Windows version) Ivan Komarov 2023-04-18 02:19:43 +0200
  • d1f02102f8 examples : evaluate tokens in batches after swapping context grencez 2023-04-17 13:01:17 -0700
  • e9298af389 readme : add Ruby bindings (#1029) Atsushi Tatsuma 2023-04-18 04:34:35 +0900
  • d1b51ceb8e ggml : minor Georgi Gerganov 2023-04-17 21:59:58 +0300
  • 4ad73137a1 add 4_0 to default outfile namestr dict (#1031) Cameron 2023-04-17 11:26:23 -0700
  • 2299b0a5f5 Use VLAs instead of alloca, if possible. Olaf Seibert 2023-04-12 20:37:41 +0200
  • 7905368223 --amend Cammy 2023-04-17 11:12:25 -0700
  • 74e28db50f add 4_0 to default outfile namestr dict Cammy 2023-04-17 09:11:14 -0700
  • a8592fbc11 ggml : explicit assignment of deltas Georgi Gerganov 2023-04-17 18:41:14 +0300
  • 315a95a4d3 Add LoRA support (#820) master-315a95a slaren 2023-04-17 17:28:55 +0200
  • 9d0a3a71b0 readme : add Ruby bindings yoshoku 2023-04-18 00:15:20 +0900
  • 0f5ee9e13f ci : do not run on drafts Georgi Gerganov 2023-04-17 18:00:10 +0300
  • 48ab0963ae No need to copy tokens Howard Su 2023-04-17 23:07:30 +0800
  • 7da998da01 ggml : initial ARM_NEON 2x F16 Q4_0 implementation Georgi Gerganov 2023-04-17 16:17:06 +0300