Commit Graph

  • a316a425d0
    Overhaul the examples structure master-a316a42 Georgi Gerganov 2023-03-25 20:26:40 +0200
  • 70ff2062df Add AVX2 implementation of dequantize_row_q4_1 Slaren 2023-03-25 19:02:07 +0100
  • ecbe466a36
    Retire the ggml_mul_mat() branch for transposed src0 (#500) master-ecbe466 Georgi Gerganov 2023-03-25 19:47:21 +0200
  • face8082ea
    SIMD-ify dequantize_row_q4_0() for ARM_NEON (#502) Georgi Gerganov 2023-03-25 19:31:53 +0200
  • b83ddbd768
    Fix dequantization - forgot to interleave the quants Georgi Gerganov 2023-03-25 19:31:23 +0200
  • c2916bb4a0
    disable avx512 test on runner for now anzz1 2023-03-25 19:09:34 +0200
  • a52f6b43b7
    CI: option 2 anzz1 2023-03-25 18:54:28 +0200
  • 04be5b0ba4
    Attempt to SIMD-ify dequantize_row_q4_0() for ARM_NEON Georgi Gerganov 2023-03-25 18:40:13 +0200
  • d234a8643f
    Merge branch 'ggerganov:master' into patch-1 RSereno 2023-03-25 16:01:53 +0000
  • 8daa71d958
    Update tools.sh RSereno 2023-03-25 16:01:11 +0000
  • 1e39d2bf77
    Retire the ggml_mul_mat() for transposed src0 Georgi Gerganov 2023-03-25 17:55:31 +0200
  • 2279cd25f7
    Enable avx for Linux only if also fp16c available. R.Kaufmann 2023-03-25 16:54:42 +0100
  • ea546b5f8d with logits_all == true, seek to the last logits vector Maël Kerbiriou 2023-03-25 14:58:57 +0100
  • 502a400192
    Disable prompt verbosity by default and add option to enable (#480) master-502a400 Georgi Gerganov 2023-03-25 17:16:50 +0200
  • 09aecbf628
    Add AVX2 implementation of dequantize_row_q4_0 (#467) master-09aecbf slaren 2023-03-25 16:06:49 +0100
  • 4640eff23d
    Don't interfere with BLAS for large prompts by running only 1 thread master-4640eff Georgi Gerganov 2023-03-25 17:03:10 +0200
  • ab77d76312
    Add longer DAN prompt for testing big batch numbers Georgi Gerganov 2023-03-25 16:47:59 +0200
  • 29b7baab67
    Add timings for the prompt evaluation (#478) master-29b7baa slaren 2023-03-25 15:34:23 +0100
  • 4a7129acd2
    Remove obsolete information from README Georgi Gerganov 2023-03-25 16:30:32 +0200
  • 43e1cf8693
    CI: (Windows) Add AVX / AVX512 builds anzz1 2023-03-25 16:30:25 +0200
  • d213f9d52b
    CMake: add AVX512 option anzz1 2023-03-25 16:28:09 +0200
  • 6b6dbc8910
    Remove obsolete assert and fix compiler warning master-6b6dbc8 Georgi Gerganov 2023-03-25 16:22:05 +0200
  • 2a2e63ce05
    Fix nasty bug in ggml_compute_forward_mul_mat_f32() and reenable BLAS master-2a2e63c Georgi Gerganov 2023-03-25 16:09:54 +0200
  • 4eae17153c
    Merge branch 'ggerganov:master' into patch-1 RSereno 2023-03-25 13:59:12 +0000
  • e899bf54b2
    bounds checking for input prefix (#492) master-e899bf5 anzz1 2023-03-25 14:42:09 +0200
  • d37af8d7c0
    bounds checking for input prefix anzz1 2023-03-25 14:07:23 +0200
  • fbd4d38c64
    feat: '--in-prefix STRING' option (#426) master-fbd4d38 anzz1 2023-03-25 14:03:19 +0200
  • 58e6c9f36f
    Add support for file load progress reporting callbacks (#434) master-58e6c9f Jed Fox 2023-03-25 01:26:28 -0400
  • 36d07532ef
    Add missing struct annotation (#483) master-36d0753 Doomsdayrs 2023-03-25 01:21:24 -0400
  • 6f1ee4b640
    Fix crash for 65B model with pre-allocated memory (#485) master-6f1ee4b Chris Kuehl 2023-03-24 23:38:14 -0500
  • 8a339bd75c update gitignore Concedo 2023-03-25 11:23:40 +0800
  • 3c78124aac Merge branch 'master' into concedo Concedo 2023-03-25 11:20:04 +0800
  • 119392f6f2 defaulting to f32 kv, and 4 threads seem to produce better results Concedo 2023-03-25 11:11:40 +0800
  • 506cd62638 changed some defaults to hopefully increase compatibility Concedo 2023-03-25 10:40:11 +0800
  • b13a768813 added softprompt endpoint Concedo 2023-03-25 10:12:47 +0800
  • 8347bede58
    Add missing struct annotation Doomsdayrs 2023-03-24 20:25:50 -0400
  • 5d909a377c
    enable sanitizers in linux ci Green Sky 2023-03-25 01:16:14 +0100
  • fe5af95ef8
    cmake: make sanitizers link Green Sky 2023-03-23 21:46:04 +0100
  • 743ec9b221 Fix crash for 65B model with pre-allocated memory Chris Kuehl 2023-03-24 19:17:34 -0500
  • 186ecfd8a4 Remove printing of prompt and prompt tokenization at startup Slaren 2023-03-24 23:46:02 +0100
  • 8666c5aa43 Add timings for the prompt evaluation Slaren 2023-03-24 23:12:15 +0100
  • 8520fc310e
    Disable BLAS altogether - the bug is not just for quantized mat mul master-8520fc3 Georgi Gerganov 2023-03-24 23:47:06 +0200
  • b3f460e941
    Disable BLAS branch in mul_mat - seems there is a bug master-b3f460e Georgi Gerganov 2023-03-24 23:39:17 +0200
  • 3a8e8b7a0f
    Fix typo Jed Fox 2023-03-24 17:28:34 -0400
  • a8096f3d81
    Merge branch 'master' into jed/spm Jed Fox 2023-03-24 17:27:46 -0400
  • ae3d0ff68f
    Call progress callback more frequently Jed Fox 2023-03-24 17:26:19 -0400
  • 1e3fd898a3
    Merge branch 'master' into jed/load-progress Jed Fox 2023-03-24 17:25:38 -0400
  • 04c6f5ed6f
    Immediately start processing the prompt before user input has been provided (#476) master-7a9b6c3 master-04c6f5e Georgi Gerganov 2023-03-24 23:17:58 +0200
  • 7a9b6c3a8b
    Reduce memory usage and allocate enough memory for largest context (#473) Georgi Gerganov 2023-03-24 23:17:37 +0200
  • 6feb572b36
    Merge branch 'master' into mem-fix Georgi Gerganov 2023-03-24 23:17:19 +0200
  • d0f7519338
    Fix KV cache size for F32 Georgi Gerganov 2023-03-24 22:58:00 +0200
  • d26a3994f4
    Immediately start processing the prompt before user input has been provided Georgi Gerganov 2023-03-24 22:44:02 +0200
  • 4aeee216fd Regroup q4_1 dot addition for better numerics. q4_1_more_accel Matvey Soloviev 2023-03-23 04:56:21 +0100
  • 580991bbed Squeeze out about 5% more performance in Q4_1 inference Matvey Soloviev 2023-03-21 22:55:35 +0100
  • 0b4e849a24
    Fix number of layers in 30B and 65B Georgi Gerganov 2023-03-24 22:15:06 +0200
  • 3634c312bc
    Reenable BLAS for quantized mul_mat Georgi Gerganov 2023-03-24 22:03:56 +0200
  • ea60d2193a
    Simpler scratch buffer usage Georgi Gerganov 2023-03-24 21:41:47 +0200
  • 9330ff0f35
    Reduce memory usage and allocate enough memory for large contexts Georgi Gerganov 2023-03-24 18:22:48 +0200
  • 8f2b6d222d Add AVX2 implementation of dequantize_row_q4_0 Slaren 2023-03-24 17:13:50 +0100
  • 31572d9665
    Temporarily bump the memory buffer size - hopefully fix issues from 483bab2e master-31572d9 Georgi Gerganov 2023-03-24 18:23:56 +0200
  • f4f5362edb
    Update README.md (#444) master-863f65e Gary Mulder 2023-03-24 15:23:09 +0000
  • 863f65e2e3
    fix instruct mode (#445) rabidcopy 2023-03-24 10:22:39 -0500
  • afd220d9c6
    Properly free llama_context on failure master-afd220d master-563cdc3 master-481044d Georgi Gerganov 2023-03-24 17:21:01 +0200
  • 481044d50c
    additional optimizations for POWER9 (#454) Cameron Kaiser 2023-03-24 08:19:26 -0700
  • 563cdc391d
    Support calling mlock() on loaded model data on Linux and macOS (#453) comex 2023-03-24 08:19:05 -0700
  • 53a941c1e5
    Update llama.cpp Georgi Gerganov 2023-03-24 17:17:56 +0200
  • a65f23342d
    Merge branch 'master' into mlock Georgi Gerganov 2023-03-24 17:15:24 +0200
  • 8d4a855c24
    Add embedding mode with arg flag. Currently working (#282) master-8d4a855 Luciano 2023-03-24 08:05:13 -0700
  • 8e383f1895 gitignore Concedo 2023-03-24 23:02:25 +0800
  • 1c78ffb964
    Update README.md LostRuins 2023-03-24 22:45:54 +0800
  • e791827973 added a GUI for selection of models if none was passed in through command line. Concedo 2023-03-24 22:03:57 +0800
  • c6c60332a4 Optimizations Concedo 2023-03-24 21:33:53 +0800
  • 3879d84400 Merge branch 'master' into concedo Concedo 2023-03-24 19:28:27 +0800
  • 706e19e9b4 added ability to fast forward in time through partially duplicated prompts Concedo 2023-03-24 18:50:16 +0800
  • 8b4b1e1fb3
    Merge branch 'ggerganov:master' into fix-instruct rabidcopy 2023-03-24 03:09:53 -0500
  • b6b268d441
    Add link to Roadmap discussion Georgi Gerganov 2023-03-24 09:13:35 +0200
  • 3cd8dde0d1 Revert "Fix memory allocation issues and seg faults" master-3cd8dde Georgi Gerganov 2023-03-24 06:22:28 +0200
  • a34ba06b38
    Prevent users from using the instruct mode and interactive mode at the same time. mmyjona 2023-03-24 12:19:37 +0800
  • 2a6daccc40 additional optimizations for POWER9 Cameron Kaiser 2023-03-23 20:23:45 -0700
  • 34e8e4feef Support calling mlock() on loaded model data on Linux and macOS comex 2023-03-23 20:08:13 -0700
  • 57dc4dc68a Revert "Fix memory allocation issues and seg faults" Gary Linscott 2023-03-23 18:44:48 -0700
  • acc36eb0b5 Add AVX2 implementation of ggml_compute_forward_rms_norm_f32 Slaren 2023-03-24 01:10:46 +0100
  • 9179d089a2 Merge remote-tracking branch 'origin/master' into batch_perplexity Gary Linscott 2023-03-23 18:35:22 -0700
  • 6041736d6b
    Update README.md Kevin Kwok 2023-03-23 16:00:10 -0700
  • b64067704e
    fix instruct mode rabidcopy 2023-03-23 17:56:16 -0500
  • 3e481d05f0
    Update README.md Kevin Kwok 2023-03-23 15:51:16 -0700
  • f7de57fd3a
    Update README.md Gary Mulder 2023-03-23 22:29:52 +0000
  • 4870e455b3
    Fix memory allocation issues and seg faults master-4870e45 Georgi Gerganov 2023-03-24 00:11:53 +0200
  • 483bab2e3d
    Avoid the transposed X branch in the Z = X * Y matrix multiplication (#439) master-483bab2 Georgi Gerganov 2023-03-23 23:22:01 +0200
  • 5dd94f70b2
    cmake: make sanitizers link Green Sky 2023-03-23 21:46:04 +0100
  • 2d262ea9f0
    fix perplexity - its memory needs don't grow, so we skip it Green Sky 2023-03-23 20:50:09 +0100
  • 404e1da38e
    Fix quantize script not finding models in parent directory (#428) Jed Fox 2023-03-23 16:42:52 -0400
  • 4cc053b6d5
    Remove obsolete command from Docker script Georgi Gerganov 2023-03-23 22:39:44 +0200
  • 0ba5a3a9a5
    Obsolete Georgi Gerganov 2023-03-23 22:32:02 +0200
  • d782609307
    Delete download-pth.py Jed Fox 2023-03-23 16:31:49 -0400
  • 2e17dfd80a
    Replace EOS with newline to prevent context/memory being flushed by EOS in interactive mode (#333) master-2e17dfd rabidcopy 2023-03-23 15:22:47 -0500
  • 4a4718e8ab
    More correct load progress Jed Fox 2023-03-23 16:18:37 -0400
  • 23035f9ba8
    Use seekg to find file size instead Jed Fox 2023-03-23 16:18:29 -0400
  • 20a1a4e09c
    Fix GPTQ converter (#423) master-ad072fc Timmy Knight 2023-03-23 10:18:13 -1000
  • ad072fc5ad
    Generate library with CMake (#430) nusu-github 2023-03-24 05:16:48 +0900