Commit Graph

  • df9135e3a9 fixing memory bugs Concedo 2023-06-23 18:41:23 +0800
  • b8a4594f89 More fixes... niansa 2023-06-23 12:19:33 +0200
  • 9d643755a6 Fixed compile error niansa 2023-06-23 11:51:25 +0200
  • 339bc36cdd Added more functions from Metal niansa 2023-06-23 11:50:30 +0200
  • d7b7484f74 Add OpenLLaMA instructions to the README (#1954) eiery 2023-06-23 04:38:01 -0400
  • 5d077341f7 Fix ggml-metal.metal path and run nixfmt novafacing 2023-06-23 00:31:36 -0700
  • 6dd5bd7e43 readme : fixed termux instructions Alberto 2023-06-23 09:15:34 +0200
  • 6c76c31184 Merge branch 'ggerganov:master' into master WangHaoranRobin 2023-06-22 22:14:41 -0700
  • 7cd8fc20d0 Merge pull request #4 from WangHaoranRobin/robin_fork_master WangHaoranRobin 2023-06-22 22:00:21 -0700
  • 7b93b248ef server: fix some beginner mistakes Wang Haoran(Robin) 2023-06-22 21:59:12 -0700
  • bdb710efa2 Merge pull request #3 from WangHaoranRobin/robin_fork_master WangHaoranRobin 2023-06-22 21:36:50 -0700
  • cf76195223 server: fix issue when handling probability output for incomplete tokens for multibyte character generation Wang Haoran(Robin) 2023-06-22 21:35:37 -0700
  • 3349c01357 3b works now eiery 2023-06-22 22:51:52 -0400
  • e92795f2f4 Add CUDA and hopefully Metal support for p_scale KerfuffleV2 2023-06-22 14:06:13 -0600
  • df7346ccd5 Merge 'origin/master' into hipblas Henri Vasserman 2023-06-22 20:51:09 +0300
  • 926664c229 Merge pull request #2 from WangHaoranRobin/robin_fork_master WangHaoranRobin 2023-06-22 09:01:42 -0700
  • ccf254bd44 server: fix comment about max n_probs Wang Haoran(Robin) 2023-06-22 08:57:35 -0700
  • 9cdaea9240 Implemented dequantize_row_q4_1 niansa 2023-06-22 16:30:36 +0200
  • 887694acfd Handle rope params in CUDA, Metal KerfuffleV2 2023-06-22 08:18:01 -0600
  • b0f11fa9c1 More code cleanups niansa 2023-06-22 16:05:56 +0200
  • 4bf45a7dbe Helps to pass args in the correct order KerfuffleV2 2023-06-22 06:37:21 -0600
  • 7487137227 rework convert.py to read hyper-parameters from config.json (#1958) master-7487137 Erik Scholz 2023-06-22 14:20:47 +0200
  • 3b3d30e4ad Cleanups niansa 2023-06-22 13:55:25 +0200
  • 2f3fe0c0a4 Updated gitignore niansa 2023-06-22 12:58:33 +0200
  • 4f598dd973 Initial working stuff niansa 2023-06-22 12:58:07 +0200
  • bc17e11590 Allow specifying p scale factor for ggml rope and rope_back ops KerfuffleV2 2023-06-22 05:29:11 -0600
  • 0eedccaf06 Merge branch 'master' into optimize_quants_upstream Concedo 2023-06-22 17:59:58 +0800
  • e6ddb15c3a cleanup Concedo 2023-06-22 10:38:27 +0800
  • bbca06e269 cmake: revert CUDA arch default to 52, 61 if f16 (#1959) master-bbca06e Johannes Gäßler 2023-06-21 23:49:25 +0200
  • fb98254f99 Fix typo in README.md (#1961) Rahul Vivek Nair 2023-06-22 03:18:43 +0530
  • 8004e673f0 Merge pull request #1 from WangHaoranRobin/robin_fork_master WangHaoranRobin 2023-06-21 14:28:46 -0700
  • ba210e4bc7 server: add option to output probabilities for completion Wang Haoran(Robin) 2023-06-21 14:21:35 -0700
  • c7dc5a37c3 remove 3b eiery 2023-06-21 16:28:56 -0400
  • dbf02472bd cmake: revert CUDA arch default to 52, 61 if f16 JohannesGaessler 2023-06-21 15:28:55 +0200
  • 022d099376 Fix typo in README.md RahulVivekNair 2023-06-21 23:54:35 +0530
  • 0141e6395c clean up previous hack Green Sky 2023-06-21 19:52:40 +0200
  • 1b71752a9f Implemented basic GPU offloading for MPT, GPT-2, GPT-J and GPT-NeoX Concedo 2023-06-22 00:43:25 +0800
  • b1f00fa9cc Fix hordeconfig max context setting, and add Makefile flags for cuda F16/KQuants per iter. (#252) Ycros 2023-06-22 01:01:46 +1000
  • 72397fbe63 read params from hftransformer config.json Green Sky 2023-06-21 15:12:09 +0200
  • dfdd20240c gpt j use scratch buffers Concedo 2023-06-21 16:10:31 +0800
  • 2880f43b7f add test for correct top-p behavior Alex Renda 2023-06-20 20:49:43 -0400
  • 407b77cdb3 top-p: correct gt to gte Alex Renda 2023-06-20 20:48:09 -0400
  • d7714a8f80 Deprecate public API function llama_apply_lora_from_file Didzis Gosko 2023-06-21 00:08:49 +0300
  • 69f776282b Update public API use cases: move away from deprecated llama_init_from_file Didzis Gosko 2023-06-20 23:47:33 +0300
  • 6ef282f2b8 spacing eiery 2023-06-20 16:16:05 -0400
  • aa4df44134 whitespace eiery 2023-06-20 16:14:11 -0400
  • ff24bd7667 table of contents eiery 2023-06-20 15:47:05 -0400
  • f5a276b265 add openllama to readme eiery 2023-06-20 15:41:30 -0400
  • 1e7755cfcb Fix top-p sampling to match the standard definition (smallest set that has probability mass at least p, not largest set with probability mass less than p) Alex Renda 2023-06-20 14:38:13 -0400
  • a9eb1e73e9 Fix typo Xiake Sun 2023-06-20 12:47:22 -0400
  • 049aa16b8c readme : add link to p1 Georgi Gerganov 2023-06-20 19:05:54 +0300
  • 0dcfe45c1c Fix crash when running train with CUDA enabled Howard Su 2023-06-20 23:58:37 +0800
  • 53dfbbf553 add example of PandaGPT ningshanwutuobang 2023-06-20 22:57:21 +0800
  • 266d47a4b9 Merge branch 'optimize_quants_upstream' into concedo_experimental Concedo 2023-06-20 22:46:35 +0800
  • da668e685f fixing address spaces Concedo 2023-06-20 22:45:16 +0800
  • cce6e67f44 fixing address spaces Concedo 2023-06-20 22:45:16 +0800
  • 1f1735f5ad Merge branch 'optimize_quants_upstream' into concedo_experimental Concedo 2023-06-20 21:39:35 +0800
  • 6b75fc48b9 fixed global const struct types Concedo 2023-06-20 21:38:48 +0800
  • 2322ec223a Fix typo (#1949) Xiake Sun 2023-06-20 05:42:40 -0700
  • 537ff22ec9 fixed a bug with token timings, updated lite Concedo 2023-06-20 20:41:42 +0800
  • c5ae3f50a7 Merge branch 'optimize_quants_upstream' into concedo_experimental Concedo 2023-06-20 18:41:13 +0800
  • a6e8b0216d remove old dot kernels and template Concedo 2023-06-20 18:34:46 +0800
  • 93247a11cd ported q2k and q5k speedups Concedo 2023-06-20 18:30:30 +0800
  • 029bed6446 ported q3k speedup successfully Concedo 2023-06-20 17:57:44 +0800
  • d754915269 Merge branch 'optimize_quants_upstream' into concedo_experimental Concedo 2023-06-20 17:26:39 +0800
  • b4c532e862 Merge branch 'master' into concedo_experimental Concedo 2023-06-20 17:26:27 +0800
  • 8d816d19d1 Add q6_k fast matmul kernel 0cc4m 2023-06-20 08:41:35 +0200
  • 34a4917984 Use preprocessor for QK_K 0cc4m 2023-06-20 08:04:16 +0200
  • 069cbe530d Fix q2_k fast kernel 0cc4m 2023-06-20 08:01:40 +0200
  • 7accbe11d3 Update README.md Howard Su 2023-06-20 10:03:02 +0800
  • c09786bac5 test John 2023-06-20 04:01:09 +0200
  • dd80fb5320 chunked RMS and mulmat for testing John 2023-06-20 04:00:34 +0200
  • aacdbd4056 llama : fix params struct alignment (#1936) master-aacdbd4 Ettore Di Giacinto 2023-06-20 03:24:39 +0200
  • e1c7e9d7d0 Fix CUDA build on Windows Howard Su 2023-06-20 07:38:31 +0800
  • a58436489f display bug fixed, warning added on n_batch usage John 2023-06-20 01:11:51 +0200
  • 5dd2fbe6ea Merge 'origin/master' into hipblas Henri Vasserman 2023-06-20 01:23:12 +0300
  • 20568fe60f [Fix] Reenable server embedding endpoint (#1937) master-20568fe Henri Vasserman 2023-06-20 01:12:39 +0300
  • acbc840244 Merge branch 'master' of https://github.com/cmp-nct/ggllm.cpp John 2023-06-19 23:43:07 +0200
  • 695c15e174 Bugfix for --ngl on low vram not working correctly John 2023-06-19 23:41:59 +0200
  • 32141a3a75 added the QK reject message into the quantizer John 2023-06-19 23:05:16 +0200
  • 7bba46ba62 Add deprecated warning for public API function llama_init_from_file Didzis Gosko 2023-06-19 23:44:14 +0300
  • 793bcc0b94 Fix style Didzis Gosko 2023-06-19 23:43:40 +0300
  • 18b35625c3 ggml : fix bug in LBFGS optimizer (found by ggml tests) master-18b3562 Georgi Gerganov 2023-06-19 20:43:30 +0300
  • 7de2494a0f Add comment mudler 2023-06-19 18:43:27 +0200
  • 7a45a13e3d Move booleans at the bottom of the structure mudler 2023-06-19 18:42:36 +0200
  • 67ba34e88f ggml : minor style + try fix sanitizer build Georgi Gerganov 2023-06-19 18:55:09 +0300
  • 1ca2186189 Update README.md John 2023-06-19 17:53:35 +0200
  • d0e3596350 ggml : minor style changes Georgi Gerganov 2023-06-19 18:45:36 +0300
  • 69fd31d18c Merge branch 'master' into optimize_quants_upstream Concedo 2023-06-19 23:38:59 +0800
  • 5e8e99f206 Merge branch 'master' into concedo_experimental Concedo 2023-06-19 23:37:53 +0800
  • 90a0e65c67 Merge branch 'master' into HEAD Georgi Gerganov 2023-06-19 18:35:49 +0300
  • ba4e85a833 llama : use aligned memory during ggml_init call from loading saved sessions (#1934) master-ba4e85a l3utterfly 2023-06-19 23:20:06 +0800
  • 23fc5c219a cmake : fix trailing whitespaces master-23fc5c2 Georgi Gerganov 2023-06-19 18:18:34 +0300
  • cb40dfca69 llama : only use Q6_K for output weights if tensor size is multiple of 256 (#1932) master-cb40dfc Kawrakow 2023-06-19 18:17:03 +0300
  • ca7c3f4da5 cuda : faster k-quants on older GPUs (#1930) master-ca7c3f4 Kawrakow 2023-06-19 18:14:09 +0300
  • b97ca431db ggml : sync latest ggml repo (#1924) master-b97ca43 Georgi Gerganov 2023-06-19 18:12:33 +0300
  • 1e3abfcef0 cmake : fix build shared ggml when CUDA is enabled (#1929) master-1e3abfc Howard Su 2023-06-19 23:10:37 +0800
  • c27f708127 Merge branch 'master' into fix_build Georgi Gerganov 2023-06-19 18:10:24 +0300
  • c94a438328 xx + ib0 Concedo 2023-06-19 23:01:49 +0800
  • 266d436746 Added broken new q4k quant Concedo 2023-06-19 22:20:19 +0800
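
The top-p sampling fix in this graph (407b77cdb3 "top-p: correct gt to gte" and 1e7755cfcb) changes the rule to keep the smallest set of tokens whose cumulative probability mass is at least p, rather than the largest set whose mass stays below p. A minimal sketch of the corrected rule follows; the function name and signature are illustrative, not the actual llama.cpp API:

    // Sketch of corrected top-p (nucleus) filtering: given probabilities,
    // keep the smallest descending-sorted prefix whose cumulative mass
    // reaches p. Caller renormalizes and samples from the returned set.
    #include <algorithm>
    #include <cstddef>
    #include <functional>
    #include <vector>

    std::vector<float> top_p_filter(std::vector<float> probs, float p) {
        // Sort probabilities in descending order.
        std::sort(probs.begin(), probs.end(), std::greater<float>());
        float cum_sum = 0.0f;
        size_t last_idx = probs.size();  // keep everything if p is never reached
        for (size_t i = 0; i < probs.size(); ++i) {
            cum_sum += probs[i];
            // ">= p" (not "> p") is the "gt to gte" change: stop at the
            // first prefix whose mass reaches p, giving the smallest set.
            if (cum_sum >= p) {
                last_idx = i + 1;
                break;
            }
        }
        probs.resize(last_idx);
        return probs;
    }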