From 60a3e702421c7b6794fd1199a57dec5ca4ee4321 Mon Sep 17 00:00:00 2001
From: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date: Mon, 17 Jul 2023 12:51:01 -0700
Subject: [PATCH] Update LLaMA links and info

---
 docs/GPTQ-models-(4-bit-mode).md | 21 +++++++++++++++++----
 docs/LLaMA-model.md              | 17 ++++++++++++++---
 2 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/docs/GPTQ-models-(4-bit-mode).md b/docs/GPTQ-models-(4-bit-mode).md
index 63a6ed5b..838595ef 100644
--- a/docs/GPTQ-models-(4-bit-mode).md
+++ b/docs/GPTQ-models-(4-bit-mode).md
@@ -142,12 +142,25 @@ python setup_cuda.py install
 
 ### Getting pre-converted LLaMA weights
 
-These are models that you can simply download and place in your `models` folder.
+* Direct download (recommended):
 
-* Converted without `group-size` (better for the 7b model): https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1483891617
-* Converted with `group-size` (better from 13b upwards): https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1483941105
+https://huggingface.co/Neko-Institute-of-Science/LLaMA-7B-4bit-128g
 
-⚠️ The tokenizer files in the sources above may be outdated. Make sure to obtain the universal LLaMA tokenizer as described [here](https://github.com/oobabooga/text-generation-webui/blob/main/docs/LLaMA-model.md#option-1-pre-converted-weights).
+https://huggingface.co/Neko-Institute-of-Science/LLaMA-13B-4bit-128g
+
+https://huggingface.co/Neko-Institute-of-Science/LLaMA-30B-4bit-128g
+
+https://huggingface.co/Neko-Institute-of-Science/LLaMA-65B-4bit-128g
+
+These models were converted with `desc_act=True`. They work just fine with ExLlama. For AutoGPTQ, they will only work on Linux with the `triton` option checked.
+
+* Torrent:
+
+https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1483891617
+
+https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1483941105
+
+These models were converted with `desc_act=False`. As such, they are less accurate, but they work with AutoGPTQ on Windows. The `128g` versions are better from 13b upwards, and worse for 7b. The tokenizer files in the torrents are outdated, in particular the files called `tokenizer_config.json` and `special_tokens_map.json`. Here you can find those files: https://huggingface.co/oobabooga/llama-tokenizer
 
 ### Starting the web UI:
 
diff --git a/docs/LLaMA-model.md b/docs/LLaMA-model.md
index cd655268..ba7350f5 100644
--- a/docs/LLaMA-model.md
+++ b/docs/LLaMA-model.md
@@ -9,10 +9,21 @@ This guide will cover usage through the official `transformers` implementation.
 
 ### Option 1: pre-converted weights
 
-* Torrent: https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1484235789
-* Direct download: https://huggingface.co/Neko-Institute-of-Science
+* Direct download (recommended):
 
-⚠️ The tokenizers for the Torrent source above and also for many LLaMA fine-tunes available on Hugging Face may be outdated, in particular the files called `tokenizer_config.json` and `special_tokens_map.json`. Here you can find those files: https://huggingface.co/oobabooga/llama-tokenizer
+https://huggingface.co/Neko-Institute-of-Science/LLaMA-7B-HF
+
+https://huggingface.co/Neko-Institute-of-Science/LLaMA-13B-HF
+
+https://huggingface.co/Neko-Institute-of-Science/LLaMA-30B-HF
+
+https://huggingface.co/Neko-Institute-of-Science/LLaMA-65B-HF
+
+* Torrent:
+
+https://github.com/oobabooga/text-generation-webui/pull/530#issuecomment-1484235789
+
+The tokenizer files in the torrent above are outdated, in particular the files called `tokenizer_config.json` and `special_tokens_map.json`. Here you can find those files: https://huggingface.co/oobabooga/llama-tokenizer
 
 ### Option 2: convert the weights yourself
 
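For readers applying these docs, fetching one of the direct-download repositories listed in the patch can be scripted rather than done by hand. A minimal sketch using the `huggingface_hub` Python library; the repo picked and the target folder layout are illustrative assumptions, not something the patch prescribes:

```python
# A minimal sketch, not part of the patch: pull one of the
# pre-converted repositories named above into the web UI's
# `models` folder. Assumes `pip install huggingface_hub`; the
# repo and local folder names are illustrative choices.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Neko-Institute-of-Science/LLaMA-7B-HF",
    local_dir="models/LLaMA-7B-HF",  # assumed target inside the web UI tree
)
```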
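The tokenizer warning in both files is actionable the same way: overwrite the two stale files from a torrent download with the universal versions at https://huggingface.co/oobabooga/llama-tokenizer. A sketch under the same assumptions; the model folder name here is hypothetical:

```python
# A minimal sketch, not part of the patch: replace the outdated
# `tokenizer_config.json` and `special_tokens_map.json` shipped in
# the torrents with the universal versions. The model folder name
# is a hypothetical example.
import shutil

from huggingface_hub import hf_hub_download

model_dir = "models/LLaMA-7B-4bit-128g"  # hypothetical torrent-downloaded folder

for filename in ("tokenizer_config.json", "special_tokens_map.json"):
    # Fetch into the local Hugging Face cache, then copy over the stale file.
    cached = hf_hub_download(repo_id="oobabooga/llama-tokenizer", filename=filename)
    shutil.copy(cached, f"{model_dir}/{filename}")
```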