
@lxe
Last active March 12, 2025 13:55

Revisions

  1. lxe revised this gist Mar 14, 2023. 1 changed file with 7 additions and 7 deletions.
    14 changes: 7 additions & 7 deletions README.md
    @@ -24,15 +24,15 @@ This guide actually works well for linux too. Just don't bother with the powersh
    conda install python=3.10
    ```
    Installing pytorch and cuda is the hardest part of machine learning
    I've come up with this install line from the following sources:
    - https://pytorch.org/get-started/locally/#start-locally
    - https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html#installing-previous-cuda-releases
    ```bash
    conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
    ```
    Old / alternative method
    ```
    # conda install cuda -c nvidia/label/cuda-11.7.0
    # pip install torch --extra-index-url https://download.pytorch.org/whl/cu117
    conda install cuda pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia/label/cuda-11.7.0
    python -c 'import torch; print(torch.cuda.is_available())'
    ```
    5. Download text-generation-webui and GPTQ-for-LLaMa
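    Since this revision's whole point is getting a CUDA-enabled torch whose toolkit version matches the compiler, a slightly fuller check than the one-liner above can save a failed kernel build later. A minimal sketch, assuming the `tgwui` env from step 4 is active:
    ```python
    # Sanity check: torch must see the GPU and must have been built against
    # CUDA 11.7, the same toolkit version the GPTQ kernel will be compiled with.
    import torch

    print("CUDA available:", torch.cuda.is_available())
    print("torch built for CUDA:", torch.version.cuda)  # expect "11.7"
    assert torch.version.cuda and torch.version.cuda.startswith("11.7"), \
        "toolkit mismatch -- setup_cuda.py will likely fail to build"
    ```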
  2. lxe revised this gist Mar 13, 2023. 1 changed file with 4 additions and 2 deletions.
    6 changes: 4 additions & 2 deletions README.md
    @@ -73,12 +73,14 @@ This guide actually works well for linux too. Just don't bother with the powersh
    python server.py --cai-chat --load-in-4bit --model llama-13b --no-stream
    ```
    10. Download the 30b model from huggingface
    10. Download the hf version 30b model from huggingface
    ```
    python download-model.py decapoda-research/llama-30b-hf
    ```
    You'll need to quantize it yourself using GPTQ-for-LLaMa (this will take a while):
    You can download the pre-quantized 4 bit versions of the model [here](https://huggingface.co/maderix/llama-65b-4bit/tree/main).
    Alternatively, you'll need to quantize it yourself using GPTQ-for-LLaMa (this will take a while):
    ```
    cd ../repositories/GPTQ-for-LLaMa
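    If you take the pre-quantized route this revision adds, the file can also be fetched programmatically with `huggingface_hub`. A hedged sketch; the `llama30b-4bit.pt` filename is an assumption about that repo's layout, so check the file listing at the link above first:
    ```python
    # Hedged sketch: fetch a pre-quantized checkpoint instead of running GPTQ yourself.
    # repo_id comes from the link above; the filename is assumed -- verify it in the repo.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id="maderix/llama-65b-4bit",
        filename="llama30b-4bit.pt",  # assumed name; confirm on the repo's Files tab
    )
    print("saved to", path)  # then copy it into text-generation-webui/models
    ```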
  3. lxe revised this gist Mar 13, 2023. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion README.md
    @@ -1,4 +1,4 @@
    ### How to get [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) running on Windows or Linux with LLaMa-30b 4bit mode via [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaM) on an RTX 3090 start to finish.
    ### How to get [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) running on Windows or Linux with LLaMa-30b 4bit mode via [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) on an RTX 3090 start to finish.

    This guide actually works well for linux too. Just don't bother with the powershell envs

  4. lxe revised this gist Mar 13, 2023. 1 changed file with 12 additions and 3 deletions.
    15 changes: 12 additions & 3 deletions README.md
    @@ -22,9 +22,17 @@ This guide actually works well for linux too. Just don't bother with the powersh
    conda create -n tgwui
    conda activate tgwui
    conda install python=3.10
    conda install cuda -c nvidia/label/cuda-11.7.0
    pip install torch --extra-index-url https://download.pytorch.org/whl/cu117
    pip install ninja
    ```
    ```bash
    conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
    ```
    Old / alternative method
    ```
    # conda install cuda -c nvidia/label/cuda-11.7.0
    # pip install torch --extra-index-url https://download.pytorch.org/whl/cu117
    ```
    5. Download text-generation-webui and GPTQ-for-LLaMa
    @@ -40,6 +48,7 @@ This guide actually works well for linux too. Just don't bother with the powersh
    6. Build and install gptq package and CUDA kernel (you should be in the GPTQ-for-LLaMa directory)
    ```
    pip install ninja
    python setup_cuda.py install
    ```
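    After `python setup_cuda.py install` succeeds, the compiled extension should be importable; if it isn't, the ninja/cl/CUDA setup above is the usual culprit. A quick check -- the module name `quant_cuda` is what GPTQ-for-LLaMa's `setup_cuda.py` registers, to the best of my knowledge:
    ```python
    # If this import fails, the CUDA kernel didn't build or install correctly.
    import quant_cuda  # name assumed from GPTQ-for-LLaMa's setup_cuda.py

    print("GPTQ CUDA kernel loaded from:", quant_cuda.__file__)
    ```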
  5. lxe revised this gist Mar 13, 2023. 1 changed file with 4 additions and 4 deletions.
    8 changes: 4 additions & 4 deletions README.md
    @@ -1,13 +1,13 @@
    ### How to get [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) running on Windows with LLaMa-30b 4bit mode via [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaM) on an RTX 3090 start to finish.
    ### How to get [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) running on Windows or Linux with LLaMa-30b 4bit mode via [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaM) on an RTX 3090 start to finish.

    This guide actually works well for linux too. Just don't bother with the powershell envs

    1. Get Miniconda and VS 2019 Build Tools.
    1. Download prerequisites
    - Download and install [miniconda](https://docs.conda.io/en/latest/miniconda.html)
    - Download and install [Visual Studio 2019 Build Tools](https://learn.microsoft.com/en-us/visualstudio/releases/2019/history#release-dates-and-build-numbers)
    - (Windows Only) Download and install [Visual Studio 2019 Build Tools](https://learn.microsoft.com/en-us/visualstudio/releases/2019/history#release-dates-and-build-numbers)
    - Click on the latest **BuildTools** link, Select **Desktop Environment with C++** when installing)

    2. Open the Conda Powershell.
    2. (Windows Only) Open the Conda Powershell.
    - Alternatively, open the regular PowerShell and activate the Conda environment:
    ```powershell
    pwsh -ExecutionPolicy ByPass -NoExit -Command "& ~\miniconda3\shell\condabin\conda-hook.ps1 ; conda activate ~\miniconda3"'
  6. lxe revised this gist Mar 13, 2023. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions README.md
    @@ -52,7 +52,7 @@ This guide actually works well for linux too. Just don't bother with the powersh
    8. Download the 13b model from huggingface
    ```
    python download_model.py decapoda-research/llama-13b-hf
    python download-model.py decapoda-research/llama-13b-hf
    ```
    This will take some time. After it's done, rename the folder to `llama-13b`
    @@ -66,7 +66,7 @@ This guide actually works well for linux too. Just don't bother with the powersh
    10. Download the 30b model from huggingface
    ```
    python download_model.py decapoda-research/llama-30b-hf
    python download-model.py decapoda-research/llama-30b-hf
    ```
    You'll need to quantize it yourself using GPTQ-for-LLaMa (this will take a while):
  7. lxe revised this gist Mar 13, 2023. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions README.md
    @@ -46,8 +46,8 @@ This guide actually works well for linux too. Just don't bother with the powersh
    7. Install the text-generation-webui dependencies
    ```
    > cd ../..
    > pip install -r .\requirements.txt
    cd ../..
    pip install -r requirements.txt
    ```
    8. Download the 13b model from huggingface
  8. lxe revised this gist Mar 13, 2023. 1 changed file with 3 additions and 7 deletions.
    10 changes: 3 additions & 7 deletions README.md
    @@ -52,9 +52,7 @@ This guide actually works well for linux too. Just don't bother with the powersh
    8. Download the 13b model from huggingface
    ```
    cd models
    git lfs install
    git clone https://huggingface.co/decapoda-research/llama-13b-hf
    python download_model.py decapoda-research/llama-13b-hf
    ```
    This will take some time. After it's done, rename the folder to `llama-13b`
    @@ -68,17 +66,15 @@ This guide actually works well for linux too. Just don't bother with the powersh
    10. Download the 30b model from huggingface
    ```
    cd models
    git clone https://huggingface.co/decapoda-research/llama-30b-hf
    mv llama-30b-hf llama-30b
    python download_model.py decapoda-research/llama-30b-hf
    ```
    You'll need to quantize it yourself using GPTQ-for-LLaMa (this will take a while):
    ```
    cd ../repositories/GPTQ-for-LLaMa
    pip install datasets
    HUGGING_FACE_HUB_TOKEN={your huggingface token} CUDA_VISIBLE_DEVICES=0 python llama.py ../../models/llama-30b c4 --wbits 4 --save llama-30b-4bit.pt
    HUGGING_FACE_HUB_TOKEN={your huggingface token} CUDA_VISIBLE_DEVICES=0 python llama.py ../../models/llama-30b-hf c4 --wbits 4 --save llama-30b-4bit.pt
    ```
    Place the `llama30b-4bit.pt` in `models` in `models` directory, alongside the `llama-30b` folder.
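    Whichever way the `.pt` file is produced, a cheap way to confirm it isn't truncated or corrupt before pointing the webui at it is to load it on CPU. A hedged sketch, assuming the checkpoint is a plain state dict as `llama.py --save` writes it (adjust the path to wherever you placed the file):
    ```python
    # Load the quantized checkpoint on CPU just to validate it; no GPU needed.
    import torch

    sd = torch.load("models/llama-30b-4bit.pt", map_location="cpu")
    print(f"{len(sd)} entries; sample keys: {list(sd)[:3]}")
    ```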
  9. lxe revised this gist Mar 13, 2023. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions README.md
    @@ -1,5 +1,7 @@
    ### How to get [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) running on Windows with LLaMa-30b 4bit mode via [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaM) on an RTX 3090 start to finish.

    This guide actually works well for linux too. Just don't bother with the powershell envs

    1. Get Miniconda and VS 2019 Build Tools.
    - Download and install [miniconda](https://docs.conda.io/en/latest/miniconda.html)
    - Download and install [Visual Studio 2019 Build Tools](https://learn.microsoft.com/en-us/visualstudio/releases/2019/history#release-dates-and-build-numbers)
  10. lxe revised this gist Mar 13, 2023. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion README.md
    @@ -76,7 +76,7 @@
    ```
    cd ../repositories/GPTQ-for-LLaMa
    pip install datasets
    CUDA_VISIBLE_DEVICES=0 python llama.py ../../models/llama-30b c4 --wbits 4 --save llama-30b-4bit.pt
    HUGGING_FACE_HUB_TOKEN={your huggingface token} CUDA_VISIBLE_DEVICES=0 python llama.py ../../models/llama-30b c4 --wbits 4 --save llama-30b-4bit.pt
    ```
    Place the `llama30b-4bit.pt` in `models` in `models` directory, alongside the `llama-30b` folder.
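    This revision threads a Hugging Face token into the environment because `llama.py` pulls the c4 calibration data from the Hub. Before kicking off the long quantization run, it's worth confirming the variable is actually set in the shell that will run the command; a trivial stdlib check:
    ```python
    # Fail fast if the token env var never made it into this process.
    import os

    print("HUGGING_FACE_HUB_TOKEN set:", "HUGGING_FACE_HUB_TOKEN" in os.environ)
    ```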
  11. lxe revised this gist Mar 12, 2023. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions README.md
    @@ -75,6 +75,7 @@
    ```
    cd ../repositories/GPTQ-for-LLaMa
    pip install datasets
    CUDA_VISIBLE_DEVICES=0 python llama.py ../../models/llama-30b c4 --wbits 4 --save llama-30b-4bit.pt
    ```
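    `CUDA_VISIBLE_DEVICES=0` pins the quantization to the first GPU; on a multi-GPU box you can confirm what the process will actually see. A small sketch:
    ```python
    # With CUDA_VISIBLE_DEVICES=0 set, torch should report exactly one device.
    import torch

    print("visible devices:", torch.cuda.device_count())
    print("device 0:", torch.cuda.get_device_name(0))
    ```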
  12. lxe revised this gist Mar 12, 2023. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion README.md
    @@ -75,7 +75,7 @@
    ```
    cd ../repositories/GPTQ-for-LLaMa
    CUDA_VISIBLE_DEVICES=0 python llama.py ../../models/llama-30b c4 --wbits 4 --save llama30b-4bit.pt
    CUDA_VISIBLE_DEVICES=0 python llama.py ../../models/llama-30b c4 --wbits 4 --save llama-30b-4bit.pt
    ```
    Place the `llama30b-4bit.pt` in `models` in `models` directory, alongside the `llama-30b` folder.
  13. lxe revised this gist Mar 12, 2023. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion README.md
    @@ -18,7 +18,7 @@
    4. You'll need the CUDA compiler and torch that matches the version in order to build the GPTQ extesions which allows for 4 bit prequantized models. Create a conda env and install python, cuda, and torch that matches the cuda version, as well as ninja for fast compilation
    ```powershell
    conda create -n tgwui
    conda activate twgui
    conda activate tgwui
    conda install python=3.10
    conda install cuda -c nvidia/label/cuda-11.7.0
    pip install torch --extra-index-url https://download.pytorch.org/whl/cu117
  14. lxe revised this gist Mar 12, 2023. 1 changed file with 4 additions and 0 deletions.
    4 changes: 4 additions & 0 deletions README.md
    @@ -10,6 +10,10 @@
    ```powershell
    pwsh -ExecutionPolicy ByPass -NoExit -Command "& ~\miniconda3\shell\condabin\conda-hook.ps1 ; conda activate ~\miniconda3"'
    ```
    - Sometimes for some reason the GPTQ compilation fails if 'cl' is not in the path. You can try using the `x64 Native Tools Command Prompt for VS 2019` shell instead or, load both conda and VS build tools shell like this:
    ```powershell
    cmd /k '"C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Auxiliary\Build\vcvars64.bat" && pwsh -ExecutionPolicy ByPass -NoExit -Command "& ~\miniconda3\shell\condabin\conda-hook.ps1 ; conda activate ~\miniconda3"'
    ```
    4. You'll need the CUDA compiler and torch that matches the version in order to build the GPTQ extesions which allows for 4 bit prequantized models. Create a conda env and install python, cuda, and torch that matches the cuda version, as well as ninja for fast compilation
    ```powershell
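    The "cl is not in the path" failure mode this revision documents can be detected up front, before the build dies halfway through. A hedged stdlib sketch:
    ```python
    # GPTQ's kernel build shells out to MSVC on Windows; fail fast if cl.exe
    # isn't reachable from the current shell.
    import shutil

    cl = shutil.which("cl")
    if cl is None:
        print("cl.exe not found -- use the VS 2019 x64 Native Tools prompt "
              "or the combined cmd/vcvars64 shell shown above")
    else:
        print("cl found at", cl)
    ```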
  15. lxe revised this gist Mar 12, 2023. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion README.md
    @@ -11,7 +11,7 @@
    pwsh -ExecutionPolicy ByPass -NoExit -Command "& ~\miniconda3\shell\condabin\conda-hook.ps1 ; conda activate ~\miniconda3"'
    ```
    4. You'll need the CUDA compiler and torch that matches the version in order to build the GPTQ extesions which allows for 4 bit prequantized models. Create a conda env and isntall python, cuda, and torch that matches the cuda version, as well as ninja for fast compilation
    4. You'll need the CUDA compiler and torch that matches the version in order to build the GPTQ extesions which allows for 4 bit prequantized models. Create a conda env and install python, cuda, and torch that matches the cuda version, as well as ninja for fast compilation
    ```powershell
    conda create -n tgwui
    conda activate twgui
  16. lxe created this gist Mar 11, 2023.
    84 changes: 84 additions & 0 deletions README.md
    @@ -0,0 +1,84 @@
    ### How to get [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) running on Windows with LLaMa-30b 4bit mode via [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaM) on an RTX 3090 start to finish.

    1. Get Miniconda and VS 2019 Build Tools.
    - Download and install [miniconda](https://docs.conda.io/en/latest/miniconda.html)
    - Download and install [Visual Studio 2019 Build Tools](https://learn.microsoft.com/en-us/visualstudio/releases/2019/history#release-dates-and-build-numbers)
    - Click on the latest **BuildTools** link, Select **Desktop Environment with C++** when installing)

    2. Open the Conda Powershell.
    - Alternatively, open the regular PowerShell and activate the Conda environment:
    ```powershell
    pwsh -ExecutionPolicy ByPass -NoExit -Command "& ~\miniconda3\shell\condabin\conda-hook.ps1 ; conda activate ~\miniconda3"'
    ```
    4. You'll need the CUDA compiler and torch that matches the version in order to build the GPTQ extesions which allows for 4 bit prequantized models. Create a conda env and isntall python, cuda, and torch that matches the cuda version, as well as ninja for fast compilation
    ```powershell
    conda create -n tgwui
    conda activate twgui
    conda install python=3.10
    conda install cuda -c nvidia/label/cuda-11.7.0
    pip install torch --extra-index-url https://download.pytorch.org/whl/cu117
    pip install ninja
    ```
    5. Download text-generation-webui and GPTQ-for-LLaMa
    ```powershell
    git clone https://github.com/oobabooga/text-generation-webui.git
    cd text-generation-webui
    mkdir repositories
    cd repositories
    git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git
    cd GPTQ-for-LLaMa
    ```
    6. Build and install gptq package and CUDA kernel (you should be in the GPTQ-for-LLaMa directory)
    ```
    python setup_cuda.py install
    ```
    7. Install the text-generation-webui dependencies
    ```
    > cd ../..
    > pip install -r .\requirements.txt
    ```
    8. Download the 13b model from huggingface
    ```
    cd models
    git lfs install
    git clone https://huggingface.co/decapoda-research/llama-13b-hf
    ```
    This will take some time. After it's done, rename the folder to `llama-13b`
    The llama-13b prequantized is available [here](https://huggingface.co/decapoda-research/llama-13b-hf-int4/tree/main). Download the `llama-13b-4bit.pt` file and place it in `models` directory, alongside the `llama-13b` folder.
    9. Run the text-generation-webui with llama-13b to test it out
    ```
    python server.py --cai-chat --load-in-4bit --model llama-13b --no-stream
    ```
    10. Download the 30b model from huggingface
    ```
    cd models
    git clone https://huggingface.co/decapoda-research/llama-30b-hf
    mv llama-30b-hf llama-30b
    ```
    You'll need to quantize it yourself using GPTQ-for-LLaMa (this will take a while):
    ```
    cd ../repositories/GPTQ-for-LLaMa
    CUDA_VISIBLE_DEVICES=0 python llama.py ../../models/llama-30b c4 --wbits 4 --save llama30b-4bit.pt
    ```
    Place the `llama30b-4bit.pt` in `models` in `models` directory, alongside the `llama-30b` folder.
    9. Run the text-generation-webui with llama-30b
    ```
    python server.py --cai-chat --load-in-4bit --model llama-30b --no-stream
    ```
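    A closing note on the original revision: llama-30b in 4-bit is commonly reported to need on the order of 17-20 GB of VRAM, which is why the RTX 3090's 24 GB is the stated target. A quick way to confirm what you have before launching `server.py`:
    ```python
    # Report the GPU and its total memory; llama-30b in 4-bit reportedly wants
    # roughly 17-20 GB, so a 24 GB card like the 3090 has some headroom.
    import torch

    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 2**30:.1f} GiB total")
    ```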