
@lxe
Last active March 12, 2025 13:55

Revisions

  1. lxe revised this gist Mar 14, 2023. 1 changed file with 7 additions and 7 deletions.
    14 changes: 7 additions & 7 deletions README.md
    @@ -24,15 +24,15 @@ This guide actually works well for linux too. Just don't bother with the powersh
    conda install python=3.10
    ```
    Installing pytorch and cuda is the hardest part of machine learning
    I've come up with this install line from the following sources:
    - https://pytorch.org/get-started/locally/#start-locally
    - https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html#installing-previous-cuda-releases
    ```bash
    conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
    ```
    Old / alternative method
    ```
    # conda install cuda -c nvidia/label/cuda-11.7.0
    # pip install torch --extra-index-url https://download.pytorch.org/whl/cu117
    conda install cuda pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia/label/cuda-11.7.0
    python -c 'import torch; print(torch.cuda.is_available())'
    ```
    5. Download text-generation-webui and GPTQ-for-LLaMa
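    Since this revision's whole point is getting a CUDA-enabled torch whose toolkit version matches the compiler, a slightly fuller check than the one-liner above can save a failed kernel build later. A minimal sketch, assuming the `tgwui` env from step 4 is active:
    ```python
    # Sanity check: torch must see the GPU and must have been built against
    # CUDA 11.7, the same toolkit version the GPTQ kernel will be compiled with.
    import torch

    print("CUDA available:", torch.cuda.is_available())
    print("torch built for CUDA:", torch.version.cuda)  # expect "11.7"
    assert torch.version.cuda and torch.version.cuda.startswith("11.7"), \
        "toolkit mismatch -- setup_cuda.py will likely fail to build"
    ```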
  2. lxe revised this gist Mar 13, 2023. 1 changed file with 4 additions and 2 deletions.
    6 changes: 4 additions & 2 deletions README.md
    @@ -73,12 +73,14 @@ This guide actually works well for linux too. Just don't bother with the powersh
    python server.py --cai-chat --load-in-4bit --model llama-13b --no-stream
    ```
    10. Download the 30b model from huggingface
    10. Download the hf version 30b model from huggingface
    ```
    python download-model.py decapoda-research/llama-30b-hf
    ```
    You'll need to quantize it yourself using GPTQ-for-LLaMa (this will take a while):
    You can download the pre-quantized 4 bit versions of the model [here](https://huggingface.co/maderix/llama-65b-4bit/tree/main).
    Alternatively, you'll need to quantize it yourself using GPTQ-for-LLaMa (this will take a while):
    ```
    cd ../repositories/GPTQ-for-LLaMa
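    If you take the pre-quantized route this revision adds, the file can also be fetched programmatically with `huggingface_hub`. A hedged sketch; the `llama30b-4bit.pt` filename is an assumption about that repo's layout, so check the file listing at the link above first:
    ```python
    # Hedged sketch: fetch a pre-quantized checkpoint instead of running GPTQ yourself.
    # repo_id comes from the link above; the filename is assumed -- verify it in the repo.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id="maderix/llama-65b-4bit",
        filename="llama30b-4bit.pt",  # assumed name; confirm on the repo's Files tab
    )
    print("saved to", path)  # then copy it into text-generation-webui/models
    ```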
  3. lxe revised this gist Mar 13, 2023. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion README.md
    @@ -1,4 +1,4 @@
    ### How to get [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) running on Windows or Linux with LLaMa-30b 4bit mode via [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaM) on an RTX 3090 start to finish.
    ### How to get [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) running on Windows or Linux with LLaMa-30b 4bit mode via [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) on an RTX 3090 start to finish.

    This guide actually works well for linux too. Just don't bother with the powershell envs

  4. lxe revised this gist Mar 13, 2023. 1 changed file with 12 additions and 3 deletions.
    15 changes: 12 additions & 3 deletions README.md
    @@ -22,9 +22,17 @@ This guide actually works well for linux too. Just don't bother with the powersh
    conda create -n tgwui
    conda activate tgwui
    conda install python=3.10
    conda install cuda -c nvidia/label/cuda-11.7.0
    pip install torch --extra-index-url https://download.pytorch.org/whl/cu117
    pip install ninja
    ```
    ```bash
    conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
    ```
    Old / alternative method
    ```
    # conda install cuda -c nvidia/label/cuda-11.7.0
    # pip install torch --extra-index-url https://download.pytorch.org/whl/cu117
    ```
    5. Download text-generation-webui and GPTQ-for-LLaMa
    @@ -40,6 +48,7 @@ This guide actually works well for linux too. Just don't bother with the powersh
    6. Build and install gptq package and CUDA kernel (you should be in the GPTQ-for-LLaMa directory)
    ```
    pip install ninja
    python setup_cuda.py install
    ```
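    After `python setup_cuda.py install` succeeds, the compiled extension should be importable; if it isn't, the ninja/cl/CUDA setup above is the usual culprit. A quick check -- the module name `quant_cuda` is what GPTQ-for-LLaMa's `setup_cuda.py` registers, to the best of my knowledge:
    ```python
    # If this import fails, the CUDA kernel didn't build or install correctly.
    import quant_cuda  # name assumed from GPTQ-for-LLaMa's setup_cuda.py

    print("GPTQ CUDA kernel loaded from:", quant_cuda.__file__)
    ```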
  5. lxe revised this gist Mar 13, 2023. 1 changed file with 4 additions and 4 deletions.
    8 changes: 4 additions & 4 deletions README.md
    @@ -1,13 +1,13 @@
    ### How to get [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) running on Windows with LLaMa-30b 4bit mode via [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaM) on an RTX 3090 start to finish.
    ### How to get [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) running on Windows or Linux with LLaMa-30b 4bit mode via [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaM) on an RTX 3090 start to finish.

    This guide actually works well for linux too. Just don't bother with the powershell envs

    1. Get Miniconda and VS 2019 Build Tools.
    1. Download prerequisites
    - Download and install [miniconda](https://docs.conda.io/en/latest/miniconda.html)
    - Download and install [Visual Studio 2019 Build Tools](https://learn.microsoft.com/en-us/visualstudio/releases/2019/history#release-dates-and-build-numbers)
    - (Windows Only) Download and install [Visual Studio 2019 Build Tools](https://learn.microsoft.com/en-us/visualstudio/releases/2019/history#release-dates-and-build-numbers)
    - Click on the latest **BuildTools** link, Select **Desktop Environment with C++** when installing)

    2. Open the Conda Powershell.
    2. (Windows Only) Open the Conda Powershell.
    - Alternatively, open the regular PowerShell and activate the Conda environment:
    ```powershell
    pwsh -ExecutionPolicy ByPass -NoExit -Command "& ~\miniconda3\shell\condabin\conda-hook.ps1 ; conda activate ~\miniconda3"'
  6. lxe revised this gist Mar 13, 2023. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions README.md
    @@ -52,7 +52,7 @@ This guide actually works well for linux too. Just don't bother with the powersh
    8. Download the 13b model from huggingface
    ```
    python download_model.py decapoda-research/llama-13b-hf
    python download-model.py decapoda-research/llama-13b-hf
    ```
    This will take some time. After it's done, rename the folder to `llama-13b`
    @@ -66,7 +66,7 @@ This guide actually works well for linux too. Just don't bother with the powersh
    10. Download the 30b model from huggingface
    ```
    python download_model.py decapoda-research/llama-30b-hf
    python download-model.py decapoda-research/llama-30b-hf
    ```
    You'll need to quantize it yourself using GPTQ-for-LLaMa (this will take a while):
  7. lxe revised this gist Mar 13, 2023. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions README.md
    @@ -46,8 +46,8 @@ This guide actually works well for linux too. Just don't bother with the powersh
    7. Install the text-generation-webui dependencies
    ```
    > cd ../..
    > pip install -r .\requirements.txt
    cd ../..
    pip install -r requirements.txt
    ```
    8. Download the 13b model from huggingface
  8. lxe revised this gist Mar 13, 2023. 1 changed file with 3 additions and 7 deletions.
    10 changes: 3 additions & 7 deletions README.md
    @@ -52,9 +52,7 @@ This guide actually works well for linux too. Just don't bother with the powersh
    8. Download the 13b model from huggingface
    ```
    cd models
    git lfs install
    git clone https://huggingface.co/decapoda-research/llama-13b-hf
    python download_model.py decapoda-research/llama-13b-hf
    ```
    This will take some time. After it's done, rename the folder to `llama-13b`
    @@ -68,17 +66,15 @@ This guide actually works well for linux too. Just don't bother with the powersh
    10. Download the 30b model from huggingface
    ```
    cd models
    git clone https://huggingface.co/decapoda-research/llama-30b-hf
    mv llama-30b-hf llama-30b
    python download_model.py decapoda-research/llama-30b-hf
    ```
    You'll need to quantize it yourself using GPTQ-for-LLaMa (this will take a while):
    ```
    cd ../repositories/GPTQ-for-LLaMa
    pip install datasets
    HUGGING_FACE_HUB_TOKEN={your huggingface token} CUDA_VISIBLE_DEVICES=0 python llama.py ../../models/llama-30b c4 --wbits 4 --save llama-30b-4bit.pt
    HUGGING_FACE_HUB_TOKEN={your huggingface token} CUDA_VISIBLE_DEVICES=0 python llama.py ../../models/llama-30b-hf c4 --wbits 4 --save llama-30b-4bit.pt
    ```
    Place the `llama30b-4bit.pt` in `models` in `models` directory, alongside the `llama-30b` folder.
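    Whichever way the `.pt` file is produced, a cheap way to confirm it isn't truncated or corrupt before pointing the webui at it is to load it on CPU. A hedged sketch, assuming the checkpoint is a plain state dict as `llama.py --save` writes it (adjust the path to wherever you placed the file):
    ```python
    # Load the quantized checkpoint on CPU just to validate it; no GPU needed.
    import torch

    sd = torch.load("models/llama-30b-4bit.pt", map_location="cpu")
    print(f"{len(sd)} entries; sample keys: {list(sd)[:3]}")
    ```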
  9. lxe revised this gist Mar 13, 2023. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions README.md
    @@ -1,5 +1,7 @@
    ### How to get [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) running on Windows with LLaMa-30b 4bit mode via [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaM) on an RTX 3090 start to finish.

    This guide actually works well for linux too. Just don't bother with the powershell envs

    1. Get Miniconda and VS 2019 Build Tools.
    - Download and install [miniconda](https://docs.conda.io/en/latest/miniconda.html)
    - Download and install [Visual Studio 2019 Build Tools](https://learn.microsoft.com/en-us/visualstudio/releases/2019/history#release-dates-and-build-numbers)
  10. lxe revised this gist Mar 13, 2023. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion README.md
    @@ -76,7 +76,7 @@
    ```
    cd ../repositories/GPTQ-for-LLaMa
    pip install datasets
    CUDA_VISIBLE_DEVICES=0 python llama.py ../../models/llama-30b c4 --wbits 4 --save llama-30b-4bit.pt
    HUGGING_FACE_HUB_TOKEN={your huggingface token} CUDA_VISIBLE_DEVICES=0 python llama.py ../../models/llama-30b c4 --wbits 4 --save llama-30b-4bit.pt
    ```
    Place the `llama30b-4bit.pt` in `models` in `models` directory, alongside the `llama-30b` folder.
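    This revision threads a Hugging Face token into the environment because `llama.py` pulls the c4 calibration data from the Hub. Before kicking off the long quantization run, it's worth confirming the variable is actually set in the shell that will run the command; a trivial stdlib check:
    ```python
    # Fail fast if the token env var never made it into this process.
    import os

    print("HUGGING_FACE_HUB_TOKEN set:", "HUGGING_FACE_HUB_TOKEN" in os.environ)
    ```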
  11. lxe revised this gist Mar 12, 2023. 1 changed file with 1 addition and 0 deletions.
    1 change: 1 addition & 0 deletions README.md
    @@ -75,6 +75,7 @@
    ```
    cd ../repositories/GPTQ-for-LLaMa
    pip install datasets
    CUDA_VISIBLE_DEVICES=0 python llama.py ../../models/llama-30b c4 --wbits 4 --save llama-30b-4bit.pt
    ```
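    `CUDA_VISIBLE_DEVICES=0` pins the quantization to the first GPU; on a multi-GPU box you can confirm what the process will actually see. A small sketch:
    ```python
    # With CUDA_VISIBLE_DEVICES=0 set, torch should report exactly one device.
    import torch

    print("visible devices:", torch.cuda.device_count())
    print("device 0:", torch.cuda.get_device_name(0))
    ```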
  12. lxe revised this gist Mar 12, 2023. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion README.md
    @@ -75,7 +75,7 @@
    ```
    cd ../repositories/GPTQ-for-LLaMa
    CUDA_VISIBLE_DEVICES=0 python llama.py ../../models/llama-30b c4 --wbits 4 --save llama30b-4bit.pt
    CUDA_VISIBLE_DEVICES=0 python llama.py ../../models/llama-30b c4 --wbits 4 --save llama-30b-4bit.pt
    ```
    Place the `llama30b-4bit.pt` in `models` in `models` directory, alongside the `llama-30b` folder.
  13. lxe revised this gist Mar 12, 2023. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion README.md
    @@ -18,7 +18,7 @@
    4. You'll need the CUDA compiler and torch that matches the version in order to build the GPTQ extesions which allows for 4 bit prequantized models. Create a conda env and install python, cuda, and torch that matches the cuda version, as well as ninja for fast compilation
    ```powershell
    conda create -n tgwui
    conda activate twgui
    conda activate tgwui
    conda install python=3.10
    conda install cuda -c nvidia/label/cuda-11.7.0
    pip install torch --extra-index-url https://download.pytorch.org/whl/cu117
  14. lxe revised this gist Mar 12, 2023. 1 changed file with 4 additions and 0 deletions.
    4 changes: 4 additions & 0 deletions README.md
    @@ -10,6 +10,10 @@
    ```powershell
    pwsh -ExecutionPolicy ByPass -NoExit -Command "& ~\miniconda3\shell\condabin\conda-hook.ps1 ; conda activate ~\miniconda3"'
    ```
    - Sometimes for some reason the GPTQ compilation fails if 'cl' is not in the path. You can try using the `x64 Native Tools Command Prompt for VS 2019` shell instead or, load both conda and VS build tools shell like this:
    ```powershell
    cmd /k '"C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Auxiliary\Build\vcvars64.bat" && pwsh -ExecutionPolicy ByPass -NoExit -Command "& ~\miniconda3\shell\condabin\conda-hook.ps1 ; conda activate ~\miniconda3"'
    ```
    4. You'll need the CUDA compiler and torch that matches the version in order to build the GPTQ extesions which allows for 4 bit prequantized models. Create a conda env and install python, cuda, and torch that matches the cuda version, as well as ninja for fast compilation
    ```powershell
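    The "cl is not in the path" failure mode this revision documents can be detected up front, before the build dies halfway through. A hedged stdlib sketch:
    ```python
    # GPTQ's kernel build shells out to MSVC on Windows; fail fast if cl.exe
    # isn't reachable from the current shell.
    import shutil

    cl = shutil.which("cl")
    if cl is None:
        print("cl.exe not found -- use the VS 2019 x64 Native Tools prompt "
              "or the combined cmd/vcvars64 shell shown above")
    else:
        print("cl found at", cl)
    ```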
  15. lxe revised this gist Mar 12, 2023. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion README.md
    @@ -11,7 +11,7 @@
    pwsh -ExecutionPolicy ByPass -NoExit -Command "& ~\miniconda3\shell\condabin\conda-hook.ps1 ; conda activate ~\miniconda3"'
    ```
    4. You'll need the CUDA compiler and torch that matches the version in order to build the GPTQ extesions which allows for 4 bit prequantized models. Create a conda env and isntall python, cuda, and torch that matches the cuda version, as well as ninja for fast compilation
    4. You'll need the CUDA compiler and torch that matches the version in order to build the GPTQ extesions which allows for 4 bit prequantized models. Create a conda env and install python, cuda, and torch that matches the cuda version, as well as ninja for fast compilation
    ```powershell
    conda create -n tgwui
    conda activate twgui
  16. lxe created this gist Mar 11, 2023.
    84 changes: 84 additions & 0 deletions README.md
    @@ -0,0 +1,84 @@
    ### How to get [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) running on Windows with LLaMa-30b 4bit mode via [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaM) on an RTX 3090 start to finish.

    1. Get Miniconda and VS 2019 Build Tools.
    - Download and install [miniconda](https://docs.conda.io/en/latest/miniconda.html)
    - Download and install [Visual Studio 2019 Build Tools](https://learn.microsoft.com/en-us/visualstudio/releases/2019/history#release-dates-and-build-numbers)
    - Click on the latest **BuildTools** link, Select **Desktop Environment with C++** when installing)

    2. Open the Conda Powershell.
    - Alternatively, open the regular PowerShell and activate the Conda environment:
    ```powershell
    pwsh -ExecutionPolicy ByPass -NoExit -Command "& ~\miniconda3\shell\condabin\conda-hook.ps1 ; conda activate ~\miniconda3"'
    ```
    4. You'll need the CUDA compiler and torch that matches the version in order to build the GPTQ extesions which allows for 4 bit prequantized models. Create a conda env and isntall python, cuda, and torch that matches the cuda version, as well as ninja for fast compilation
    ```powershell
    conda create -n tgwui
    conda activate twgui
    conda install python=3.10
    conda install cuda -c nvidia/label/cuda-11.7.0
    pip install torch --extra-index-url https://download.pytorch.org/whl/cu117
    pip install ninja
    ```
    5. Download text-generation-webui and GPTQ-for-LLaMa
    ```powershell
    git clone https://github.com/oobabooga/text-generation-webui.git
    cd text-generation-webui
    mkdir repositories
    cd repositories
    git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git
    cd GPTQ-for-LLaMa
    ```
    6. Build and install gptq package and CUDA kernel (you should be in the GPTQ-for-LLaMa directory)
    ```
    python setup_cuda.py install
    ```
    7. Install the text-generation-webui dependencies
    ```
    > cd ../..
    > pip install -r .\requirements.txt
    ```
    8. Download the 13b model from huggingface
    ```
    cd models
    git lfs install
    git clone https://huggingface.co/decapoda-research/llama-13b-hf
    ```
    This will take some time. After it's done, rename the folder to `llama-13b`
    The llama-13b prequantized is available [here](https://huggingface.co/decapoda-research/llama-13b-hf-int4/tree/main). Download the `llama-13b-4bit.pt` file and place it in `models` directory, alongside the `llama-13b` folder.
    9. Run the text-generation-webui with llama-13b to test it out
    ```
    python server.py --cai-chat --load-in-4bit --model llama-13b --no-stream
    ```
    10. Download the 30b model from huggingface
    ```
    cd models
    git clone https://huggingface.co/decapoda-research/llama-30b-hf
    mv llama-30b-hf llama-30b
    ```
    You'll need to quantize it yourself using GPTQ-for-LLaMa (this will take a while):
    ```
    cd ../repositories/GPTQ-for-LLaMa
    CUDA_VISIBLE_DEVICES=0 python llama.py ../../models/llama-30b c4 --wbits 4 --save llama30b-4bit.pt
    ```
    Place the `llama30b-4bit.pt` in `models` in `models` directory, alongside the `llama-30b` folder.
    9. Run the text-generation-webui with llama-30b
    ```
    python server.py --cai-chat --load-in-4bit --model llama-30b --no-stream
    ```
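    A closing note on the original revision: llama-30b in 4-bit is commonly reported to need on the order of 17-20 GB of VRAM, which is why the RTX 3090's 24 GB is the stated target. A quick way to confirm what you have before launching `server.py`:
    ```python
    # Report the GPU and its total memory; llama-30b in 4-bit reportedly wants
    # roughly 17-20 GB, so a 24 GB card like the 3090 has some headroom.
    import torch

    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 2**30:.1f} GiB total")
    ```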