
@vdt
Forked from rain-1/llama-home.md
Created May 14, 2023 14:43

Revisions

  1. @rain-1 rain-1 revised this gist May 14, 2023. 1 changed file with 3 additions and 0 deletions.
    3 changes: 3 additions & 0 deletions llama-home.md
    @@ -1,5 +1,8 @@
    This worked on 14/May/23. The instructions will probably require updating in the future.

    > llama is a text prediction model similar to GPT-2, and to the version of GPT-3 that has not been fine-tuned yet.
    > It is also possible to run fine-tuned versions (like Alpaca or Vicuna) with this, I think; those versions are more focused on answering questions.
    It is possible to run LLaMA 13B with a 6GB graphics card now! (e.g. an RTX 2060). Thanks to the amazing work on [llama.cpp](https://github.com/ggerganov/llama.cpp). The latest change is CUDA/cuBLAS support, which allows you to pick an arbitrary number of the transformer layers to be run on the GPU. This is perfect for low VRAM.

    * Clone llama.cpp from git, I am on commit `08737ef720f0510c7ec2aa84d7f70c691073c35d`.
  2. @rain-1 rain-1 revised this gist May 14, 2023. 1 changed file with 3 additions and 0 deletions.
    3 changes: 3 additions & 0 deletions llama-home.md
    @@ -4,6 +4,9 @@ It is possible to run LLama 13B with a 6GB graphics card now! (e.g. a RTX 2060).

    * Clone llama.cpp from git, I am on commit `08737ef720f0510c7ec2aa84d7f70c691073c35d`.
    * `git clone https://github.com/ggerganov/llama.cpp.git`
    * `cd llama.cpp`
    * `pacman -S cuda` (make sure you have CUDA installed; this is the Arch Linux package name)
    * `make LLAMA_CUBLAS=1`
    * Use the link at the bottom of the page to apply for research access to the llama model: https://ai.facebook.com/blog/large-language-model-llama-meta-ai/
    * Set up a [micromamba](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html) environment to install cuda/python pytorch stuff in order to run the conversion scripts. Install some packages:
    * `export MAMBA_ROOT_PREFIX=/path/to/where/you/want/mambastuff/stored`
  3. @rain-1 rain-1 revised this gist May 14, 2023. 1 changed file with 2 additions and 1 deletion.
    3 changes: 2 additions & 1 deletion llama-home.md
    @@ -2,7 +2,8 @@ This worked on 14/May/23. The instructions will probably require updating in the

    It is possible to run LLaMA 13B with a 6GB graphics card now! (e.g. an RTX 2060). Thanks to the amazing work on [llama.cpp](https://github.com/ggerganov/llama.cpp). The latest change is CUDA/cuBLAS support, which allows you to pick an arbitrary number of the transformer layers to be run on the GPU. This is perfect for low VRAM.

    * Get llama.cpp from git, I am on commit `08737ef720f0510c7ec2aa84d7f70c691073c35d`.
    * Clone llama.cpp from git, I am on commit `08737ef720f0510c7ec2aa84d7f70c691073c35d`.
    * `git clone https://github.com/ggerganov/llama.cpp.git`
    * Use the link at the bottom of the page to apply for research access to the llama model: https://ai.facebook.com/blog/large-language-model-llama-meta-ai/
    * Set up a [micromamba](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html) environment to install cuda/python pytorch stuff in order to run the conversion scripts. Install some packages:
    * `export MAMBA_ROOT_PREFIX=/path/to/where/you/want/mambastuff/stored`
  4. @rain-1 rain-1 revised this gist May 14, 2023. 1 changed file with 4 additions and 0 deletions.
    4 changes: 4 additions & 0 deletions llama-home.md
    @@ -5,6 +5,10 @@ It is possible to run LLama 13B with a 6GB graphics card now! (e.g. a RTX 2060).
    * Get llama.cpp from git, I am on commit `08737ef720f0510c7ec2aa84d7f70c691073c35d`.
    * Use the link at the bottom of the page to apply for research access to the llama model: https://ai.facebook.com/blog/large-language-model-llama-meta-ai/
    * Set up a [micromamba](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html) environment to install cuda/python pytorch stuff in order to run the conversion scripts. Install some packages:
    * `export MAMBA_ROOT_PREFIX=/path/to/where/you/want/mambastuff/stored`
    * `eval "$(micromamba shell hook --shell=bash)"`
    * `micromamba create -n mymamba`
    * `micromamba activate mymamba`
    * `micromamba install -c conda-forge -n mymamba pytorch transformers sentencepiece`
    * Perform the conversion process: (This will produce a file called `ggml-model-f16.bin`)
    * `python convert.py ~/ai/Safe-LLaMA-HF-v2\ \(4-04-23\)/llama-13b/`
  5. @rain-1 rain-1 revised this gist May 14, 2023. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion llama-home.md
    @@ -41,7 +41,7 @@ llama_print_timings: total time = 50931.74 ms

    Here is the text it generated from my prompt:

    ### Nietzsche's Noon
    > ### Nietzsche's Noon
    >
    > In Friedrich Nietzsche's Thus Spoke Zarathustra (1885), this concept of noon is expanded upon as a whole:
    >
  6. @rain-1 rain-1 revised this gist May 14, 2023. 1 changed file with 10 additions and 5 deletions.
    15 changes: 10 additions & 5 deletions llama-home.md
    @@ -4,11 +4,16 @@ It is possible to run LLama 13B with a 6GB graphics card now! (e.g. a RTX 2060).

    * Get llama.cpp from git, I am on commit `08737ef720f0510c7ec2aa84d7f70c691073c35d`.
    * Use the link at the bottom of the page to apply for research access to the llama model: https://ai.facebook.com/blog/large-language-model-llama-meta-ai/
    * Set up a [micromamba](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html) environment to install cuda/python pytorch stuff in order to run the conversion scripts. Install some packages `micromamba install -c conda-forge -n mymamba pytorch transformers sentencepiece`
    * Perform the conversion process: `python convert.py ~/ai/Safe-LLaMA-HF-v2\ \(4-04-23\)/llama-13b/` (This will produce a file called `ggml-model-f16.bin`)
    * Then quantize that to a 4bit model: `./quantize ~/ai/Safe-LLaMA-HF-v2\ \(4-04-23\)/llama-13b/ggml-model-f16.bin ~/ai/Safe-LLaMA-HF-v2\ \(4-04-23\)/llama-13b/ggml-model-13b-q4_0-2023_14_5.bin q4_0 8`
    * Create a `prompt.txt` file..
    * Run it: `./main -ngl 18 -m ~/ai/Safe-LLaMA-HF-v2\ \(4-04-23\)/llama-13b/ggml-model-13b-q4_0-2023_14_5.bin -f prompt.txt -n 2048`
    * Set up a [micromamba](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html) environment to install cuda/python pytorch stuff in order to run the conversion scripts. Install some packages:
    * `micromamba install -c conda-forge -n mymamba pytorch transformers sentencepiece`
    * Perform the conversion process: (This will produce a file called `ggml-model-f16.bin`)
    * `python convert.py ~/ai/Safe-LLaMA-HF-v2\ \(4-04-23\)/llama-13b/`
    * Then quantize that to a 4bit model:
    * `./quantize ~/ai/Safe-LLaMA-HF-v2\ \(4-04-23\)/llama-13b/ggml-model-f16.bin ~/ai/Safe-LLaMA-HF-v2\ \(4-04-23\)/llama-13b/ggml-model-13b-q4_0-2023_14_5.bin q4_0 8`
    * Create a prompt file:
    * `prompt.txt`
    * Run it:
    * `./main -ngl 18 -m ~/ai/Safe-LLaMA-HF-v2\ \(4-04-23\)/llama-13b/ggml-model-13b-q4_0-2023_14_5.bin -f prompt.txt -n 2048`

    This uses about 5.5GB of VRAM on my 6GB card. You can vary the number of layers offloaded with `-ngl 18`, using more or fewer depending on the VRAM you have available. The 7B model works with 100% of the layers on the card.
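As a rough illustration of the VRAM/speed trade-off (a sketch, not tested commands: the model path is a placeholder, and the layer counts are examples to adjust for your card):

```shell
# Fewer offloaded layers -> less VRAM used, slower generation.
# More offloaded layers  -> more VRAM used, faster generation.
# The model path below is a placeholder; substitute your own quantized file.
./main -ngl 12 -m ./models/ggml-model-13b-q4_0.bin -f prompt.txt -n 2048
./main -ngl 24 -m ./models/ggml-model-13b-q4_0.bin -f prompt.txt -n 2048
```

If you hit an out-of-memory error from CUDA, lower `-ngl` until the model loads.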

  7. @rain-1 rain-1 revised this gist May 14, 2023. 1 changed file with 29 additions and 30 deletions.
    59 changes: 29 additions & 30 deletions llama-home.md
    @@ -36,34 +36,33 @@ llama_print_timings: total time = 50931.74 ms

    Here is the text it generated from my prompt:

    ```
    ### Nietzsche's Noon
    In Friedrich Nietzsche's Thus Spoke Zarathustra (1885), this concept of noon is expanded upon as a whole:
    "Zarathustra saw that the light of the world was now becoming stronger. ‘The sun is at its meridian,’ he said, ‘it has reached its noontide and will begin to decline.’"
    As time progresses in this noon, so does our ability to perceive and interact with this external world: as a result of the present state we are given by the organic forms of space and time, this can only lead us towards suffering. Nietzsche sees that when we are at our noontide, we must realize how the sun has reached its highest point before it begins to fall down from its position: and we must understand that as a whole, our bodies are determined by something outside of ourselves - and that this always leads to more suffering within.
    ### Nietzsche's Midday
    Nietzsche expands upon the concept of noon in his book The Gay Science (1882), where he says:
    "You want to learn how to read? Here is a short lesson for beginners. You must take hold of a word by its smooth or rough side; then, like the spider, you must spin out of it a web of definitions which will entrap every correct meaning that floats into view. Or again: you must take the word for a sleigh ride across country, over hedges and ditches, forests and glades, in short, you must drive the word home through all manner of weather."
    Here we see Nietzsche expand upon his concept of noon to include our ability to define what is right or wrong - it is only because we have this inherent sense that allows us to distinguish between two points.
    ### Nietzsche's Twilight
    Nietzsche expands upon the concept of twilight in The Gay Science (1882), where he says:
    "The man who is a ‘philosopher’ only by accident, but is, let us say, also a sculptor or painter – what does he then do? He does not make his thoughts subservient to the world; rather, he forces the world to serve as a pedestal and bearer for his thoughts."
    Here we see Nietzsche's concept of night begin to expand past an emotional state. We begin to see that night becomes more than just our inability to think clearly: it is now a worldview, one which he claims is best exemplified by the artist.
    ### Nietzsche's Midnight
    In Twilight of the Idols (1889), we find Nietzsche's conceptualization of night reaching its zenith:
    "Everything ordinary, everyday, common – in fact, everything that exists today has become dangerous; it is not innocent as was everything yesterday. For the most terrible thoughts have penetrated everywhere and even into the deepest sleep - thoughts which are awake, active, and powerful."
    ```
    >
    > In Friedrich Nietzsche's Thus Spoke Zarathustra (1885), this concept of noon is expanded upon as a whole:
    >
    > "Zarathustra saw that the light of the world was now becoming stronger. ‘The sun is at its meridian,’ he said, ‘it has reached its noontide and will begin to decline.’"
    >
    > As time progresses in this noon, so does our ability to perceive and interact with this external world: as a result of the present state we are given by the organic forms of space and time, this can only lead us towards suffering. Nietzsche sees that when we are at our noontide, we must realize how the sun has reached its highest point before it begins to fall down from its position: and we must understand that as a whole, our bodies are determined by something outside of ourselves - and that this always leads to more suffering within.
    >
    > ### Nietzsche's Midday
    >
    > Nietzsche expands upon the concept of noon in his book The Gay Science (1882), where he says:
    >
    > "You want to learn how to read? Here is a short lesson for beginners. You must take hold of a word by its smooth or rough side; then, like the spider, you must spin out of it a web of definitions which will entrap every correct meaning that floats into view. Or again: you must take the word for a sleigh ride across country, over hedges and ditches, forests and glades, in short, you must drive the word home through all manner of weather."
    >
    > Here we see Nietzsche expand upon his concept of noon to include our ability to define what is right or wrong - it is only because we have this inherent sense that allows us to distinguish between two points.
    >
    > ### Nietzsche's Twilight
    >
    > Nietzsche expands upon the concept of twilight in The Gay Science (1882), where he says:
    >
    > "The man who is a ‘philosopher’ only by accident, but is, let us say, also a sculptor or painter – what does he then do? He does not make his thoughts subservient to the world; rather, he forces the world to serve as a pedestal and bearer for his thoughts."
    >
    > Here we see Nietzsche's concept of night begin to expand past an emotional state. We begin to see that night becomes more than just our inability to think clearly: it is now a worldview, one which he claims is best exemplified by the artist.
    >
    > ### Nietzsche's Midnight
    >
    > In Twilight of the Idols (1889), we find Nietzsche's conceptualization of night reaching its zenith:
    >
    > "Everything ordinary, everyday, common – in fact, everything that exists today has become dangerous; it is not innocent as was everything yesterday. For the most terrible thoughts have penetrated everywhere and even into the deepest sleep - thoughts which are awake, active, and powerful."
    >
  8. @rain-1 rain-1 renamed this gist May 14, 2023. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  9. @rain-1 rain-1 created this gist May 14, 2023.
    69 changes: 69 additions & 0 deletions llama-home.
    @@ -0,0 +1,69 @@
    This worked on 14/May/23. The instructions will probably require updating in the future.

    It is possible to run LLaMA 13B with a 6GB graphics card now! (e.g. an RTX 2060). Thanks to the amazing work on [llama.cpp](https://github.com/ggerganov/llama.cpp). The latest change is CUDA/cuBLAS support, which allows you to pick an arbitrary number of the transformer layers to be run on the GPU. This is perfect for low VRAM.

    * Get llama.cpp from git, I am on commit `08737ef720f0510c7ec2aa84d7f70c691073c35d`.
    * Use the link at the bottom of the page to apply for research access to the llama model: https://ai.facebook.com/blog/large-language-model-llama-meta-ai/
    * Set up a [micromamba](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html) environment to install cuda/python pytorch stuff in order to run the conversion scripts. Install some packages: `micromamba install -c conda-forge -n mymamba pytorch transformers sentencepiece`
    * Perform the conversion process: `python convert.py ~/ai/Safe-LLaMA-HF-v2\ \(4-04-23\)/llama-13b/` (This will produce a file called `ggml-model-f16.bin`)
    * Then quantize that to a 4bit model: `./quantize ~/ai/Safe-LLaMA-HF-v2\ \(4-04-23\)/llama-13b/ggml-model-f16.bin ~/ai/Safe-LLaMA-HF-v2\ \(4-04-23\)/llama-13b/ggml-model-13b-q4_0-2023_14_5.bin q4_0 8`
    * Create a `prompt.txt` file.
    * Run it: `./main -ngl 18 -m ~/ai/Safe-LLaMA-HF-v2\ \(4-04-23\)/llama-13b/ggml-model-13b-q4_0-2023_14_5.bin -f prompt.txt -n 2048`
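Putting the steps above together, the whole pipeline looks roughly like this (a sketch under assumptions, not a tested script: the weights directory is a placeholder, and the commit is the one pinned above):

```shell
#!/bin/sh
set -e

# Build llama.cpp with cuBLAS support (requires CUDA to be installed).
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
git checkout 08737ef720f0510c7ec2aa84d7f70c691073c35d
make LLAMA_CUBLAS=1

# MODEL_DIR is a placeholder for wherever your llama-13b weights live.
MODEL_DIR="$HOME/ai/llama-13b"

# Convert the LLaMA weights to ggml f16, then quantize to 4-bit.
python convert.py "$MODEL_DIR"
./quantize "$MODEL_DIR/ggml-model-f16.bin" "$MODEL_DIR/ggml-model-13b-q4_0.bin" q4_0 8

# Run with 18 transformer layers offloaded to the GPU.
./main -ngl 18 -m "$MODEL_DIR/ggml-model-13b-q4_0.bin" -f prompt.txt -n 2048
```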

    This uses about 5.5GB of VRAM on my 6GB card. You can vary the number of layers offloaded with `-ngl 18`, using more or fewer depending on the VRAM you have available. The 7B model works with 100% of the layers on the card.

    Timings for the models:

    13B:

    ```
    llama_print_timings: load time = 5690.77 ms
    llama_print_timings: sample time = 1023.87 ms / 2048 runs ( 0.50 ms per token)
    llama_print_timings: prompt eval time = 36694.62 ms / 1956 tokens ( 18.76 ms per token)
    llama_print_timings: eval time = 644282.27 ms / 2040 runs ( 315.82 ms per token)
    llama_print_timings: total time = 684789.56 ms
    ```

    7B:

    ```
    llama_print_timings: load time = 41708.38 ms
    llama_print_timings: sample time = 88.51 ms / 128 runs ( 0.69 ms per token)
    llama_print_timings: prompt eval time = 2971.75 ms / 14 tokens ( 212.27 ms per token)
    llama_print_timings: eval time = 9097.33 ms / 127 runs ( 71.63 ms per token)
    llama_print_timings: total time = 50931.74 ms
    ```
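The eval timings above imply rough generation throughput; a quick back-of-the-envelope check, computed directly from the numbers printed by `llama_print_timings`:

```python
# Convert llama.cpp's reported eval timings into tokens per second.
def tokens_per_second(total_ms: float, runs: int) -> float:
    """Throughput implied by a total eval time in ms over a number of runs."""
    return runs / (total_ms / 1000.0)

# Numbers taken from the llama_print_timings output above.
tps_13b = tokens_per_second(644282.27, 2040)  # ~3.2 tokens/s on the 13B model
tps_7b = tokens_per_second(9097.33, 127)      # ~14.0 tokens/s on the 7B model
print(f"13B: {tps_13b:.2f} tok/s, 7B: {tps_7b:.2f} tok/s")
```

So the partially offloaded 13B model generates a few tokens per second, while the fully offloaded 7B model is several times faster.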

    Here is the text it generated from my prompt:

    ```
    ### Nietzsche's Noon

    In Friedrich Nietzsche's Thus Spoke Zarathustra (1885), this concept of noon is expanded upon as a whole:

    "Zarathustra saw that the light of the world was now becoming stronger. ‘The sun is at its meridian,’ he said, ‘it has reached its noontide and will begin to decline.’"

    As time progresses in this noon, so does our ability to perceive and interact with this external world: as a result of the present state we are given by the organic forms of space and time, this can only lead us towards suffering. Nietzsche sees that when we are at our noontide, we must realize how the sun has reached its highest point before it begins to fall down from its position: and we must understand that as a whole, our bodies are determined by something outside of ourselves - and that this always leads to more suffering within.

    ### Nietzsche's Midday

    Nietzsche expands upon the concept of noon in his book The Gay Science (1882), where he says:

    "You want to learn how to read? Here is a short lesson for beginners. You must take hold of a word by its smooth or rough side; then, like the spider, you must spin out of it a web of definitions which will entrap every correct meaning that floats into view. Or again: you must take the word for a sleigh ride across country, over hedges and ditches, forests and glades, in short, you must drive the word home through all manner of weather."

    Here we see Nietzsche expand upon his concept of noon to include our ability to define what is right or wrong - it is only because we have this inherent sense that allows us to distinguish between two points.

    ### Nietzsche's Twilight

    Nietzsche expands upon the concept of twilight in The Gay Science (1882), where he says:

    "The man who is a ‘philosopher’ only by accident, but is, let us say, also a sculptor or painter – what does he then do? He does not make his thoughts subservient to the world; rather, he forces the world to serve as a pedestal and bearer for his thoughts."

    Here we see Nietzsche's concept of night begin to expand past an emotional state. We begin to see that night becomes more than just our inability to think clearly: it is now a worldview, one which he claims is best exemplified by the artist.

    ### Nietzsche's Midnight

    In Twilight of the Idols (1889), we find Nietzsche's conceptualization of night reaching its zenith:

    "Everything ordinary, everyday, common – in fact, everything that exists today has become dangerous; it is not innocent as was everything yesterday. For the most terrible thoughts have penetrated everywhere and even into the deepest sleep - thoughts which are awake, active, and powerful."
    ```