@TheaperDeng
Last active June 26, 2023 09:03

ggml-gpu-hands-on

This gist works as a hands-on note for running llama.cpp on various GPUs. It may be out of date, since the project moves quickly.

This is only a personal record, so readers may not get an out-of-the-box experience from it.

Complete configs

This records the verified configurations. The project is still developing very fast, so each record is pinned to a specific commit id.

| Impl. | Device | OS | llama.cpp version | 3rd-party version | Steps |
| --- | --- | --- | --- | --- | --- |
| CUDA | 3060 Ti (together with i5-12600KF) | Ubuntu 22.04 | llama.cpp | CUDA 12.1 | CUDA-Ubuntu 22.04 |
| OpenCL | UHD 630 (on an i7-9750H) | Windows 10 | llama.cpp | OpenCL-SDK, CLBlast | OpenCL-Windows 10 |

Steps

CUDA-Ubuntu 22.04

Step 1: Install CUDA 12.1. I am pretty sure some other version around 12.1 would also work, but I put the installation spec link here. The only extra thing we may need to do is add the CUDA bin directory (which contains nvcc) to PATH.
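Assuming CUDA landed in the default `/usr/local/cuda-12.1` prefix (adjust the path if your installer chose differently), the PATH addition can be sketched as:

```
# Default install prefix for CUDA 12.1; adjust to your system.
export CUDA_HOME=/usr/local/cuda-12.1
export PATH="$CUDA_HOME/bin:$PATH"
# After this, `command -v nvcc` should resolve to $CUDA_HOME/bin/nvcc.
```

Putting these lines in `~/.bashrc` makes the change persistent across shells.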

Step 2: Clone llama.cpp (be sure to check out the commit id listed in the table above).
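The clone-and-pin step looks roughly like this; `<commit-id>` is left as a placeholder for the commit id from the table:

```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
git checkout <commit-id>
```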

Step 3: make LLAMA_CUBLAS=1. The whole process is quite smooth (I guess CUDA is so widely adopted that a smooth installation and out-of-the-box performance can be expected).

Step 4: Prepare a checkpoint (just download one from Hugging Face; here is what I use).

Step 5: ./main -m ~/Downloads/llama-7b.ggmlv3.q4_0.bin -p "Once upon a time" -n 128 -t 6 -ngl 35. The critical parameter here is -ngl, which states how many layers are "offloaded" to the GPU, so a GPU without enough VRAM can still work.
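For reference, the flags in that command break down as follows (this is an annotated listing; it requires the binary from Step 3 and the checkpoint from Step 4):

```
# -m    path to the GGML checkpoint
# -p    prompt text
# -n    number of tokens to generate
# -t    number of CPU threads
# -ngl  number of layers offloaded to the GPU (lower this if VRAM is short)
./main -m ~/Downloads/llama-7b.ggmlv3.q4_0.bin -p "Once upon a time" -n 128 -t 6 -ngl 35
```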

OpenCL-Windows 10

This is really not a mature way to run GGML; it may suit developers rather than users.

Step 1: A basic Windows C/C++ development environment. This includes cmake (3.27.0-rc3 is the version I use) and Visual Studio 2022. The Visual Studio modules that need to be installed were shown in a screenshot (not preserved in this text).

Step 2: Some extra work, including adding a few PATH entries (this may not really be required):

  • C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.30.30705
  • C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.30.30705\bin\Hostx64\x64
  • C:\Program Files\CMake\bin

Step 3: Install vcpkg (https://vcpkg.io/en/getting-started.html) and add it to PATH:
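Following the linked getting-started guide, bootstrapping vcpkg looks roughly like this (run from C:\llama.cpp-dev in this setup):

```
git clone https://github.com/microsoft/vcpkg
.\vcpkg\bootstrap-vcpkg.bat
```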

  • C:\llama.cpp-dev\vcpkg

Step 4: OpenCL-SDK: build from source and install, following https://github.com/KhronosGroup/OpenCL-SDK/tree/ae7fcae82fe0b7bcc272e43fc324181b2d544eea#example-build
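The linked OpenCL-SDK README uses a standard CMake flow; a sketch mirroring the CLBlast steps in Step 5 (consult the README for the exact options) is:

```
git clone --recursive https://github.com/KhronosGroup/OpenCL-SDK.git
mkdir OpenCL-SDK/build
cd OpenCL-SDK/build
cmake ..
cmake --build . --config Release
cmake --install .
```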

Step 5: CLBlast: Build from source and install https://github.com/CNugteren/CLBlast/tree/28a61c53a69ad598cd3ed8992fb6be88643f3c4b

git clone https://github.com/CNugteren/CLBlast.git
mkdir CLBlast/build
cd CLBlast/build
cmake .. -DBUILD_SHARED_LIBS=OFF -DTUNERS=OFF
cmake --build . --config Release
cmake --install .

Step 6: Build llama.cpp https://github.com/ggerganov/llama.cpp/tree/bbca06e26949686d61a5126332680ba3cccf235c

# CMakeLists.txt
# add this to line 4
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /MT")

and run

mkdir build
cd build
cmake .. -DLLAMA_CLBLAST=ON
cmake --build . --config Release

Step 7: prepare a ckpt (just download one from huggingface, here is what I use)

Step 8: GGML_OPENCL_DEVICE=1 ./main.exe -m C:\\llama.cpp-dev\\llama-7b.ggmlv3.q4_0.bin -n 128 -p "Once upon a time" -t 6 -ngl 35. Again, the critical parameter is -ngl, which states how many layers are "offloaded" to the GPU, so a GPU without enough VRAM can still work. On my laptop, two devices support OpenCL and the Intel iGPU is the second one, so we need to set GGML_OPENCL_DEVICE=1.
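From a POSIX-style shell (e.g. Git Bash or MobaXterm on Windows), the device selection boils down to exporting the variable before launching main:

```
# Device index 1 = the second OpenCL device (the Intel iGPU on this laptop).
export GGML_OPENCL_DEVICE=1
echo "$GGML_OPENCL_DEVICE"   # prints: 1
```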

The result is about 800 ms/token :(. Maybe CLBlast is not optimized for Intel iGPUs; oneMKL could be a promising alternative. openvinotoolkit/openvino#10464 (comment) could point to another reason.


@plusbang

Based on my experience running ggml on an A730M, a few more details:

  • If you set the environment variables http_proxy and https_proxy to clone GitHub repos, you must unset them before using .\vcpkg\bootstrap-vcpkg.bat to build vcpkg (in Step 3).
  • Before Step 5, you may have to add the locations of the OpenCL include dirs and libraries to PATH; otherwise, errors may occur while building CLBlast.
  • If you want to set GGML_OPENCL_DEVICE=1 (in Step 8), it is better to use a Linux-style bash shell (such as MobaXterm).
