Installing faster-whisper with GPU support via CTranslate2 (dependencies: CUDA >= 11.2, cuDNN 8.x, and cuBLAS)

First, create a conda environment with a CUDA-enabled PyTorch and cuDNN:

conda create -n fasterwhisper python
conda activate fasterwhisper
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
conda install "cudnn>8" -c conda-forge

Then install CTranslate2 into ~/opt:

I needed libiomp5 and couldn't get CMake to find the Intel MKL include directory (even by setting MKLROOT), so I built with -DWITH_MKL=OFF below and installed OpenMP from apt instead:

sudo apt install libomp5 libomp-dev

The official docs say to make a directory called "build", but CTranslate2's setup.py looks for a directory called "lib" or "lib64", so I named the build directory "lib" instead.
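(A symlink named lib pointing at a conventional build directory, e.g. ln -s build lib, would presumably also satisfy setup.py, but renaming is simpler.)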

mkdir -p ~/opt && cd ~/opt
git clone --recursive https://github.com/OpenNMT/CTranslate2.git
cd CTranslate2/
mkdir lib && cd lib
cmake .. -DWITH_CUDA=ON -DWITH_CUDNN=ON -DWITH_MKL=OFF
make -j4
sudo make install
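
If the build succeeded, the shared library should be sitting in the build tree (path assumed from the layout above):

ls ~/opt/CTranslate2/lib/libctranslate2.so*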

This can now be used to build the Python package:

cd python
pip install -r install_requirements.txt
export CTRANSLATE2_ROOT=$HOME/opt/CTranslate2
python setup.py bdist_wheel
pip install dist/*.whl

Then add the CTranslate2 library path to the linker path in your bashrc:

echo 'export LD_LIBRARY_PATH=$HOME/opt/CTranslate2/lib:$LD_LIBRARY_PATH' >> ~/.bashrc
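
Reload the shell config and confirm the extension can find the shared library (without LD_LIBRARY_PATH set, the import typically fails with a "cannot open shared object file" error for libctranslate2.so):

source ~/.bashrc
python -c "import ctranslate2; print(ctranslate2.__version__)"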

Then install faster-whisper:

cd ~/dev # or wherever
git clone https://github.com/guillaumekln/faster-whisper
cd faster-whisper
pip install transformers
pip install -e .[conversion]
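
The converter CLI ships with the ctranslate2 wheel built earlier, so it should already be on PATH:

which ct2-transformers-converter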

Then convert a model (for float16 or int8 inference on GPU, respectively):

ct2-transformers-converter --model openai/whisper-large-v2 --output_dir whisper-large-v2-ct2 \
  --copy_files tokenizer.json --quantization float16
ct2-transformers-converter --model openai/whisper-large-v2 --output_dir whisper-large-v2-ct2-int8 \
  --copy_files tokenizer.json --quantization int8_float16
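
Each converted directory should contain the CTranslate2 model files (something like model.bin and config.json) plus the copied tokenizer.json:

ls whisper-large-v2-ct2-int8/

With a model converted, the following recording loop (saved as, say, arec_chunk.sh; the filename is my choice here) records microphone audio in chunks with arecord and hands each finished chunk off for transcription:
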
#!/bin/bash
# set up recording parameters
sample_rate=44100 # audio sample rate
tempfile=$PWD/arec_chunk.tmp.wav
destfile=$PWD/arec_chunk.wav
duration=100
# start recording loop
while true; do
    # record audio in the background and save to a temporary file
    arecord -d $duration -r $sample_rate -c 1 -t wav "$tempfile" 2> /dev/null &
    # capture the process ID of the arecord run
    arec_pid=$!
    # echo "Recorded PID $arec_pid"
    # wait for the chunk duration to elapse or for user input to end it early
    while true; do
        read -d '' -t $duration -n 1 # wait $duration seconds or for a single keypress
        echo -en "\r"
        if [[ $REPLY == ' ' || -z $REPLY ]]; then
            # echo "Stop recording"
            break # stop recording (spacebar pressed, or read timed out)
        else
            # echo "Extend recording"
            : # any other key extends the recording wait
        fi
    done
    # echo "(Program exited read loop)"
    if [[ -z $REPLY ]]; then
        # echo "Recording timed out after $duration seconds"
        : # Timeout reached
    fi
    # echo "Now killing PID $arec_pid"
    kill $arec_pid # send signal to kill arecord process
    mv "$tempfile" "$destfile" # Move the full WAV to read it while a new arecord process begins
    # run the transcription script in the background, reading from the moved file
    pushd ~/dev/faster-whisper/ > /dev/null
    python transcribe_arec_chunk.py &
    popd > /dev/null
done
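
To run the loop: chmod +x arec_chunk.sh && ./arec_chunk.sh (script name assumed as above). Press space to cut a chunk short; any other key keeps the recording going.

Next, a quick smoke test of the converted model against the jfk.wav sample that ships with whisper.cpp:
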
from pathlib import Path

from faster_whisper import WhisperModel

# Run on GPU with FP16
# model_path = "whisper-large-v2-ct2/"
# model = WhisperModel(model_path, device="cuda", compute_type="float16")
# or run on GPU with INT8
model_path = "whisper-large-v2-ct2-int8/"
model = WhisperModel(model_path, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_path, device="cpu", compute_type="int8")

audio_file = Path.home() / "dev" / "whisper.cpp" / "samples" / "jfk.wav"
segments, info = model.transcribe(str(audio_file), beam_size=5)
print(
    "Detected language '%s' with probability %f"
    % (info.language, info.language_probability)
)
# note: segments is a generator, so the transcription runs lazily as it is iterated
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
from pathlib import Path

from faster_whisper import WhisperModel

# run on GPU with INT8
model_path = "whisper-large-v2-ct2-int8/"
model = WhisperModel(model_path, device="cuda", compute_type="int8_float16")

audio_file = (
    Path.home() / "dev" / "testing" / "audio" / "fasterwhisper" / "arec_chunk.wav"
)
segments, info = model.transcribe(str(audio_file), language="en", beam_size=5)
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))