@SinclairCoder
Forked from padeoe/README_hfd.md
Created November 11, 2023 12:02
Command-line Tool for Easy Downloading of Huggingface Models

🤗Huggingface Model Downloader

This command-line tool avoids the complexity and frequent interruptions often encountered with snapshot_download and git clone when fetching large models such as LLMs. It uses wget (which supports resuming) for Git LFS files and git clone for everything else.

Features

  • 🚀 Resume from breakpoint: Press Ctrl+C at any time and re-run the same command to resume the download.
  • 🚫 File Exclusion: Use --exclude to skip specific files, which saves time for models published in duplicate formats (e.g., .bin and .safetensors).
  • 🔐 Auth Support: For gated models that require a Huggingface login, use --hf_username and --hf_token to authenticate.
  • 🌍 Proxy Support: Set the HTTPS_PROXY environment variable.
  • 📦 Simple: No dependencies & No installation required.
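As a minimal sketch of the proxy setup, the snippet below exports the variable before running the tool. The proxy address is a hypothetical placeholder. Setting the lowercase spelling as well is a precaution and an assumption about your environment: git (via libcurl) accepts either spelling, while the GNU wget manual documents the lowercase https_proxy variable.

```shell
# Hypothetical proxy address; replace it with your own.
PROXY_URL="http://127.0.0.1:7890"

# Export both spellings so that git and wget each find one they recognize.
export HTTPS_PROXY="$PROXY_URL"
export https_proxy="$PROXY_URL"

# Then run the downloader as usual, for example:
# ./hfd.sh bigscience/bloom-560m
echo "Proxy set to $https_proxy"
```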

Usage

First, download hfd.sh from this repo and make it executable (chmod +x hfd.sh).

$ ./hfd.sh -h
Usage:
  hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token]

Download a model:

./hfd.sh bigscience/bloom-560m
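Judging from the script, the files land in a directory named after the part of the model ID that follows the first slash. The snippet below is only an illustration of that shell parameter expansion, using the example model ID from above:

```shell
MODEL_ID="bigscience/bloom-560m"

# "#*/" strips the shortest leading match of "*/", i.e. the "bigscience/" part.
MODEL_DIR="${MODEL_ID#*/}"

echo "$MODEL_DIR"   # prints: bloom-560m
```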

Download a model that requires login:

Get a Huggingface token from https://huggingface.co/settings/tokens, then run:

./hfd.sh meta-llama/Llama-2-7b --hf_username YOUR_HF_USERNAME --hf_token YOUR_HF_TOKEN
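Judging from the script, the token is passed to wget as a bearer-token HTTP header. The sketch below only prints the resulting command rather than running it. The helper name auth_header, the token value, and the file name are hypothetical placeholders.

```shell
# Hypothetical helper: build the header value from a token.
auth_header() {
    printf 'Authorization: Bearer %s\n' "$1"
}

# Placeholder values for illustration only.
HF_TOKEN="YOUR_HF_TOKEN"
FILE="some_file.bin"

# Print the wget command instead of executing it.
printf 'wget --header="%s" -c "%s"\n' \
    "$(auth_header "$HF_TOKEN")" \
    "https://huggingface.co/meta-llama/Llama-2-7b/resolve/main/$FILE"
```

Note that a token given on the command line may end up in your shell history, so you may prefer to read it from a variable or a file.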

Download a model and exclude certain files (e.g., .safetensors):

./hfd.sh bigscience/bloom-560m --exclude safetensors

Output: During the download, the file URLs will be displayed:

$ ./hfd.sh bigscience/bloom-560m --exclude safetensors
...
Start Downloading lfs files, bash script:

wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/flax_model.msgpack
# wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/model.safetensors
wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/onnx/decoder_model.onnx
...
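The listing above can be reproduced with a small dry-run sketch. It assumes, as the script suggests, that --exclude is a plain substring match on the file path and that files are fetched from the main branch. The helper name print_wget_cmds is hypothetical, and the file list is passed in by hand instead of being read from git lfs ls-files.

```shell
# Hypothetical helper: print one wget command per file and comment out any
# file whose path contains the exclude pattern (plain substring match).
print_wget_cmds() {
    model_id="$1"
    pattern="$2"
    shift 2
    for file in "$@"; do
        cmd="wget -c https://huggingface.co/$model_id/resolve/main/$file"
        # An empty pattern excludes nothing.
        if [ -n "$pattern" ]; then
            case "$file" in
                *"$pattern"*) cmd="# $cmd" ;;
            esac
        fi
        printf '%s\n' "$cmd"
    done
}

print_wget_cmds bigscience/bloom-560m safetensors \
    flax_model.msgpack model.safetensors onnx/decoder_model.onnx
```

Because the match is a substring test, a pattern such as onnx would also exclude every file under the onnx/ directory.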

Create an Alias for Convenience

For easier access, you can create an alias for the script:

alias hfd="$PWD/hfd.sh"

hfd.sh

#!/bin/bash
trap 'printf "\nDownload interrupted. If you re-run the command, you can resume the download from the breakpoint.\n"; exit 1' INT
display_help() {
    cat << EOF
Usage:
  hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token]
Description:
  Downloads a model from Hugging Face using the provided model ID.
Parameters:
  model_id        The Hugging Face model ID in the format 'repo/model_name'.
  --exclude       (Optional) Flag to specify a string pattern to exclude files from downloading.
  exclude_pattern The pattern to match against filenames for exclusion.
  --hf_username   (Optional) Hugging Face username for authentication.
  --hf_token      (Optional) Hugging Face token for authentication.
Example:
  hfd bigscience/bloom-560m --exclude safetensors
  hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken
EOF
    exit 1
}
MODEL_ID=$1
shift
while [[ $# -gt 0 ]]; do
    case $1 in
        --exclude) EXCLUDE_PATTERN="$2"; shift 2 ;;
        --hf_username) HF_USERNAME="$2"; shift 2 ;;
        --hf_token) HF_TOKEN="$2"; shift 2 ;;
        *) shift ;;
    esac
done
[[ -z "$MODEL_ID" || "$MODEL_ID" =~ ^-h ]] && display_help
MODEL_DIR="${MODEL_ID#*/}"
if [ -d "$MODEL_DIR/.git" ]; then
    printf "%s exists, Skip Clone.\n" "$MODEL_DIR"
    cd "$MODEL_DIR" && GIT_LFS_SKIP_SMUDGE=1 git pull || { printf "Git pull failed.\n"; exit 1; }
else
    REPO_URL="https://huggingface.co/$MODEL_ID"
    # Probe the repository without prompting, to detect whether it needs authentication.
    OUTPUT=$(GIT_TERMINAL_PROMPT=0 git ls-remote "$REPO_URL" 2>&1)
    GIT_EXIT_CODE=$?
    if [[ $OUTPUT == *"could not read Username"* ]]; then
        if [[ -z "$HF_USERNAME" || -z "$HF_TOKEN" ]]; then
            printf "The repository requires authentication, but --hf_username and --hf_token were not passed.\nPlease get a token from https://huggingface.co/settings/tokens.\nExiting.\n"
            exit 1
        fi
        REPO_URL="https://$HF_USERNAME:$HF_TOKEN@huggingface.co/$MODEL_ID"
    elif [ $GIT_EXIT_CODE -ne 0 ]; then
        echo "$OUTPUT"; exit 1
    fi
    # Clone without LFS content; the large LFS files are fetched with wget below.
    GIT_LFS_SKIP_SMUDGE=1 git clone "$REPO_URL" && cd "$MODEL_DIR" || { printf "Git clone failed.\n"; exit 1; }
fi
printf "\nStart Downloading lfs files, bash script:\n"
files=$(git lfs ls-files | awk '{print $3}')
declare -a urls
for file in $files; do
    url="https://huggingface.co/$MODEL_ID/resolve/main/$file"
    wget_cmd="wget -c \"$url\""
    [[ -n "$HF_TOKEN" ]] && wget_cmd="wget --header=\"Authorization: Bearer ${HF_TOKEN}\" -c \"$url\""
    # Excluded files are printed as comments and skipped.
    [[ -n "$EXCLUDE_PATTERN" && $file == *"$EXCLUDE_PATTERN"* ]] && printf "# %s\n" "$wget_cmd" && continue
    printf "%s\n" "$wget_cmd"
    urls+=("$url")
done
for url in "${urls[@]}"; do
    # Send the token only when one was given; the wget exit status is checked below.
    if [[ -n "$HF_TOKEN" ]]; then wget --header="Authorization: Bearer ${HF_TOKEN}" -c "$url"; else wget -c "$url"; fi
    [[ $? -eq 0 ]] && printf "Downloaded %s successfully.\n" "$url" || { printf "Failed to download %s.\n" "$url"; exit 1; }
done
printf "Download completed successfully.\n"