The official huggingface-cli lacks multi-threaded download support, and hf_transfer has inadequate error handling. This command-line tool works around both by using wget or aria2 for LFS files and git clone for everything else.
- ⏯️ Resume from breakpoint: Press Ctrl+C at any time, then re-run the same command to continue the download.
- 🚀 Multi-threaded Download: Utilize multiple threads to speed up the download process.
- 🚫 File Exclusion: Use `--exclude` or `--include` to skip or select files, saving time on models published in duplicate formats (e.g., `*.bin` or `*.safetensors`).
- 🔐 Auth Support: For gated models that require Hugging Face login, use `--hf_username` and `--hf_token` to authenticate.
- 🪞 Mirror Site Support: Set up with the `HF_ENDPOINT` environment variable.
- 🌍 Proxy Support: Set up with the `HTTPS_PROXY` environment variable.
- 📦 Simple: Only depends on `git` and `aria2c`/`wget`.
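As a rough illustration of the mirror support, the sketch below builds a file URL from `HF_ENDPOINT`. It is not taken from hfd.sh: the `file_url` helper, the fallback to `https://huggingface.co`, and the mirror and proxy addresses are assumptions made for the example. The `resolve/main` path shape follows the download URLs the tool prints for model repos.

```shell
# Sketch only; hfd.sh's actual URL handling may differ.
# Assumption: HF_ENDPOINT falls back to https://huggingface.co when unset.
file_url() {  # usage: file_url <repo_id> <file_path>
    endpoint=${HF_ENDPOINT:-https://huggingface.co}
    # Strip one trailing slash so the joined URL has no double slash.
    printf '%s/%s/resolve/main/%s\n' "${endpoint%/}" "$1" "$2"
}

# Hypothetical mirror and proxy addresses; substitute your own.
# export HF_ENDPOINT="https://mirror.example.com"
# export HTTPS_PROXY="http://127.0.0.1:8080"
# ./hfd.sh bigscience/bloom-560m
```

Exporting both variables before running the script should be enough; no command-line flag is involved.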
First, download hfd.sh or clone this repo, then grant execute permission to the script:
chmod a+x hfd.sh
Usage Instructions:
$ ./hfd.sh -h
Usage:
hfd <repo_id> [--include include_pattern] [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool aria2c|wget] [-x threads] [--dataset]
Description:
Downloads a model or dataset from Hugging Face using the provided repo ID.
Parameters:
repo_id The Hugging Face repo ID in the format 'org/repo_name'.
--include (Optional) Flag to specify a string pattern to include files for downloading.
--exclude (Optional) Flag to specify a string pattern to exclude files from downloading.
include/exclude_pattern The pattern to match against filenames, supports wildcard characters. e.g., '--exclude *.safetensor'.
--hf_username (Optional) Hugging Face username for authentication.
--hf_token (Optional) Hugging Face token for authentication.
--tool (Optional) Download tool to use. Can be wget (default) or aria2c.
-x (Optional) Number of download threads for aria2c.
--dataset (Optional) Flag to indicate downloading a dataset.
Example:
hfd bigscience/bloom-560m --exclude safetensors
hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken --tool aria2c -x 8
hfd lavita/medical-qa-shared-task-v1-toy --dataset
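The exclude behaviour described in the help text can be pictured with the sketch below. It is an illustration, not the matching code from hfd.sh: the `matches` and `plan_downloads` helpers are hypothetical, and the rule that a pattern matches anywhere in the file path is an assumption chosen so that both `safetensors` and `*.safetensor` skip `model.safetensors`.

```shell
# Sketch only; hfd.sh's real matching logic may differ.
# Assumption: a wildcard pattern matches if it occurs anywhere in the path.
matches() {  # usage: matches <path> <pattern>
    case "$1" in
        *$2*) return 0 ;;   # unquoted $2 is treated as a glob pattern
        *)    return 1 ;;
    esac
}

plan_downloads() {  # usage: plan_downloads <base_url> <exclude_pattern> <file>...
    base=$1
    exclude=$2
    shift 2
    for file in "$@"; do
        if [ -n "$exclude" ] && matches "$file" "$exclude"; then
            # Excluded files are kept in the plan, but commented out.
            printf '# wget -c %s/%s\n' "$base" "$file"
        else
            printf 'wget -c %s/%s\n' "$base" "$file"
        fi
    done
}

plan_downloads "https://huggingface.co/bigscience/bloom-560m/resolve/main" safetensors \
    flax_model.msgpack model.safetensors onnx/decoder_model.onnx
```

The final call prints one `wget -c` line per file, with the `model.safetensors` line prefixed by `#`.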
Download a model:
./hfd.sh bigscience/bloom-560m
Download a model that requires login:
Get a Hugging Face token from https://huggingface.co/settings/tokens, then run:
hfd meta-llama/Llama-2-7b --hf_username YOUR_HF_USERNAME --hf_token YOUR_HF_TOKEN
Download a model and exclude certain files (e.g., .safetensors):
./hfd.sh bigscience/bloom-560m --exclude safetensors
Download with aria2c and multiple threads:
./hfd.sh bigscience/bloom-560m --tool aria2c -x 4
Output: During the download, the file URLs will be displayed:
$ ./hfd.sh bigscience/bloom-560m --exclude safetensors
...
Start Downloading lfs files, bash script:
wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/flax_model.msgpack
# wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/model.safetensors
wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/onnx/decoder_model.onnx
...
For easier access, you can create an alias for the script:
alias hfd="$PWD/hfd.sh"
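An alias defined this way lasts only for the current shell session. One way to keep it is to append the line to your shell startup file. The sketch below does that without adding duplicates; the `persist_alias` helper is hypothetical and not part of hfd.sh, and the startup file name depends on your shell.

```shell
# Sketch only: append the hfd alias to a startup file once.
persist_alias() {  # usage: persist_alias <rc_file> <path_to_hfd.sh>
    line="alias hfd=\"$2\""
    # Append only if the exact line is not already present.
    grep -qxF -- "$line" "$1" 2>/dev/null || printf '%s\n' "$line" >> "$1"
}

# Example (assumes bash and that hfd.sh is in the current directory):
# persist_alias "$HOME/.bashrc" "$PWD/hfd.sh"
```

Open a new terminal or source the startup file for the alias to take effect.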