Revisions
padeoe revised this gist
Nov 8, 2023 . 2 changed files with 6 additions and 2 deletions.

@@ -13,7 +13,11 @@ Considering the lack of multi-threaded download support in the official [`huggin

    - 📦 **Simple**: No dependencies & No installation required.

    ## Usage
    First, Download [`hfd.sh`](#file-hfd-sh) or clone this repo, and then grant execution permission to the script.
    ```bash
    chmod a+x hfd.sh
    ```
    **Usage Instructions:**
    ```
    $ ./hfd.sh -h
    Usage:

@@ -99,7 +99,7 @@ declare -a urls

    for file in $files; do
        url="$HF_ENDPOINT/$MODEL_ID/resolve/main/$file"
        file_dir=$(dirname "$file")
        mkdir -p "$file_dir"
        if [[ "$TOOL" == "wget" ]]; then
            download_cmd="wget -c \"$url\" -O \"$file\""
            [[ -n "$HF_TOKEN" ]] && download_cmd="wget --header=\"Authorization: Bearer ${HF_TOKEN}\" -c \"$url\" -O \"$file\""
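The `mkdir -p "$file_dir"` step in this hunk matters because LFS files can live in subdirectories (the README's sample output lists `onnx/decoder_model.onnx`), and `wget -O` does not create missing parent directories on its own. A minimal sketch of that step in isolation follows; `ensure_parent_dir` is a hypothetical helper name, not something defined in `hfd.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of hfd.sh): make sure the parent directory
# of a target path exists before a downloader writes to that path.
ensure_parent_dir() {
    local target="$1"
    mkdir -p "$(dirname "$target")"
}

# Demo in a throwaway directory; the relative path mirrors one from the README.
workdir=$(mktemp -d)
ensure_parent_dir "$workdir/onnx/decoder_model.onnx"
if [ -d "$workdir/onnx" ]; then
    echo "parent directory ready"
fi
```

Because `mkdir -p` succeeds when the directory already exists, the helper is safe to call on every re-run of a resumed download.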
padeoe revised this gist
Nov 1, 2023 . 1 changed file with 2 additions and 0 deletions.

@@ -1,4 +1,6 @@

    # 🤗Huggingface Model Downloader
    ***Update: The previous version has a bug. When resuming from a breakpoint, there may be an issue causing incomplete files. Please update to the latest version!!!***

    Considering the lack of multi-threaded download support in the official [`huggingface-cli`](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli), and the inadequate error handling in [`hf_transfer`](https://github.com/huggingface/hf_transfer), this command-line tool smartly utilizes `wget` or `aria2` for LFS files and `git clone` for the rest.

    ## Features
padeoe revised this gist
Nov 1, 2023 . 1 changed file with 3 additions and 1 deletion.

@@ -87,14 +87,16 @@ else

        fi
        GIT_LFS_SKIP_SMUDGE=1 git clone "$REPO_URL" && cd "$MODEL_DIR" || { printf "Git clone failed.\n"; exit 1; }
        for file in $(git lfs ls-files | awk '{print $3}'); do
            truncate -s 0 "$file"
        done
    fi

    printf "\nStart Downloading lfs files, bash script:\n"
    files=$(git lfs ls-files | awk '{print $3}')
    declare -a urls

    for file in $files; do
        url="$HF_ENDPOINT/$MODEL_ID/resolve/main/$file"
        file_dir=$(dirname "$file")
        mkdir -p "$file_dir" # Create the necessary directories
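This revision moves `truncate -s 0` out of the per-file download loop and runs it once, right after a fresh clone. The likely reasoning: a clone made with `GIT_LFS_SKIP_SMUDGE=1` leaves small text placeholders where the LFS files belong, and a resuming downloader such as `wget -c` continues from the current size of the local file, so the placeholders need to be emptied once but never again on a re-run. The sketch below illustrates that idea on a fake placeholder; `reset_placeholders` is a hypothetical name, and `: > file` is used as a portable stand-in for `truncate -s 0`.

```shell
#!/usr/bin/env bash
# Sketch under stated assumptions: empty placeholder files once after a fresh
# clone so that a resuming downloader starts each file from byte 0.
# `reset_placeholders` is a hypothetical name; hfd.sh inlines this as a loop
# over `git lfs ls-files` and calls `truncate -s 0`.
reset_placeholders() {
    local f
    for f in "$@"; do
        : > "$f"    # portable equivalent of `truncate -s 0 "$f"`
    done
}

# Demo with a fake placeholder file in a temporary directory.
demo_dir=$(mktemp -d)
printf 'version https://git-lfs.github.com/spec/v1\n' > "$demo_dir/model.bin"
reset_placeholders "$demo_dir/model.bin"
if [ ! -s "$demo_dir/model.bin" ]; then
    echo "placeholder emptied"
fi
```

Running the reset on every invocation would discard partially downloaded data, which appears to be the resume bug the README's update note refers to.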
padeoe revised this gist
Nov 1, 2023 . 2 changed files with 33 additions and 15 deletions.

@@ -2,8 +2,8 @@

    Considering the lack of multi-threaded download support in the official [`huggingface-cli`](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli), and the inadequate error handling in [`hf_transfer`](https://github.com/huggingface/hf_transfer), this command-line tool smartly utilizes `wget` or `aria2` for LFS files and `git clone` for the rest.

    ## Features
    - ⏯️ **Resume from breakpoint**: You can re-run it or Ctrl+C anytime.
    - 🚀 **Multi-threaded Download**: Utilize multiple threads to speed up the download process.
    - 🚫 **File Exclusion**: Use `--exclude` to skip specific files, save time for models with **duplicate formats** (e.g., .bin and .safetensors).
    - 🔐 **Auth Support**: For gated models that require Huggingface login, use `--hf_username` and `--hf_token` to authenticate.
    - 🪞 **Mirror Site Support**: Set up with `HF_ENDPOINT` environment variable.

@@ -15,10 +15,10 @@ First, Download [`hfd.sh`](#file-hfd-sh) from this repo.

    ```
    $ ./hfd.sh -h
    Usage:
      hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool wget|aria2c] [-x threads] [--dataset]

    Description:
      Downloads a model or dataset from Hugging Face using the provided model ID.

    Parameters:
      model_id        The Hugging Face model ID in the format 'repo/model_name'.

@@ -28,10 +28,12 @@ Parameters:

      --hf_token      (Optional) Hugging Face token for authentication.
      --tool          (Optional) Download tool to use. Can be wget (default) or aria2c.
      -x              (Optional) Number of download threads for aria2c.
      --dataset       (Optional) Flag to indicate downloading a dataset.

    Example:
      hfd bigscience/bloom-560m --exclude safetensors
      hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken --tool aria2c -x 8
      hfd lavita/medical-qa-shared-task-v1-toy --dataset
    ```
    **Download a model:**
    ```

@@ -5,10 +5,10 @@ trap 'printf "\nDownload interrupted. If you re-run the command, you can resume

    display_help() {
        cat << EOF
    Usage:
      hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool wget|aria2c] [-x threads] [--dataset]

    Description:
      Downloads a model or dataset from Hugging Face using the provided model ID.

    Parameters:
      model_id        The Hugging Face model ID in the format 'repo/model_name'.

@@ -18,10 +18,12 @@ Parameters:

      --hf_token      (Optional) Hugging Face token for authentication.
      --tool          (Optional) Download tool to use. Can be wget (default) or aria2c.
      -x              (Optional) Number of download threads for aria2c.
      --dataset       (Optional) Flag to indicate downloading a dataset.

    Example:
      hfd bigscience/bloom-560m --exclude safetensors
      hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken --tool aria2c -x 8
      hfd lavita/medical-qa-shared-task-v1-toy --dataset
    EOF
        exit 1
    }

@@ -41,6 +43,7 @@ while [[ $# -gt 0 ]]; do

            --hf_token) HF_TOKEN="$2"; shift 2 ;;
            --tool) TOOL="$2"; shift 2 ;;
            -x) THREADS="$2"; shift 2 ;;
            --dataset) DATASET=1; shift ;;
            *) shift ;;
        esac
    done

@@ -57,20 +60,28 @@ fi

    MODEL_DIR="${MODEL_ID#*/}"

    echo $DATASET
    if [[ "$DATASET" == 1 ]]; then
        MODEL_ID="datasets/$MODEL_ID"
    fi
    echo $MODEL_DIR
    if [ -d "$MODEL_DIR/.git" ]; then
        printf "%s exists, Skip Clone.\n" "$MODEL_DIR"
        cd "$MODEL_DIR" && GIT_LFS_SKIP_SMUDGE=1 git pull || { printf "Git pull failed.\n"; exit 1; }
    else
        REPO_URL="$HF_ENDPOINT/$MODEL_ID"
        echo $REPO_URL
        OUTPUT=$(GIT_TERMINAL_PROMPT=0 git ls-remote "$REPO_URL" 2>&1)
        GIT_EXIT_CODE=$?
        if [[ $OUTPUT == *"could not read Username"* ]]; then
            if [[ -z "$HF_USERNAME" || -z "$HF_TOKEN" ]]; then
                printf "The repository requires authentication, but --hf_username and --hf_token is not passed.\nPlease get token from https://huggingface.co/settings/tokens.\nExiting.\n"
                echo $OUTPUT
                exit 1
            fi
            REPO_URL="https://$HF_USERNAME:$HF_TOKEN@${HF_ENDPOINT#https://}/$MODEL_ID"
        elif [ $GIT_EXIT_CODE -ne 0 ]; then
            echo "$OUTPUT"; exit 1
        fi

@@ -83,24 +94,29 @@ files=$(git lfs ls-files | awk '{print $3}')

    declare -a urls

    for file in $files; do
        truncate -s 0 "$file"
        url="$HF_ENDPOINT/$MODEL_ID/resolve/main/$file"
        file_dir=$(dirname "$file")
        mkdir -p "$file_dir" # Create the necessary directories
        if [[ "$TOOL" == "wget" ]]; then
            download_cmd="wget -c \"$url\" -O \"$file\""
            [[ -n "$HF_TOKEN" ]] && download_cmd="wget --header=\"Authorization: Bearer ${HF_TOKEN}\" -c \"$url\" -O \"$file\""
        else
            download_cmd="aria2c -x $THREADS -s $THREADS -k 1M -c \"$url\" -d \"$file_dir\" -o \"$(basename "$file")\""
            [[ -n "$HF_TOKEN" ]] && download_cmd="aria2c --header=\"Authorization: Bearer ${HF_TOKEN}\" -x $THREADS -s $THREADS -k 1M -c \"$url\" -d \"$file_dir\" -o \"$(basename "$file")\""
        fi
        [[ -n "$EXCLUDE_PATTERN" && $file == *"$EXCLUDE_PATTERN"* ]] && printf "# %s\n" "$download_cmd" && continue
        printf "%s\n" "$download_cmd"
        urls+=("$url|$file")
    done

    for url_file in "${urls[@]}"; do
        IFS='|' read -r url file <<< "$url_file"
        file_dir=$(dirname "$file")
        if [[ "$TOOL" == "wget" ]]; then
            [[ -n "$HF_TOKEN" ]] && wget --header="Authorization: Bearer ${HF_TOKEN}" -c "$url" -O "$file" || wget -c "$url" -O "$file"
        else
            [[ -n "$HF_TOKEN" ]] && aria2c --header="Authorization: Bearer ${HF_TOKEN}" -x $THREADS -s $THREADS -k 1M -c "$url" -d "$file_dir" -o "$(basename "$file")" || aria2c -x $THREADS -s $THREADS -k 1M -c "$url" -d "$file_dir" -o "$(basename "$file")"
        fi
        [[ $? -eq 0 ]] && printf "Downloaded %s successfully.\n" "$url" || { printf "Failed to download %s.\n" "$url"; exit 1; }
    done
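Two URL-building details change in this revision: `--dataset` prefixes the repository ID with `datasets/`, and the authenticated clone URL strips the scheme from `HF_ENDPOINT` with `${HF_ENDPOINT#https://}` before embedding the credentials. The sketch below pulls both into one function so the string handling can be tried without touching the network. `build_repo_url` and its positional parameters are assumptions made for illustration; the script itself sets `MODEL_ID` and `REPO_URL` inline.

```shell
#!/usr/bin/env bash
# Illustrative only: build_repo_url is a hypothetical helper, not part of hfd.sh.
# Args: endpoint, repo id, username, token, dataset flag (1 = dataset).
build_repo_url() {
    local endpoint="$1" repo_id="$2" user="$3" token="$4" is_dataset="$5"
    if [[ "$is_dataset" == 1 ]]; then
        repo_id="datasets/$repo_id"
    fi
    if [[ -n "$user" && -n "$token" ]]; then
        # Strip a leading "https://" so the credentials can be placed before the host.
        printf 'https://%s:%s@%s/%s\n' "$user" "$token" "${endpoint#https://}" "$repo_id"
    else
        printf '%s/%s\n' "$endpoint" "$repo_id"
    fi
}

# Public model, no credentials.
build_repo_url "https://huggingface.co" "bigscience/bloom-560m" "" "" 0
# Dataset with credentials (placeholder values taken from the help text).
build_repo_url "https://huggingface.co" "lavita/medical-qa-shared-task-v1-toy" "myuser" "mytoken" 1
```

Note that the expansion only removes an `https://` prefix, so an endpoint given with `http://` would pass through unchanged in both the sketch and the script.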
padeoe revised this gist
Oct 27, 2023 . 1 changed file with 1 addition and 1 deletion.

@@ -53,7 +53,7 @@ hfd meta-llama/Llama-2-7b --hf_username YOUR_HF_USERNAME --hf_token YOUR_HF_TOKE

    **Download with aria2c and multiple threads:**
    ```bash
    ./hfd.sh bigscience/bloom-560m --tool aria2c -x 4
    ```

    *Output*:
padeoe revised this gist
Oct 27, 2023 . 2 changed files with 61 additions and 14 deletions.

@@ -1,12 +1,12 @@

    # 🤗Huggingface Model Downloader
    Considering the lack of multi-threaded download support in the official [`huggingface-cli`](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli), and the inadequate error handling in [`hf_transfer`](https://github.com/huggingface/hf_transfer), this command-line tool smartly utilizes `wget` or `aria2` for LFS files and `git clone` for the rest.

    ## Features
    - 🚀 **Resume from breakpoint**: You can re-run it or Ctrl+C anytime.
    - ⏯️ **Multi-threaded Download**: Utilize multiple threads to speed up the download process.
    - 🚫 **File Exclusion**: Use `--exclude` to skip specific files, save time for models with **duplicate formats** (e.g., .bin and .safetensors).
    - 🔐 **Auth Support**: For gated models that require Huggingface login, use `--hf_username` and `--hf_token` to authenticate.
    - 🪞 **Mirror Site Support**: Set up with `HF_ENDPOINT` environment variable.
    - 🌍 **Proxy Support**: Set up with `HTTPS_PROXY` environment variable.
    - 📦 **Simple**: No dependencies & No installation required.

@@ -15,7 +15,23 @@ First, Download [`hfd.sh`](#file-hfd-sh) from this repo.

    ```
    $ ./hfd.sh -h
    Usage:
      hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool wget|aria2c] [-x threads]

    Description:
      Downloads a model from Hugging Face using the provided model ID.

    Parameters:
      model_id        The Hugging Face model ID in the format 'repo/model_name'.
      --exclude       (Optional) Flag to specify a string pattern to exclude files from downloading.
      exclude_pattern The pattern to match against filenames for exclusion.
      --hf_username   (Optional) Hugging Face username for authentication.
      --hf_token      (Optional) Hugging Face token for authentication.
      --tool          (Optional) Download tool to use. Can be wget (default) or aria2c.
      -x              (Optional) Number of download threads for aria2c.

    Example:
      hfd bigscience/bloom-560m --exclude safetensors
      hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken --tool aria2c -x 8
    ```
    **Download a model:**
    ```

@@ -35,6 +51,11 @@ hfd meta-llama/Llama-2-7b --hf_username YOUR_HF_USERNAME --hf_token YOUR_HF_TOKE

    ./hdf.sh bigscience/bloom-560m --exclude safetensors
    ```

    **Download with aria2c and multiple threads:**
    ```bash
    ./hfd.sh bigscience/bloom-560m --download_tool aria2c -x 4
    ```

    *Output*:

    During the download, the file URLs will be displayed:

@@ -5,7 +5,7 @@ trap 'printf "\nDownload interrupted. If you re-run the command, you can resume

    display_help() {
        cat << EOF
    Usage:
      hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool wget|aria2c] [-x threads]

    Description:
      Downloads a model from Hugging Face using the provided model ID.

@@ -16,26 +16,43 @@ Parameters:

      exclude_pattern The pattern to match against filenames for exclusion.
      --hf_username   (Optional) Hugging Face username for authentication.
      --hf_token      (Optional) Hugging Face token for authentication.
      --tool          (Optional) Download tool to use. Can be wget (default) or aria2c.
      -x              (Optional) Number of download threads for aria2c.

    Example:
      hfd bigscience/bloom-560m --exclude safetensors
      hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken --tool aria2c -x 8
    EOF
        exit 1
    }

    MODEL_ID=$1
    shift

    # Default values
    TOOL="wget"
    THREADS=1
    HF_ENDPOINT=${HF_ENDPOINT:-"https://huggingface.co"}

    while [[ $# -gt 0 ]]; do
        case $1 in
            --exclude) EXCLUDE_PATTERN="$2"; shift 2 ;;
            --hf_username) HF_USERNAME="$2"; shift 2 ;;
            --hf_token) HF_TOKEN="$2"; shift 2 ;;
            --tool) TOOL="$2"; shift 2 ;;
            -x) THREADS="$2"; shift 2 ;;
            *) shift ;;
        esac
    done

    # Check if aria2c is installed
    if [[ "$TOOL" == "aria2c" ]]; then
        if ! command -v aria2c &>/dev/null; then
            echo "aria2c is not installed. Installing it..."
            sudo apt update && sudo apt install -y aria2 || { echo "Failed to install aria2c. Exiting."; exit 1; }
        fi
    fi

    [[ -z "$MODEL_ID" || "$MODEL_ID" =~ ^-h ]] && display_help

    MODEL_DIR="${MODEL_ID#*/}"

@@ -44,7 +61,7 @@ if [ -d "$MODEL_DIR/.git" ]; then

        printf "%s exists, Skip Clone.\n" "$MODEL_DIR"
        cd "$MODEL_DIR" && GIT_LFS_SKIP_SMUDGE=1 git pull || { printf "Git pull failed.\n"; exit 1; }
    else
        REPO_URL="$HF_ENDPOINT/$MODEL_ID"
        OUTPUT=$(GIT_TERMINAL_PROMPT=0 git ls-remote "$REPO_URL" 2>&1)
        GIT_EXIT_CODE=$?

@@ -53,7 +70,7 @@ else

                printf "The repository requires authentication, but --hf_username and --hf_token is not passed.\nPlease get token from https://huggingface.co/settings/tokens.\nExiting.\n"
                exit 1
            fi
            REPO_URL="https://$HF_USERNAME:$HF_TOKEN@$HF_ENDPOINT/$MODEL_ID"
        elif [ $GIT_EXIT_CODE -ne 0 ]; then
            echo "$OUTPUT"; exit 1
        fi

@@ -66,16 +83,25 @@ files=$(git lfs ls-files | awk '{print $3}')

    declare -a urls

    for file in $files; do
        url="$HF_ENDPOINT/$MODEL_ID/resolve/main/$file"
        if [[ "$TOOL" == "wget" ]]; then
            download_cmd="wget -c \"$url\""
            [[ -n "$HF_TOKEN" ]] && download_cmd="wget --header=\"Authorization: Bearer ${HF_TOKEN}\" -c \"$url\""
        else
            download_cmd="aria2c -x $THREADS -s $THREADS -k 1M -c \"$url\""
            [[ -n "$HF_TOKEN" ]] && download_cmd="aria2c --header=\"Authorization: Bearer ${HF_TOKEN}\" -x $THREADS -s $THREADS -k 1M -c \"$url\""
        fi
        [[ -n "$EXCLUDE_PATTERN" && $file == *"$EXCLUDE_PATTERN"* ]] && printf "# %s\n" "$download_cmd" && continue
        printf "%s\n" "$download_cmd"
        urls+=("$url")
    done

    for url in "${urls[@]}"; do
        if [[ "$TOOL" == "wget" ]]; then
            [[ -n "$HF_TOKEN" ]] && wget --header="Authorization: Bearer ${HF_TOKEN}" -c "$url" || wget -c "$url"
        else
            [[ -n "$HF_TOKEN" ]] && aria2c --header="Authorization: Bearer ${HF_TOKEN}" -x $THREADS -s $THREADS -k 1M -c "$url" || aria2c -x $THREADS -s $THREADS -k 1M -c "$url"
        fi
        [[ $? -eq 0 ]] && printf "Downloaded %s successfully.\n" "$url" || { printf "Failed to download %s.\n" "$url"; exit 1; }
    done
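The option handling added in this revision follows a common bash pattern: set defaults first, then walk the positional parameters with `while`/`case`, consuming two arguments for flags that take a value. A reduced sketch of that pattern is shown below, wrapped in a function so it can be exercised on its own. `parse_args` is a hypothetical wrapper (the script parses at top level), and only three of the script's flags are reproduced.

```shell
#!/usr/bin/env bash
# Reduced sketch of the while/case/shift pattern; parse_args is a hypothetical
# wrapper and covers only a subset of the flags that hfd.sh accepts.
parse_args() {
    # Defaults, mirroring the "Default values" block in the script.
    TOOL="wget"
    THREADS=1
    EXCLUDE_PATTERN=""
    while [[ $# -gt 0 ]]; do
        case "$1" in
            --exclude) EXCLUDE_PATTERN="$2"; shift 2 ;;
            --tool)    TOOL="$2"; shift 2 ;;
            -x)        THREADS="$2"; shift 2 ;;
            *)         shift ;;   # unknown arguments are silently skipped
        esac
    done
}

parse_args --tool aria2c -x 4 --exclude safetensors
echo "tool=$TOOL threads=$THREADS exclude=$EXCLUDE_PATTERN"
```

Because unknown arguments fall through to `*) shift ;;`, a mistyped flag is ignored rather than reported, which would explain why the README's earlier `--download_tool` example still ran with the default tool.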
padeoe revised this gist
Oct 25, 2023 . 1 changed file with 3 additions and 1 deletion.

@@ -1,5 +1,7 @@

    # 🤗Huggingface Model Downloader
    ***Update***: We recommend the official [`huggingface-cli`](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli) tool!

    ~This command-line tool avoids the complexity and frequent disruptions often faced with [`snapshot_download`](https://huggingface.co/docs/huggingface_hub/v0.17.3/en/package_reference/file_download#huggingface_hub.snapshot_download) and `git clone` when fetching large models, like LLM. It smartly utilizes wget(which supports resuming) for Git LFS files and git clone for the rest.~

    ## Features
    - 🚀 **Resume from breakpoint**: You can re-run it or Ctrl+C anytime.
padeoe revised this gist
Sep 28, 2023 . 1 changed file with 2 additions and 2 deletions.

@@ -1,5 +1,5 @@

    # 🤗Huggingface Model Downloader
    This command-line tool avoids the complexity and frequent disruptions often faced with [`snapshot_download`](https://huggingface.co/docs/huggingface_hub/v0.17.3/en/package_reference/file_download#huggingface_hub.snapshot_download) and `git clone` when fetching large models, like LLM. It smartly utilizes wget(which supports resuming) for Git LFS files and git clone for the rest.

    ## Features
    - 🚀 **Resume from breakpoint**: You can re-run it or Ctrl+C anytime.
padeoe revised this gist
Sep 27, 2023 . 1 changed file with 6 additions and 2 deletions.

@@ -46,11 +46,15 @@ if [ -d "$MODEL_DIR/.git" ]; then

    else
        REPO_URL="https://huggingface.co/$MODEL_ID"

        OUTPUT=$(GIT_TERMINAL_PROMPT=0 git ls-remote "$REPO_URL" 2>&1)
        GIT_EXIT_CODE=$?

        if [[ $OUTPUT == *"could not read Username"* ]]; then
            if [[ -z "$HF_USERNAME" || -z "$HF_TOKEN" ]]; then
                printf "The repository requires authentication, but --hf_username and --hf_token is not passed.\nPlease get token from https://huggingface.co/settings/tokens.\nExiting.\n"
                exit 1
            fi
            REPO_URL="https://$HF_USERNAME:$HF_TOKEN@huggingface.co/$MODEL_ID"
        elif [ $GIT_EXIT_CODE -ne 0 ]; then
            echo "$OUTPUT"; exit 1
        fi
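The point of saving `GIT_EXIT_CODE=$?` immediately after the command substitution is that `$?` always reflects the most recently executed command. In the earlier form, `elif [ $? -ne 0 ]` ran after the `[[ ... ]]` pattern test, so it saw that test's status rather than the status of `git ls-remote`. The sketch below reproduces the effect with a stand-in function instead of git; `fake_ls_remote` and the `branch` variable are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Stand-in for `git ls-remote` on a public repository: prints a ref, succeeds.
# (fake_ls_remote is hypothetical and exists only for this demonstration.)
fake_ls_remote() {
    echo "refs/heads/main"
    return 0
}

OUTPUT=$(fake_ls_remote 2>&1)
STATUS=$?    # captured right away, while $? still belongs to fake_ls_remote

if [[ $OUTPUT == *"could not read Username"* ]]; then
    branch="auth-required"
elif [ $? -ne 0 ]; then
    # $? here is the status of the [[ ]] test above (1, no match),
    # not the status of fake_ls_remote.
    branch="late-check-failed"
else
    branch="ok"
fi

echo "late \$? check took branch: $branch"
echo "saved status: $STATUS"
```

With the late check, a successful call is still routed to the failure branch, whereas the saved status correctly reports `0`.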
padeoe revised this gist
Sep 27, 2023 . 1 changed file with 1 addition and 1 deletion.

@@ -1,5 +1,5 @@

    # 🤗Huggingface Model Downloader 🚀
    This command-line tool avoids the complexity and frequent disruptions often faced with `hf_hub_download` and `git clone` when fetching large models, like LLM. It smartly utilizes wget(which supports resuming) for Git LFS files and git clone for the rest.

    ## Features
    - 🚀 **Resume from breakpoint**: You can re-run it or Ctrl+C anytime.
padeoe revised this gist
Sep 27, 2023 . No changes.
padeoe renamed this gist
Sep 27, 2023 . 1 changed file with 3 additions and 7 deletions.

@@ -1,16 +1,12 @@

    # 🤗Huggingface Model Downloader 🚀
    This tool avoids the frequent disruptions often faced with `hf_hub_download` and `git clone` when fetching large models, like LLM. It smartly utilizes `wget`(which supports resuming) for Git LFS files and `git clone` for the rest.

    ## Features
    - 🚀 **Resume from breakpoint**: You can re-run it or Ctrl+C anytime.
    - 🚫 **File Exclusion**: Use `--exclude` to skip specific files, save time for models with **duplicate formats** (e.g., .bin and .safetensors).
    - 🔐 **Auth Support**: For gated models that require Huggingface login, use `--hf_username` and `--hf_token` to authenticate.
    - 🌍 **Proxy Support**: Set up with `HTTPS_PROXY` environment variable.
    - 📦 **Simple**: No dependencies & No installation required.

    ## Usage
    First, Download [`hfd.sh`](#file-hfd-sh) from this repo.

@@ -55,4 +51,4 @@ wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/onnx/decoder_m

    For easier access, you can create an alias for the script:
    ```bash
    alias hfd="$PWD/hfd.sh"
    ```
padeoe revised this gist
Sep 27, 2023 . 1 changed file with 10 additions and 3 deletions.

@@ -7,30 +7,37 @@ Bypass the common **network interruptions** faced with `hf_hub_download` and `gi

    ## Features
    - 🚀 **Resume from breakpoint**: You can re-run it or Ctrl+C anytime.
    - 🚫 **File Exclusion**: Use `--exclude` to skip specific files, save time for models with **duplicate formats** (e.g., .bin and .safetensors).
    - 🔐 **Auth Support**: For gated models that require Huggingface login, use `--hf_username` and `--hf_token` to authenticate.
    - 🌍 **Proxy Support**: Set up with `HTTPS_PROXY` environment variable.
    - 📦 **Simple**: No dependency & simple codes.

    ## Usage
    First, Download [`hfd.sh`](#file-hfd-sh) from this repo.
    ```
    $ ./hfd.sh -h
    Usage:
      hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token]
    ```
    **Download a model:**
    ```
    ./hdf.sh bigscience/bloom-560m
    ```

    **Download a model need login**

    Get huggingface token from [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens), then
    ```bash
    hfd meta-llama/Llama-2-7b --hf_username YOUR_HF_USERNAME --hf_token YOUR_HF_TOKEN
    ```

    **Download a model and exclude certain files (e.g., .safetensors):**
    ```bash
    ./hdf.sh bigscience/bloom-560m --exclude safetensors
    ```

    *Output*:

    During the download, the file URLs will be displayed:
    ```console
padeoe revised this gist
Sep 27, 2023 . 1 changed file with 47 additions and 65 deletions.

@@ -1,96 +1,78 @@

    #!/bin/bash
    trap 'printf "\nDownload interrupted. If you re-run the command, you can resume the download from the breakpoint.\n"; exit 1' INT

    display_help() {
        cat << EOF
    Usage:
      hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token]

    Description:
      Downloads a model from Hugging Face using the provided model ID.

    Parameters:
      model_id        The Hugging Face model ID in the format 'repo/model_name'.
      --exclude       (Optional) Flag to specify a string pattern to exclude files from downloading.
      exclude_pattern The pattern to match against filenames for exclusion.
      --hf_username   (Optional) Hugging Face username for authentication.
      --hf_token      (Optional) Hugging Face token for authentication.

    Example:
      hfd bigscience/bloom-560m --exclude safetensors
      hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken
    EOF
        exit 1
    }

    MODEL_ID=$1
    shift

    while [[ $# -gt 0 ]]; do
        case $1 in
            --exclude) EXCLUDE_PATTERN="$2"; shift 2 ;;
            --hf_username) HF_USERNAME="$2"; shift 2 ;;
            --hf_token) HF_TOKEN="$2"; shift 2 ;;
            *) shift ;;
        esac
    done

    [[ -z "$MODEL_ID" || "$MODEL_ID" =~ ^-h ]] && display_help

    MODEL_DIR="${MODEL_ID#*/}"

    if [ -d "$MODEL_DIR/.git" ]; then
        printf "%s exists, Skip Clone.\n" "$MODEL_DIR"
        cd "$MODEL_DIR" && GIT_LFS_SKIP_SMUDGE=1 git pull || { printf "Git pull failed.\n"; exit 1; }
    else
        REPO_URL="https://huggingface.co/$MODEL_ID"
        OUTPUT=$(GIT_TERMINAL_PROMPT=0 git ls-remote "$REPO_URL" 2>&1)
        if [[ $OUTPUT == *"could not read Username"* ]]; then
            [[ -z "$HF_USERNAME" || -z "$HF_TOKEN" ]] && printf "The repository requires authentication, but --hf_username and --hf_token is not passed.\nPlease get token from https://huggingface.co/settings/tokens.\nExiting.\n" && exit 1
            REPO_URL="https://$HF_USERNAME:$HF_TOKEN@huggingface.co/$MODEL_ID"
        elif [ $? -ne 0 ]; then
            echo "$OUTPUT"; exit 1
        fi
        GIT_LFS_SKIP_SMUDGE=1 git clone "$REPO_URL" && cd "$MODEL_DIR" || { printf "Git clone failed.\n"; exit 1; }
    fi

    printf "\nStart Downloading lfs files, bash script:\n"
    files=$(git lfs ls-files | awk '{print $3}')
    declare -a urls

    for file in $files; do
        url="https://huggingface.co/$MODEL_ID/resolve/main/$file"
        wget_cmd="wget -c \"$url\""
        [[ -n "$HF_TOKEN" ]] && wget_cmd="wget --header=\"Authorization: Bearer ${HF_TOKEN}\" -c \"$url\""
        [[ -n "$EXCLUDE_PATTERN" && $file == *"$EXCLUDE_PATTERN"* ]] && printf "# %s\n" "$wget_cmd" && continue
        printf "%s\n" "$wget_cmd"
        urls+=("$url")
    done

    for url in "${urls[@]}"; do
        [[ -n "$HF_TOKEN" ]] && wget --header="Authorization: Bearer ${HF_TOKEN}" -c "$url" || wget -c "$url"
        [[ $? -eq 0 ]] && printf "Downloaded %s successfully.\n" "$url" || { printf "Failed to download %s.\n" "$url"; exit 1; }
    done

    printf "Download completed successfully.\n"
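One easy-to-miss fix in this rewrite is the download loop header: the first version iterated with `for url in $urls`, while this one uses `for url in "${urls[@]}"`. In bash, referring to an array by its bare name expands only its first element, so the earlier loop would have visited a single URL. A small self-contained comparison is sketched below; the two URLs are the ones shown in the README's sample output and are only used as strings here.

```shell
#!/usr/bin/env bash
# Compare bare-name expansion with full array expansion. Nothing is downloaded;
# the URLs are plain strings copied from the README's sample output.
declare -a urls
urls+=("https://huggingface.co/bigscience/bloom-560m/resolve/main/flax_model.msgpack")
urls+=("https://huggingface.co/bigscience/bloom-560m/resolve/main/onnx/decoder_model.onnx")

count_bare=0
for u in $urls; do                 # expands to ${urls[0]} only
    count_bare=$((count_bare + 1))
done

count_all=0
for u in "${urls[@]}"; do          # one iteration per array element
    count_all=$((count_all + 1))
done

echo "bare name visited $count_bare item(s), full expansion visited $count_all"
```

Quoting `"${urls[@]}"` also keeps each element intact if it ever contains whitespace.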
padeoe revised this gist
Sep 27, 2023 . No changes.
padeoe revised this gist
Sep 26, 2023 . 1 changed file with 1 addition and 1 deletion.

@@ -44,7 +44,7 @@ wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/onnx/decoder_m

    ...
    ```

    ### Create an Alias for Convenience
    For easier access, you can create an alias for the script:
    ```bash
    alias hfd="$PWD/hfd.sh"
padeoe revised this gist
Sep 26, 2023 . 1 changed file with 1 addition and 1 deletion.

@@ -12,7 +12,7 @@ Bypass the common **network interruptions** faced with `hf_hub_download` and `gi

    - 📦 **Simple**: No dependency & simple codes.

    ## Usage
    First, Download [`hfd.sh`](#file-hfd-sh) from this repo.
    ```
    $ ./hfd.sh -h
    Usage:
padeoe created this gist
Sep 26, 2023 .

@@ -0,0 +1,51 @@

    # 🤗Huggingface Model Downloader 🚀

    📦 Download large Huggingface models effortlessly with the power and simplicity of **`wget`**!

    Bypass the common **network interruptions** faced with `hf_hub_download` and `git clone` for **large** models. This simple script leverages wget for Git LFS files and git clone for others.

    ## Features
    - 🚀 **Resume from breakpoint**: You can re-run it or Ctrl+C anytime.
    - 🌍 **Proxy Support**: Set up with `HTTPS_PROXY` environment variable.
    - 🚫 **File Exclusion**: Use `--exclude` to skip specific files, handy for repos with **duplicate model formats** (e.g., .bin and .safetensors).
    - 📦 **Simple**: No dependency & simple codes.

    ## Usage
    First, Download [`hfd.sh`](#hfd-sh) from this repo.
    ```
    $ ./hfd.sh -h
    Usage:
      hfd <model_id> [--exclude exclude_pattern]
    ```
    **Download a model:**
    ```
    ./hdf.sh bigscience/bloom-560m
    ```

    **Download a model and exclude certain files (e.g., .safetensors):**
    ```bash
    ./hdf.sh bigscience/bloom-560m --exclude safetensors
    ```

    **Output:**
    During the download, the file URLs will be displayed:
    ```console
    $ ./hdf.sh bigscience/bloom-560m --exclude safetensors
    ...
    Start Downloading lfs files, bash script:

    wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/flax_model.msgpack
    # wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/model.safetensors
    wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/onnx/decoder_model.onnx
    ...
    ```

    ### 3. Create an Alias for Convenience
    For easier access, you can create an alias for the script:
    ```bash
    alias hfd="$PWD/hfd.sh"
    ```

@@ -0,0 +1,96 @@

    #!/bin/bash

    # Trap the INT signal to handle Ctrl+C
    trap ctrl_c INT

    # Function to handle Ctrl+C
    ctrl_c() {
        printf "\nDownload interrupted. If you re-run the command, you can resume the download from the breakpoint.\n"
        exit 1
    }

    # Display help information
    display_help() {
        printf "Usage:\n"
        printf " hfd <model_id> [--exclude exclude_pattern]\n\n"
        printf "Description:\n"
        printf " Downloads a model from Hugging Face using the provided model ID.\n\n"
        printf "Parameters:\n"
        printf " model_id The Hugging Face model ID in the format 'repo/model_name'.\n"
        printf " --exclude (Optional) Flag to specify a string pattern to exclude files from downloading.\n"
        printf " exclude_pattern The pattern to match against filenames for exclusion.\n\n"
        printf "Example:\n"
        printf " hfd bigscience/bloom-560m --exclude safetensors\n"
        exit 1
    }

    MODEL_ID=$1
    EXCLUDE_PATTERN=''

    # Parse arguments for --exclude option
    shift # Move to the next argument
    while [[ $# -gt 0 ]]; do
        key="$1"
        case $key in
            --exclude)
            EXCLUDE_PATTERN="$2"
            shift # past argument
            shift # past value
            ;;
            *) # unknown option
            shift # past argument
            ;;
        esac
    done

    # Check if no model_id is provided or -h/--help is provided
    if [[ -z "$MODEL_ID" ]] || [[ "$MODEL_ID" == "-h" ]] || [[ "$MODEL_ID" == "--help" ]]; then
        display_help
    fi

    MODEL_DIR=$(echo "$MODEL_ID" | awk -F'/' '{print $2}')

    # Check if the model directory exists and contains a .git directory
    if [ -d "$MODEL_DIR" ] && [ -d "$MODEL_DIR/.git" ]; then
        printf "%s exists, Skip Clone.\n" "$MODEL_DIR"
        cd "$MODEL_DIR"
        if GIT_LFS_SKIP_SMUDGE=1 git pull; then
            printf "Git pull successful.\n"
        else
            printf "Git pull failed.\n"
            exit 1
        fi
    else
        printf "Start clone without lfs.\n"
        if GIT_LFS_SKIP_SMUDGE=1 git clone "https://huggingface.co/$MODEL_ID"; then
            cd "$MODEL_DIR"
        else
            printf "Git clone failed.\n"
            exit 1
        fi
    fi

    printf "\nStart Downloading lfs files, bash script:\n"

    files=$(git lfs ls-files | awk '{print $3}')
    declare -a urls

    for file in $files; do
        url="https://huggingface.co/$MODEL_ID/resolve/main/$file"
        if [[ -n "$EXCLUDE_PATTERN" && $file == *"$EXCLUDE_PATTERN"* ]]; then
            printf "# wget -c $url\n"
            continue
        fi
        printf "wget -c $url\n"
        urls+=("$url")
    done

    for url in $urls; do
        if wget -c $url; then
            printf "Downloaded %s successfully.\n" "$url"
        else
            printf "Failed to download %s.\n" "$url"
            exit 1
        fi
    done

    printf "Download completed successfully.\n"
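The `--exclude` filter in this first version is a plain substring match: `[[ $file == *"$EXCLUDE_PATTERN"* ]]` treats the quoted pattern literally, so `safetensors` matches any path containing that text and glob characters in the pattern have no special meaning. The sketch below isolates the test in a function and applies it to the three file names from the README's sample output; `is_excluded` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns success when `pattern` is non-empty and occurs
# anywhere in `file`. Quoting the pattern inside [[ ]] makes the match literal.
is_excluded() {
    local file="$1" pattern="$2"
    [[ -n "$pattern" && "$file" == *"$pattern"* ]]
}

# File names taken from the sample output in the README.
for f in flax_model.msgpack model.safetensors onnx/decoder_model.onnx; do
    if is_excluded "$f" "safetensors"; then
        echo "# skip $f"
    else
        echo "download $f"
    fi
done
```

An empty pattern excludes nothing, which matches the script's behavior when `--exclude` is not given.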