AlphaNext/9cb0655832c64cae2e45f13ce0cb93a9
Revisions
padeoe revised this gist
Mar 22, 2024. 1 changed file with 10 additions and 10 deletions.

@@ -16,6 +16,12 @@ First, Download [`hfd.sh`](#file-hfd-sh) or clone this repo, and then grant exec ```bash chmod a+x hfd.sh ``` you can create an alias for convenience ```bash alias hfd="$PWD/hfd.sh" ``` **Usage Instructions:** ``` $ ./hfd.sh -h
@@ -44,7 +50,7 @@ Example: ``` **Download a model:** ``` hfd bigscience/bloom-560m ``` **Download a model need login**
@@ -57,19 +63,19 @@ hfd meta-llama/Llama-2-7b --hf_username YOUR_HF_USERNAME_NOT_EMAIL --hf_token YO ```bash hfd bigscience/bloom-560m --exclude *.safetensors ``` **Download with aria2c and multiple threads:** ```bash hfd bigscience/bloom-560m ``` *Output*: During the download, the file URLs will be displayed: ```console $ hfd bigscience/bloom-560m --tool wget --exclude *.safetensors ... Start Downloading lfs files, bash script:
@@ -78,9 +84,3 @@ wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/flax_model.msg wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/onnx/decoder_model.onnx ... ```
padeoe revised this gist
Mar 22, 2024. 2 changed files with 2 additions and 2 deletions.

@@ -35,7 +35,7 @@ Parameters: --tool (Optional) Download tool to use. Can be aria2c (default) or wget. -x (Optional) Number of download threads for aria2c. Defaults to 4. --dataset (Optional) Flag to indicate downloading a dataset. --local-dir (Optional) Local directory path where the model or dataset will be stored. Example: hfd bigscience/bloom-560m --exclude *.safetensors

@@ -25,7 +25,7 @@ Parameters: --tool (Optional) Download tool to use. Can be aria2c (default) or wget. -x (Optional) Number of download threads for aria2c. Defaults to 4. --dataset (Optional) Flag to indicate downloading a dataset. --local-dir (Optional) Local directory path where the model or dataset will be stored. Example: hfd bigscience/bloom-560m --exclude *.safetensors
padeoe revised this gist
Mar 22, 2024. 2 changed files with 19 additions and 14 deletions.

@@ -20,7 +20,7 @@ chmod a+x hfd.sh ``` $ ./hfd.sh -h Usage: hfd <repo_id> [--include include_pattern] [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool aria2c|wget] [-x threads] [--dataset] [--local-dir path] Description: Downloads a model or dataset from Hugging Face using the provided repo ID.
@@ -29,16 +29,17 @@ Parameters: repo_id The Hugging Face repo ID in the format 'org/repo_name'. --include (Optional) Flag to specify a string pattern to include files for downloading. --exclude (Optional) Flag to specify a string pattern to exclude files from downloading. include/exclude_pattern The pattern to match against filenames, supports wildcard characters. e.g., '--exclude *.safetensor', '--include vae/*'. --hf_username (Optional) Hugging Face username for authentication. **NOT EMAIL**. --hf_token (Optional) Hugging Face token for authentication. --tool (Optional) Download tool to use. Can be aria2c (default) or wget. -x (Optional) Number of download threads for aria2c. Defaults to 4. --dataset (Optional) Flag to indicate downloading a dataset. --local-dir (Optional) Local directory path where the model or dataset will be stored. Example: hfd bigscience/bloom-560m --exclude *.safetensors hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken -x 4 hfd lavita/medical-qa-shared-task-v1-toy --dataset ``` **Download a model:**

@@ -10,7 +10,7 @@ trap 'printf "${YELLOW}\nDownload interrupted. If you re-run the command, you ca display_help() { cat << EOF Usage: hfd <repo_id> [--include include_pattern] [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool aria2c|wget] [-x threads] [--dataset] [--local-dir path] Description: Downloads a model or dataset from Hugging Face using the provided repo ID.
@@ -19,12 +19,13 @@ Parameters: repo_id The Hugging Face repo ID in the format 'org/repo_name'. --include (Optional) Flag to specify a string pattern to include files for downloading. --exclude (Optional) Flag to specify a string pattern to exclude files from downloading. include/exclude_pattern The pattern to match against filenames, supports wildcard characters. e.g., '--exclude *.safetensor', '--include vae/*'. --hf_username (Optional) Hugging Face username for authentication. **NOT EMAIL**. --hf_token (Optional) Hugging Face token for authentication. --tool (Optional) Download tool to use. Can be aria2c (default) or wget. -x (Optional) Number of download threads for aria2c. Defaults to 4. --dataset (Optional) Flag to indicate downloading a dataset. --local-dir (Optional) Local directory path where the model or dataset will be stored. Example: hfd bigscience/bloom-560m --exclude *.safetensors
@@ -51,6 +52,7 @@ while [[ $# -gt 0 ]]; do --tool) TOOL="$2"; shift 2 ;; -x) THREADS="$2"; shift 2 ;; --dataset) DATASET=1; shift ;; --local-dir) LOCAL_DIR="$2"; shift 2 ;; *) shift ;; esac done
@@ -69,16 +71,18 @@ check_command curl; check_command git; check_command git-lfs [[ -z "$MODEL_ID" || "$MODEL_ID" =~ ^-h ]] && display_help if [[ -z "$LOCAL_DIR" ]]; then LOCAL_DIR="${MODEL_ID#*/}" fi if [[ "$DATASET" == 1 ]]; then MODEL_ID="datasets/$MODEL_ID" fi echo "Downloading to $LOCAL_DIR" if [ -d "$LOCAL_DIR/.git" ]; then printf "${YELLOW}%s exists, Skip Clone.\n${NC}" "$LOCAL_DIR" cd "$LOCAL_DIR" && GIT_LFS_SKIP_SMUDGE=1 git pull || { printf "${RED}Git pull failed.${NC}\n"; exit 1; } else REPO_URL="$HF_ENDPOINT/$MODEL_ID" GIT_REFS_URL="${REPO_URL}/info/refs?service=git-upload-pack"
@@ -95,15 +99,15 @@ else printf "${YELLOW}Executing debug command: curl -v %s\nOutput:${NC}\n" "$GIT_REFS_URL" curl -v "$GIT_REFS_URL"; printf "\n${RED}Git clone failed.\n${NC}"; exit 1 fi echo "git clone $REPO_URL $LOCAL_DIR" GIT_LFS_SKIP_SMUDGE=1 git clone $REPO_URL $LOCAL_DIR && cd "$LOCAL_DIR" || { printf "${RED}Git clone failed.\n${NC}"; exit 1; } for file in $(git lfs ls-files | awk '{print $3}'); do truncate -s 0 "$file" done fi printf "\nStart Downloading lfs files, bash script:\ncd $LOCAL_DIR\n" files=$(git lfs ls-files | awk '{print $3}') declare -a urls
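The `--local-dir` revision above makes the target directory default to the part of the repo ID after the first `/` when the flag is omitted, and it adds the `datasets/` prefix only after that default is computed, so the prefix never leaks into the directory name. A minimal sketch of that ordering in isolation; `resolve_target` is a hypothetical helper written for illustration, not a function in `hfd.sh`:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of hfd.sh): mirrors the order of operations
# in the --local-dir revision. Arguments: repo_id, local_dir (may be empty),
# dataset flag (1 or empty). Prints "<local_dir> <repo path>".
resolve_target() {
    local model_id="$1" local_dir="$2" dataset="$3"
    # Default the directory to everything after the first '/', e.g. org/name -> name.
    if [[ -z "$local_dir" ]]; then
        local_dir="${model_id#*/}"
    fi
    # The datasets/ prefix is applied afterwards, so it only affects the URL path.
    if [[ "$dataset" == 1 ]]; then
        model_id="datasets/$model_id"
    fi
    printf '%s %s\n' "$local_dir" "$model_id"
}

resolve_target bigscience/bloom-560m "" ""
# prints: bloom-560m bigscience/bloom-560m
resolve_target lavita/medical-qa-shared-task-v1-toy "" 1
# prints: medical-qa-shared-task-v1-toy datasets/lavita/medical-qa-shared-task-v1-toy
```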
padeoe revised this gist
Mar 22, 2024. 2 changed files with 17 additions and 17 deletions.

@@ -29,16 +29,16 @@ Parameters: repo_id The Hugging Face repo ID in the format 'org/repo_name'. --include (Optional) Flag to specify a string pattern to include files for downloading. --exclude (Optional) Flag to specify a string pattern to exclude files from downloading. include/exclude_pattern The pattern to match against filenames, supports wildcard characters. e.g., '--exclude *.safetensor', '--include vae/*' --hf_username (Optional) Hugging Face username for authentication. **NOT EMAIL**. --hf_token (Optional) Hugging Face token for authentication. --tool (Optional) Download tool to use. Can be aria2c (default) or wget. -x (Optional) Number of download threads for aria2c. Defaults to 4. --dataset (Optional) Flag to indicate downloading a dataset. Example: hfd bigscience/bloom-560m --exclude *.safetensors hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken --tool aria2c -x 4 hfd lavita/medical-qa-shared-task-v1-toy --dataset ``` **Download a model:**
@@ -50,25 +50,25 @@ Example: Get huggingface token from [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens), then ```bash hfd meta-llama/Llama-2-7b --hf_username YOUR_HF_USERNAME_NOT_EMAIL --hf_token YOUR_HF_TOKEN ``` **Download a model and exclude certain files (e.g., .safetensors):** ```bash ./hfd.sh bigscience/bloom-560m --exclude *.safetensors ``` **Download with aria2c and multiple threads:** ```bash ./hfd.sh bigscience/bloom-560m ``` *Output*: During the download, the file URLs will be displayed: ```console $ ./hfd.sh bigscience/bloom-560m --tool wget --exclude *.safetensors ... Start Downloading lfs files, bash script:

@@ -20,15 +20,15 @@ Parameters: --include (Optional) Flag to specify a string pattern to include files for downloading. --exclude (Optional) Flag to specify a string pattern to exclude files from downloading. include/exclude_pattern The pattern to match against filenames, supports wildcard characters. e.g., '--exclude *.safetensor'. --hf_username (Optional) Hugging Face username for authentication. **NOT EMAIL**. --hf_token (Optional) Hugging Face token for authentication. --tool (Optional) Download tool to use. Can be wget (default) or aria2c. -x (Optional) Number of download threads for aria2c. Defaults to 4. --dataset (Optional) Flag to indicate downloading a dataset. Example: hfd bigscience/bloom-560m --exclude *.safetensors hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken -x 4 hfd lavita/medical-qa-shared-task-v1-toy --dataset EOF exit 1
@@ -39,7 +39,7 @@ shift # Default values TOOL="aria2c" THREADS=4 HF_ENDPOINT=${HF_ENDPOINT:-"https://huggingface.co"} while [[ $# -gt 0 ]]; do
@@ -118,8 +118,8 @@ for file in $files; do download_cmd="aria2c --console-log-level=error -x $THREADS -s $THREADS -k 1M -c \"$url\" -d \"$file_dir\" -o \"$(basename "$file")\"" [[ -n "$HF_TOKEN" ]] && download_cmd="aria2c --header=\"Authorization: Bearer ${HF_TOKEN}\" --console-log-level=error -x $THREADS -s $THREADS -k 1M -c \"$url\" -d \"$file_dir\" -o \"$(basename "$file")\"" fi [[ -n "$INCLUDE_PATTERN" && ! "$file" == $INCLUDE_PATTERN ]] && printf "# %s\n" "$download_cmd" && continue [[ -n "$EXCLUDE_PATTERN" && "$file" == $EXCLUDE_PATTERN ]] && printf "# %s\n" "$download_cmd" && continue printf "%s\n" "$download_cmd" urls+=("$url|$file") done
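In this revision the include/exclude checks use an unquoted pattern on the right-hand side of `==` (`"$file" == $EXCLUDE_PATTERN`), whereas the Mar 20 version used the substring test `$file == *"$EXCLUDE_PATTERN"*`. With the quoted substring form, `*.safetensors` is searched for as literal text, including the `*`; with the unquoted form it is interpreted as a glob that must match the whole path. The sketch below contrasts the two forms; `matches_substring` and `matches_glob` are hypothetical names used only for illustration:

```bash
#!/usr/bin/env bash
# Substring form (Mar 20): the quoted pattern is literal text searched anywhere in the path.
matches_substring() {
    local file="$1" pattern="$2"
    [[ $file == *"$pattern"* ]]
}

# Glob form (Mar 22): the unquoted right-hand side is a pattern that must match the whole path.
# Inside [[ ... ]], '*' also matches '/', so '*.safetensors' covers subdirectories too.
matches_glob() {
    local file="$1" pattern="$2"
    [[ "$file" == $pattern ]]
}

for f in model.safetensors unet/model.safetensors vae/diffusion.bin config.json; do
    printf '%-24s substring(*.safetensors)=%s glob(*.safetensors)=%s glob(vae/*)=%s\n' "$f" \
        "$(matches_substring "$f" '*.safetensors' && echo y || echo n)" \
        "$(matches_glob "$f" '*.safetensors' && echo y || echo n)" \
        "$(matches_glob "$f" 'vae/*' && echo y || echo n)"
done
```

When passing such a pattern on the command line, quoting it (`--exclude '*.safetensors'`) keeps the invoking shell from expanding it against files in the current directory before the script ever sees it.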
padeoe revised this gist
Mar 20, 2024. 2 changed files with 21 additions and 17 deletions.

@@ -5,11 +5,11 @@ Considering the lack of multi-threaded download support in the official [`huggin ## Features - ⏯️ **Resume from breakpoint**: You can re-run it or Ctrl+C anytime. - 🚀 **Multi-threaded Download**: Utilize multiple threads to speed up the download process. - 🚫 **File Exclusion**: Use `--exclude` or `--include` to skip or specify files, save time for models with **duplicate formats** (e.g., `*.bin` or `*.safetensors`). - 🔐 **Auth Support**: For gated models that require Huggingface login, use `--hf_username` and `--hf_token` to authenticate. - 🪞 **Mirror Site Support**: Set up with `HF_ENDPOINT` environment variable. - 🌍 **Proxy Support**: Set up with `HTTPS_PROXY` environment variable. - 📦 **Simple**: Only depend on `git`, `aria2c/wget`. ## Usage First, Download [`hfd.sh`](#file-hfd-sh) or clone this repo, and then grant execution permission to the script.
@@ -20,15 +20,16 @@ chmod a+x hfd.sh ``` $ ./hfd.sh -h Usage: hfd <repo_id> [--include include_pattern] [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool aria2c|wget] [-x threads] [--dataset] Description: Downloads a model or dataset from Hugging Face using the provided repo ID. Parameters: repo_id The Hugging Face repo ID in the format 'org/repo_name'. --include (Optional) Flag to specify a string pattern to include files for downloading. --exclude (Optional) Flag to specify a string pattern to exclude files from downloading. include/exclude_pattern The pattern to match against filenames, supports wildcard characters. e.g., '--exclude *.safetensor'. --hf_username (Optional) Hugging Face username for authentication. --hf_token (Optional) Hugging Face token for authentication. --tool (Optional) Download tool to use. Can be wget (default) or aria2c.

@@ -10,16 +10,16 @@ trap 'printf "${YELLOW}\nDownload interrupted. If you re-run the command, you ca display_help() { cat << EOF Usage: hfd <repo_id> [--include include_pattern] [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool aria2c|wget] [-x threads] [--dataset] Description: Downloads a model or dataset from Hugging Face using the provided repo ID. Parameters: repo_id The Hugging Face repo ID in the format 'org/repo_name'. --include (Optional) Flag to specify a string pattern to include files for downloading. --exclude (Optional) Flag to specify a string pattern to exclude files from downloading. include/exclude_pattern The pattern to match against filenames, supports wildcard characters. e.g., '--exclude *.safetensor'. --hf_username (Optional) Hugging Face username for authentication. --hf_token (Optional) Hugging Face token for authentication. --tool (Optional) Download tool to use. Can be wget (default) or aria2c.
@@ -38,7 +38,7 @@ MODEL_ID=$1 shift # Default values TOOL="aria2c" THREADS=1 HF_ENDPOINT=${HF_ENDPOINT:-"https://huggingface.co"}
@@ -78,11 +78,11 @@ echo "Downloading to ./$MODEL_DIR" if [ -d "$MODEL_DIR/.git" ]; then printf "${YELLOW}%s exists, Skip Clone.\n${NC}" "$MODEL_DIR" cd "$MODEL_DIR" && GIT_LFS_SKIP_SMUDGE=1 git pull || { printf "${RED}Git pull failed.${NC}\n"; exit 1; } else REPO_URL="$HF_ENDPOINT/$MODEL_ID" GIT_REFS_URL="${REPO_URL}/info/refs?service=git-upload-pack" echo "Testing GIT_REFS_URL: $GIT_REFS_URL" response=$(curl -s -o /dev/null -w "%{http_code}" "$GIT_REFS_URL") if [ "$response" == "401" ] || [ "$response" == "403" ]; then if [[ -z "$HF_USERNAME" || -z "$HF_TOKEN" ]]; then
@@ -91,7 +91,9 @@ else fi REPO_URL="https://$HF_USERNAME:$HF_TOKEN@${HF_ENDPOINT#https://}/$MODEL_ID" elif [ "$response" != "200" ]; then printf "${RED}Unexpected HTTP Status Code: $response\n${NC}" printf "${YELLOW}Executing debug command: curl -v %s\nOutput:${NC}\n" "$GIT_REFS_URL" curl -v "$GIT_REFS_URL"; printf "\n${RED}Git clone failed.\n${NC}"; exit 1 fi echo "git clone $REPO_URL"
@@ -113,8 +115,8 @@ for file in $files; do download_cmd="wget -c \"$url\" -O \"$file\"" [[ -n "$HF_TOKEN" ]] && download_cmd="wget --header=\"Authorization: Bearer ${HF_TOKEN}\" -c \"$url\" -O \"$file\"" else download_cmd="aria2c --console-log-level=error -x $THREADS -s $THREADS -k 1M -c \"$url\" -d \"$file_dir\" -o \"$(basename "$file")\"" [[ -n "$HF_TOKEN" ]] && download_cmd="aria2c --header=\"Authorization: Bearer ${HF_TOKEN}\" --console-log-level=error -x $THREADS -s $THREADS -k 1M -c \"$url\" -d \"$file_dir\" -o \"$(basename "$file")\"" fi [[ -n "$INCLUDE_PATTERN" && $file != *"$INCLUDE_PATTERN"* ]] && printf "# %s\n" "$download_cmd" && continue [[ -n "$EXCLUDE_PATTERN" && $file == *"$EXCLUDE_PATTERN"* ]] && printf "# %s\n" "$download_cmd" && continue
@@ -124,11 +126,12 @@ done for url_file in "${urls[@]}"; do IFS='|' read -r url file <<< "$url_file" printf "${YELLOW}Start downloading ${file}.\n${NC}" file_dir=$(dirname "$file") if [[ "$TOOL" == "wget" ]]; then [[ -n "$HF_TOKEN" ]] && wget --header="Authorization: Bearer ${HF_TOKEN}" -c "$url" -O "$file" || wget -c "$url" -O "$file" else [[ -n "$HF_TOKEN" ]] && aria2c --header="Authorization: Bearer ${HF_TOKEN}" --console-log-level=error -x $THREADS -s $THREADS -k 1M -c "$url" -d "$file_dir" -o "$(basename "$file")" || aria2c --console-log-level=error -x $THREADS -s $THREADS -k 1M -c "$url" -d "$file_dir" -o "$(basename "$file")" fi [[ $? -eq 0 ]] && printf "Downloaded %s successfully.\n" "$url" || { printf "${RED}Failed to download %s.\n${NC}" "$url"; exit 1; } done
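Before cloning, this version probes `${REPO_URL}/info/refs?service=git-upload-pack` with `curl` and branches on the HTTP status: 401 or 403 means credentials are required, and any other non-200 code is treated as an error. The decision logic can be sketched without network access by passing the status code in. `choose_clone_url` is a hypothetical name; the credential-embedding form follows the `https://$HF_USERNAME:$HF_TOKEN@${HF_ENDPOINT#https://}/$MODEL_ID` line shown in the diff:

```bash
#!/usr/bin/env bash
# Hypothetical helper: given the status code from the info/refs probe,
# print the URL to clone, or print an error to stderr and return 1.
# Arguments: status, endpoint, model_id, username (optional), token (optional).
choose_clone_url() {
    local status="$1" endpoint="$2" model_id="$3" user="$4" token="$5"
    case "$status" in
        200)
            printf '%s\n' "$endpoint/$model_id" ;;
        401|403)
            if [[ -z "$user" || -z "$token" ]]; then
                echo "authentication required: pass --hf_username and --hf_token" >&2
                return 1
            fi
            # Strip the scheme, then embed the credentials, as the script does.
            printf 'https://%s:%s@%s/%s\n' "$user" "$token" "${endpoint#https://}" "$model_id" ;;
        *)
            echo "unexpected HTTP status: $status" >&2
            return 1 ;;
    esac
}

# In the script the status comes from:
#   curl -s -o /dev/null -w "%{http_code}" "$REPO_URL/info/refs?service=git-upload-pack"
choose_clone_url 403 https://huggingface.co meta-llama/Llama-2-7b myuser mytoken
# prints: https://myuser:mytoken@huggingface.co/meta-llama/Llama-2-7b
```

Because the credentials become part of the clone URL, git records them as the remote URL in the cloned repository's `.git/config`. That is a property of the approach shown in the diff and is worth keeping in mind on shared machines.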
padeoe revised this gist
Jan 17, 2024. No changes.
padeoe revised this gist
Dec 26, 2023. No changes.
padeoe revised this gist
Dec 25, 2023. 2 changed files with 5 additions and 3 deletions.

@@ -1,12 +1,11 @@ # 🤗Huggingface Model Downloader Considering the lack of multi-threaded download support in the official [`huggingface-cli`](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli), and the inadequate error handling in [`hf_transfer`](https://github.com/huggingface/hf_transfer), this command-line tool smartly utilizes `wget` or `aria2` for LFS files and `git clone` for the rest. ## Features - ⏯️ **Resume from breakpoint**: You can re-run it or Ctrl+C anytime. - 🚀 **Multi-threaded Download**: Utilize multiple threads to speed up the download process. - 🚫 **File Exclusion**: Use `--exclude` or `--include` to skip or specify files, save time for models with **duplicate formats** (e.g., .bin and .safetensors). - 🔐 **Auth Support**: For gated models that require Huggingface login, use `--hf_username` and `--hf_token` to authenticate. - 🪞 **Mirror Site Support**: Set up with `HF_ENDPOINT` environment variable. - 🌍 **Proxy Support**: Set up with `HTTPS_PROXY` environment variable.

@@ -10,13 +10,14 @@ trap 'printf "${YELLOW}\nDownload interrupted. If you re-run the command, you ca display_help() { cat << EOF Usage: hfd <model_id> [--include include_pattern] [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool wget|aria2c] [-x threads] [--dataset] Description: Downloads a model or dataset from Hugging Face using the provided model ID. Parameters: model_id The Hugging Face model ID in the format 'repo/model_name'. --include (Optional) Flag to specify a string pattern to include files for downloading. --exclude (Optional) Flag to specify a string pattern to exclude files from downloading. exclude_pattern The pattern to match against filenames for exclusion. --hf_username (Optional) Hugging Face username for authentication.
@@ -43,6 +44,7 @@ HF_ENDPOINT=${HF_ENDPOINT:-"https://huggingface.co"} while [[ $# -gt 0 ]]; do case $1 in --include) INCLUDE_PATTERN="$2"; shift 2 ;; --exclude) EXCLUDE_PATTERN="$2"; shift 2 ;; --hf_username) HF_USERNAME="$2"; shift 2 ;; --hf_token) HF_TOKEN="$2"; shift 2 ;;
@@ -114,6 +116,7 @@ for file in $files; do download_cmd="aria2c -x $THREADS -s $THREADS -k 1M -c \"$url\" -d \"$file_dir\" -o \"$(basename "$file")\"" [[ -n "$HF_TOKEN" ]] && download_cmd="aria2c --header=\"Authorization: Bearer ${HF_TOKEN}\" -x $THREADS -s $THREADS -k 1M -c \"$url\" -d \"$file_dir\" -o \"$(basename "$file")\"" fi [[ -n "$INCLUDE_PATTERN" && $file != *"$INCLUDE_PATTERN"* ]] && printf "# %s\n" "$download_cmd" && continue [[ -n "$EXCLUDE_PATTERN" && $file == *"$EXCLUDE_PATTERN"* ]] && printf "# %s\n" "$download_cmd" && continue printf "%s\n" "$download_cmd" urls+=("$url|$file")
padeoe revised this gist
Dec 25, 2023. 2 changed files with 31 additions and 23 deletions.

@@ -43,7 +43,7 @@ Example: ``` **Download a model:** ``` ./hfd.sh bigscience/bloom-560m ``` **Download a model need login**
@@ -56,7 +56,7 @@ hfd meta-llama/Llama-2-7b --hf_username YOUR_HF_USERNAME --hf_token YOUR_HF_TOKE ```bash ./hfd.sh bigscience/bloom-560m --exclude safetensors ``` **Download with aria2c and multiple threads:**
@@ -68,7 +68,7 @@ hfd meta-llama/Llama-2-7b --hf_username YOUR_HF_USERNAME --hf_token YOUR_HF_TOKE During the download, the file URLs will be displayed: ```console $ ./hfd.sh bigscience/bloom-560m --exclude safetensors ... Start Downloading lfs files, bash script:

@@ -1,6 +1,11 @@ #!/usr/bin/env bash # Color definitions RED='\033[0;31m' GREEN='\033[0;32m' YELLOW='\033[1;33m' NC='\033[0m' # No Color trap 'printf "${YELLOW}\nDownload interrupted. If you re-run the command, you can resume the download from the breakpoint.\n${NC}"; exit 1' INT display_help() { cat << EOF
@@ -48,13 +53,17 @@ while [[ $# -gt 0 ]]; do esac done # Check if aria2, wget, curl, git, and git-lfs are installed check_command() { if ! command -v $1 &>/dev/null; then echo -e "${RED}$1 is not installed. Please install it first.${NC}" exit 1 fi } [[ "$TOOL" == "aria2c" ]] && check_command aria2c [[ "$TOOL" == "wget" ]] && check_command wget check_command curl; check_command git; check_command git-lfs [[ -z "$MODEL_ID" || "$MODEL_ID" =~ ^-h ]] && display_help
@@ -66,26 +75,25 @@ fi echo "Downloading to ./$MODEL_DIR" if [ -d "$MODEL_DIR/.git" ]; then printf "${YELLOW}%s exists, Skip Clone.\n${NC}" "$MODEL_DIR" cd "$MODEL_DIR" && GIT_LFS_SKIP_SMUDGE=1 git pull || { printf "Git pull failed.\n"; exit 1; } else REPO_URL="$HF_ENDPOINT/$MODEL_ID" GIT_REFS_URL="${REPO_URL}/info/refs?service=git-upload-pack" echo "Test GIT_REFS_URL: $GIT_REFS_URL" response=$(curl -s -o /dev/null -w "%{http_code}" "$GIT_REFS_URL") if [ "$response" == "401" ] || [ "$response" == "403" ]; then if [[ -z "$HF_USERNAME" || -z "$HF_TOKEN" ]]; then printf "${RED}HTTP Status Code: $response.\nThe repository requires authentication, but --hf_username and --hf_token is not passed. Please get token from https://huggingface.co/settings/tokens.\nExiting.\n${NC}" exit 1 fi REPO_URL="https://$HF_USERNAME:$HF_TOKEN@${HF_ENDPOINT#https://}/$MODEL_ID" elif [ "$response" != "200" ]; then echo -e "${RED}Unexpected HTTP Status Code: $response.\nExiting.\n${NC}"; exit 1 fi echo "git clone $REPO_URL" GIT_LFS_SKIP_SMUDGE=1 git clone "$REPO_URL" && cd "$MODEL_DIR" || { printf "${RED}Git clone failed.\n${NC}"; exit 1; } for file in $(git lfs ls-files | awk '{print $3}'); do truncate -s 0 "$file" done
@@ -119,7 +127,7 @@ for url_file in "${urls[@]}"; do else [[ -n "$HF_TOKEN" ]] && aria2c --header="Authorization: Bearer ${HF_TOKEN}" -x $THREADS -s $THREADS -k 1M -c "$url" -d "$file_dir" -o "$(basename "$file")" || aria2c -x $THREADS -s $THREADS -k 1M -c "$url" -d "$file_dir" -o "$(basename "$file")" fi [[ $? -eq 0 ]] && printf "Downloaded %s successfully.\n" "$url" || { printf "${RED}Failed to download %s.\n${NC}" "$url"; exit 1; } done printf "${GREEN}Download completed successfully.\n${NC}"
padeoe revised this gist
Nov 21, 2023. 2 changed files with 5 additions and 6 deletions.

@@ -1,7 +1,7 @@ # 🤗Huggingface Model Downloader ***Update: The previous version has a bug. When resuming from a breakpoint, there may be an issue causing incomplete files. Please update to the latest version!!!*** Considering the lack of multi-threaded download support in the official [`huggingface-cli`](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli), and the inadequate error handling in [`hf_transfer`](https://github.com/huggingface/hf_transfer), this command-line tool smartly utilizes `wget` or `aria2` for LFS files and `git clone` for the rest. ## Features - ⏯️ **Resume from breakpoint**: You can re-run it or Ctrl+C anytime.

@@ -60,21 +60,19 @@ fi MODEL_DIR="${MODEL_ID#*/}" if [[ "$DATASET" == 1 ]]; then MODEL_ID="datasets/$MODEL_ID" fi echo "Downloading to ./$MODEL_DIR" if [ -d "$MODEL_DIR/.git" ]; then printf "%s exists, Skip Clone.\n" "$MODEL_DIR" cd "$MODEL_DIR" && GIT_LFS_SKIP_SMUDGE=1 git pull || { printf "Git pull failed.\n"; exit 1; } else REPO_URL="$HF_ENDPOINT/$MODEL_ID" OUTPUT=$(GIT_TERMINAL_PROMPT=0 GIT_ASKPASS="" git ls-remote "$REPO_URL" 2>&1) GIT_EXIT_CODE=$? if [[ $OUTPUT == *"could not read Username"* ]]; then if [[ -z "$HF_USERNAME" || -z "$HF_TOKEN" ]]; then printf "The repository requires authentication, but --hf_username and --hf_token is not passed.\nPlease get token from https://huggingface.co/settings/tokens.\nExiting.\n"
@@ -85,6 +83,7 @@ else elif [ $GIT_EXIT_CODE -ne 0 ]; then echo "$OUTPUT"; exit 1 fi echo "git clone $REPO_URL" GIT_LFS_SKIP_SMUDGE=1 git clone "$REPO_URL" && cd "$MODEL_DIR" || { printf "Git clone failed.\n"; exit 1; } for file in $(git lfs ls-files | awk '{print $3}'); do
padeoe revised this gist
Nov 8, 2023. 2 changed files with 6 additions and 2 deletions.

@@ -13,7 +13,11 @@ Considering the lack of multi-threaded download support in the official [`huggin - 📦 **Simple**: No dependencies & No installation required. ## Usage First, Download [`hfd.sh`](#file-hfd-sh) or clone this repo, and then grant execution permission to the script. ```bash chmod a+x hfd.sh ``` **Usage Instructions:** ``` $ ./hfd.sh -h Usage:

@@ -99,7 +99,7 @@ declare -a urls for file in $files; do url="$HF_ENDPOINT/$MODEL_ID/resolve/main/$file" file_dir=$(dirname "$file") mkdir -p "$file_dir" if [[ "$TOOL" == "wget" ]]; then download_cmd="wget -c \"$url\" -O \"$file\"" [[ -n "$HF_TOKEN" ]] && download_cmd="wget --header=\"Authorization: Bearer ${HF_TOKEN}\" -c \"$url\" -O \"$file\""
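Each LFS file is fetched from `$HF_ENDPOINT/$MODEL_ID/resolve/main/$file`, and its parent directory is created before the downloader writes into it. The script builds the command as a string so it can also print it. Below is a sketch of the same aria2c invocation held in a bash array instead, which keeps argument boundaries intact without a second round of quoting. `build_aria2c_cmd` and the global `CMD` array are hypothetical; the aria2c options are only the ones visible in the diffs (`--console-log-level`, `-x`, `-s`, `-k 1M`, `-c`, `-d`, `-o`, `--header`):

```bash
#!/usr/bin/env bash
# Hypothetical helper: fills the global array CMD with an aria2c command for one
# LFS file. Arguments: endpoint, model_id, file path, threads, token (optional).
build_aria2c_cmd() {
    local endpoint="$1" model_id="$2" file="$3" threads="$4" token="$5"
    local url="$endpoint/$model_id/resolve/main/$file"
    CMD=(aria2c --console-log-level=error -x "$threads" -s "$threads" -k 1M -c)
    # Send the Authorization header only when a token was supplied.
    [[ -n "$token" ]] && CMD+=(--header="Authorization: Bearer $token")
    # -d is the parent directory (created with mkdir -p in the script), -o the file name.
    CMD+=("$url" -d "$(dirname "$file")" -o "$(basename "$file")")
}

build_aria2c_cmd https://huggingface.co bigscience/bloom-560m onnx/decoder_model.onnx 4
# %q shows each argument with its boundaries; to run it: mkdir -p onnx && "${CMD[@]}"
printf '%q ' "${CMD[@]}"; echo
```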
padeoe revised this gist
Nov 1, 2023. 1 changed file with 2 additions and 0 deletions.

@@ -1,4 +1,6 @@ # 🤗Huggingface Model Downloader ***Update: The previous version has a bug. When resuming from a breakpoint, there may be an issue causing incomplete files. Please update to the latest version!!!*** Considering the lack of multi-threaded download support in the official [`huggingface-cli`](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli), and the inadequate error handling in [`hf_transfer`](https://github.com/huggingface/hf_transfer), this command-line tool smartly utilizes `wget` or `aria2` for LFS files and `git clone` for the rest. ## Features
padeoe revised this gist
Nov 1, 2023. 1 changed file with 3 additions and 1 deletion.

@@ -87,14 +87,16 @@ else fi GIT_LFS_SKIP_SMUDGE=1 git clone "$REPO_URL" && cd "$MODEL_DIR" || { printf "Git clone failed.\n"; exit 1; } for file in $(git lfs ls-files | awk '{print $3}'); do truncate -s 0 "$file" done fi printf "\nStart Downloading lfs files, bash script:\n" files=$(git lfs ls-files | awk '{print $3}') declare -a urls for file in $files; do url="$HF_ENDPOINT/$MODEL_ID/resolve/main/$file" file_dir=$(dirname "$file") mkdir -p "$file_dir" # Create the necessary directories
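In this revision, `truncate -s 0` runs on every LFS file inside the clone branch only. After `GIT_LFS_SKIP_SMUDGE=1 git clone`, those paths hold small Git LFS pointer files rather than real content. Both `wget -c` and `aria2c -c` resume from the size of the existing file, so leftover pointer bytes would otherwise be treated as the first bytes of the download. Keeping the truncation out of the `git pull` branch also means a re-run does not throw away partially downloaded data. This reading is an interpretation of the "incomplete files" update note, not something the gist states outright.

The sketch below empties a file only if it still starts with the `version https://git-lfs.github.com/spec/v1` line defined by the Git LFS pointer format. `reset_if_pointer` is a hypothetical helper, and `: >` is used as a portable equivalent of `truncate -s 0`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: empty a file if (and only if) it is still a Git LFS
# pointer, so a later `wget -c` / `aria2c -c` starts from byte 0.
reset_if_pointer() {
    local file="$1" head_bytes
    [[ -f "$file" ]] || return 0
    # Read a short prefix only; tr drops NUL bytes that binary files may contain.
    head_bytes=$(head -c 64 "$file" | tr -d '\0')
    if [[ "$head_bytes" == "version https://git-lfs.github.com/spec/v1"* ]]; then
        : > "$file"   # same effect as: truncate -s 0 "$file"
    fi
}

# Demo on a throwaway pointer file (the oid is a fake placeholder).
tmp=$(mktemp -d)
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:deadbeef\nsize 1024\n' > "$tmp/model.bin"
reset_if_pointer "$tmp/model.bin"
[ -s "$tmp/model.bin" ] && echo "still has data" || echo "emptied"
rm -rf "$tmp"
```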
padeoe revised this gist
Nov 1, 2023 . 2 changed files with 33 additions and 15 deletions.There are no files selected for viewing
@@ -2,8 +2,8 @@ Considering the lack of multi-threaded download support in the official [`huggingface-cli`](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli), and the inadequate error handling in [`hf_transfer`](https://github.com/huggingface/hf_transfer), this command-line tool smartly utilizes `wget` or `aria2` for LFS files and `git clone` for the rest. ## Features - ⏯️ **Resume from breakpoint**: You can re-run it or Ctrl+C anytime. - 🚀 **Multi-threaded Download**: Utilize multiple threads to speed up the download process. - 🚫 **File Exclusion**: Use `--exclude` to skip specific files, save time for models with **duplicate formats** (e.g., .bin and .safetensors). - 🔐 **Auth Support**: For gated models that require Huggingface login, use `--hf_username` and `--hf_token` to authenticate. - 🪞 **Mirror Site Support**: Set up with `HF_ENDPOINT` environment variable.
@@ -15,10 +15,10 @@ First, Download [`hfd.sh`](#file-hfd-sh) from this repo. ``` $ ./hfd.sh -h Usage: hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool wget|aria2c] [-x threads] [--dataset] Description: Downloads a model or dataset from Hugging Face using the provided model ID. Parameters: model_id The Hugging Face model ID in the format 'repo/model_name'.
@@ -28,10 +28,12 @@ Parameters: --hf_token (Optional) Hugging Face token for authentication. --tool (Optional) Download tool to use. Can be wget (default) or aria2c. -x (Optional) Number of download threads for aria2c. --dataset (Optional) Flag to indicate downloading a dataset. Example: hfd bigscience/bloom-560m --exclude safetensors hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken --tool aria2c -x 8 hfd lavita/medical-qa-shared-task-v1-toy --dataset ``` **Download a model:** ```

@@ -5,10 +5,10 @@ trap 'printf "\nDownload interrupted. If you re-run the command, you can resume display_help() { cat << EOF Usage: hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool wget|aria2c] [-x threads] [--dataset] Description: Downloads a model or dataset from Hugging Face using the provided model ID. Parameters: model_id The Hugging Face model ID in the format 'repo/model_name'.
@@ -18,10 +18,12 @@ Parameters: --hf_token (Optional) Hugging Face token for authentication. --tool (Optional) Download tool to use. Can be wget (default) or aria2c. -x (Optional) Number of download threads for aria2c. --dataset (Optional) Flag to indicate downloading a dataset. Example: hfd bigscience/bloom-560m --exclude safetensors hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken --tool aria2c -x 8 hfd lavita/medical-qa-shared-task-v1-toy --dataset EOF exit 1 }
@@ -41,6 +43,7 @@ while [[ $# -gt 0 ]]; do --hf_token) HF_TOKEN="$2"; shift 2 ;; --tool) TOOL="$2"; shift 2 ;; -x) THREADS="$2"; shift 2 ;; --dataset) DATASET=1; shift ;; *) shift ;; esac done
@@ -57,20 +60,28 @@ fi MODEL_DIR="${MODEL_ID#*/}" echo $DATASET if [[ "$DATASET" == 1 ]]; then MODEL_ID="datasets/$MODEL_ID" fi echo $MODEL_DIR if [ -d "$MODEL_DIR/.git" ]; then printf "%s exists, Skip Clone.\n" "$MODEL_DIR" cd "$MODEL_DIR" && GIT_LFS_SKIP_SMUDGE=1 git pull || { printf "Git pull failed.\n"; exit 1; } else REPO_URL="$HF_ENDPOINT/$MODEL_ID" echo $REPO_URL OUTPUT=$(GIT_TERMINAL_PROMPT=0 git ls-remote "$REPO_URL" 2>&1) GIT_EXIT_CODE=$? if [[ $OUTPUT == *"could not read Username"* ]]; then if [[ -z "$HF_USERNAME" || -z "$HF_TOKEN" ]]; then printf "The repository requires authentication, but --hf_username and --hf_token is not passed.\nPlease get token from https://huggingface.co/settings/tokens.\nExiting.\n" echo $OUTPUT exit 1 fi REPO_URL="https://$HF_USERNAME:$HF_TOKEN@${HF_ENDPOINT#https://}/$MODEL_ID" elif [ $GIT_EXIT_CODE -ne 0 ]; then echo "$OUTPUT"; exit 1 fi
@@ -83,24 +94,29 @@ files=$(git lfs ls-files | awk '{print $3}') declare -a urls for file in $files; do truncate -s 0 "$file" url="$HF_ENDPOINT/$MODEL_ID/resolve/main/$file" file_dir=$(dirname "$file") mkdir -p "$file_dir" # create the necessary directories if [[ "$TOOL" == "wget" ]]; then download_cmd="wget -c \"$url\" -O \"$file\"" [[ -n "$HF_TOKEN" ]] && download_cmd="wget --header=\"Authorization: Bearer ${HF_TOKEN}\" -c \"$url\" -O \"$file\"" else download_cmd="aria2c -x $THREADS -s $THREADS -k 1M -c \"$url\" -d \"$file_dir\" -o \"$(basename "$file")\"" [[ -n "$HF_TOKEN" ]] && download_cmd="aria2c --header=\"Authorization: Bearer ${HF_TOKEN}\" -x $THREADS -s $THREADS -k 1M -c \"$url\" -d \"$file_dir\" -o \"$(basename "$file")\"" fi [[ -n "$EXCLUDE_PATTERN" && $file == *"$EXCLUDE_PATTERN"* ]] && printf "# %s\n" "$download_cmd" && continue printf "%s\n" "$download_cmd" urls+=("$url|$file") done for url_file in "${urls[@]}"; do IFS='|' read -r url file <<< "$url_file" file_dir=$(dirname "$file") if [[ "$TOOL" == "wget" ]]; then [[ -n "$HF_TOKEN" ]] && wget --header="Authorization: Bearer ${HF_TOKEN}" -c "$url" -O "$file" || wget -c "$url" -O "$file" else [[ -n "$HF_TOKEN" ]] && aria2c --header="Authorization: Bearer ${HF_TOKEN}" -x $THREADS -s $THREADS -k 1M -c "$url" -d "$file_dir" -o "$(basename "$file")" || aria2c -x $THREADS -s $THREADS -k 1M -c "$url" -d "$file_dir" -o "$(basename "$file")" fi [[ $? -eq 0 ]] && printf "Downloaded %s successfully.\n" "$url" || { printf "Failed to download %s.\n" "$url"; exit 1; } done
-
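The Nov 1 revision changes how URLs are built in two ways. A `--dataset` flag prefixes the ID with `datasets/`. Credentials are now spliced in after stripping the `https://` scheme from `HF_ENDPOINT`; the previous revision built `https://$HF_USERNAME:$HF_TOKEN@$HF_ENDPOINT/...`, which doubles the scheme with the default endpoint. The sketch below pulls those rules out into standalone functions. The names `repo_path`, `auth_repo_url` and `resolve_url` are hypothetical and do not exist in `hfd.sh`; the expansions are the ones the script uses.

```shell
#!/usr/bin/env bash
# Sketch of the URL rules in the Nov 1 revision of hfd.sh.
# Function names are hypothetical; the parameter expansions mirror the script.

# Repo path: datasets live under a "datasets/" prefix.
repo_path() {
  local model_id=$1 is_dataset=$2
  if [[ $is_dataset == 1 ]]; then
    printf '%s\n' "datasets/$model_id"
  else
    printf '%s\n' "$model_id"
  fi
}

# Clone URL with credentials: strip "https://" from the endpoint first,
# otherwise the result would be "https://user:tok@https://host/...".
auth_repo_url() {
  local endpoint=$1 user=$2 token=$3 path=$4
  printf 'https://%s:%s@%s/%s\n' "$user" "$token" "${endpoint#https://}" "$path"
}

# Direct download URL for one LFS file on the main branch.
resolve_url() {
  local endpoint=$1 path=$2 file=$3
  printf '%s/%s/resolve/main/%s\n' "$endpoint" "$path" "$file"
}

repo_path "lavita/medical-qa-shared-task-v1-toy" 1
# prints: datasets/lavita/medical-qa-shared-task-v1-toy
resolve_url "https://huggingface.co" "bigscience/bloom-560m" "onnx/decoder_model.onnx"
# prints: https://huggingface.co/bigscience/bloom-560m/resolve/main/onnx/decoder_model.onnx
```

Because only the literal prefix `https://` is removed, an endpoint given as `http://...` would still produce a malformed credential URL under this rule.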
padeoe revised this gist
Oct 27, 2023 . 1 changed file with 1 addition and 1 deletion.
@@ -53,7 +53,7 @@ hfd meta-llama/Llama-2-7b --hf_username YOUR_HF_USERNAME --hf_token YOUR_HF_TOKE **Download with aria2c and multiple threads:** ```bash ./hfd.sh bigscience/bloom-560m --tool aria2c -x 4 ``` *Output*:
-
padeoe revised this gist
Oct 27, 2023 . 2 changed files with 61 additions and 14 deletions.
@@ -1,12 +1,12 @@ # 🤗Huggingface Model Downloader Considering the lack of multi-threaded download support in the official [`huggingface-cli`](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli), and the inadequate error handling in [`hf_transfer`](https://github.com/huggingface/hf_transfer), this command-line tool smartly utilizes `wget` or `aria2` for LFS files and `git clone` for the rest. ## Features - 🚀 **Resume from breakpoint**: You can re-run it or Ctrl+C anytime. - ⏯️ **Multi-threaded Download**: Utilize multiple threads to speed up the download process. - 🚫 **File Exclusion**: Use `--exclude` to skip specific files, save time for models with **duplicate formats** (e.g., .bin and .safetensors). - 🔐 **Auth Support**: For gated models that require Huggingface login, use `--hf_username` and `--hf_token` to authenticate. - 🪞 **Mirror Site Support**: Set up with `HF_ENDPOINT` environment variable. - 🌍 **Proxy Support**: Set up with `HTTPS_PROXY` environment variable. - 📦 **Simple**: No dependencies & No installation required.
@@ -15,7 +15,23 @@ First, Download [`hfd.sh`](#file-hfd-sh) from this repo. ``` $ ./hfd.sh -h Usage: hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool wget|aria2c] [-x threads] Description: Downloads a model from Hugging Face using the provided model ID. Parameters: model_id The Hugging Face model ID in the format 'repo/model_name'. --exclude (Optional) Flag to specify a string pattern to exclude files from downloading. exclude_pattern The pattern to match against filenames for exclusion. --hf_username (Optional) Hugging Face username for authentication. --hf_token (Optional) Hugging Face token for authentication. --tool (Optional) Download tool to use. Can be wget (default) or aria2c. -x (Optional) Number of download threads for aria2c. Example: hfd bigscience/bloom-560m --exclude safetensors hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken --tool aria2c -x 8 ``` **Download a model:** ```
@@ -35,6 +51,11 @@ hfd meta-llama/Llama-2-7b --hf_username YOUR_HF_USERNAME --hf_token YOUR_HF_TOKE ./hdf.sh bigscience/bloom-560m --exclude safetensors ``` **Download with aria2c and multiple threads:** ```bash ./hfd.sh bigscience/bloom-560m --download_tool aria2c -x 4 ``` *Output*: During the download, the file URLs will be displayed:

@@ -5,7 +5,7 @@ trap 'printf "\nDownload interrupted. If you re-run the command, you can resume display_help() { cat << EOF Usage: hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool wget|aria2c] [-x threads] Description: Downloads a model from Hugging Face using the provided model ID.
@@ -16,26 +16,43 @@ Parameters: exclude_pattern The pattern to match against filenames for exclusion. --hf_username (Optional) Hugging Face username for authentication. --hf_token (Optional) Hugging Face token for authentication. --tool (Optional) Download tool to use. Can be wget (default) or aria2c. -x (Optional) Number of download threads for aria2c. Example: hfd bigscience/bloom-560m --exclude safetensors hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken --tool aria2c -x 8 EOF exit 1 } MODEL_ID=$1 shift # Default values TOOL="wget" THREADS=1 HF_ENDPOINT=${HF_ENDPOINT:-"https://huggingface.co"} while [[ $# -gt 0 ]]; do case $1 in --exclude) EXCLUDE_PATTERN="$2"; shift 2 ;; --hf_username) HF_USERNAME="$2"; shift 2 ;; --hf_token) HF_TOKEN="$2"; shift 2 ;; --tool) TOOL="$2"; shift 2 ;; -x) THREADS="$2"; shift 2 ;; *) shift ;; esac done # Check if aria2c is installed if [[ "$TOOL" == "aria2c" ]]; then if ! command -v aria2c &>/dev/null; then echo "aria2c is not installed. Installing it..." sudo apt update && sudo apt install -y aria2 || { echo "Failed to install aria2c. Exiting."; exit 1; } fi fi [[ -z "$MODEL_ID" || "$MODEL_ID" =~ ^-h ]] && display_help MODEL_DIR="${MODEL_ID#*/}"
@@ -44,7 +61,7 @@ if [ -d "$MODEL_DIR/.git" ]; then printf "%s exists, Skip Clone.\n" "$MODEL_DIR" cd "$MODEL_DIR" && GIT_LFS_SKIP_SMUDGE=1 git pull || { printf "Git pull failed.\n"; exit 1; } else REPO_URL="$HF_ENDPOINT/$MODEL_ID" OUTPUT=$(GIT_TERMINAL_PROMPT=0 git ls-remote "$REPO_URL" 2>&1) GIT_EXIT_CODE=$?
@@ -53,7 +70,7 @@ else printf "The repository requires authentication, but --hf_username and --hf_token is not passed.\nPlease get token from https://huggingface.co/settings/tokens.\nExiting.\n" exit 1 fi REPO_URL="https://$HF_USERNAME:$HF_TOKEN@$HF_ENDPOINT/$MODEL_ID" elif [ $GIT_EXIT_CODE -ne 0 ]; then echo "$OUTPUT"; exit 1 fi
@@ -66,16 +83,25 @@ files=$(git lfs ls-files | awk '{print $3}') declare -a urls for file in $files; do url="$HF_ENDPOINT/$MODEL_ID/resolve/main/$file" if [[ "$TOOL" == "wget" ]]; then download_cmd="wget -c \"$url\"" [[ -n "$HF_TOKEN" ]] && download_cmd="wget --header=\"Authorization: Bearer ${HF_TOKEN}\" -c \"$url\"" else download_cmd="aria2c -x $THREADS -s $THREADS -k 1M -c \"$url\"" [[ -n "$HF_TOKEN" ]] && download_cmd="aria2c --header=\"Authorization: Bearer ${HF_TOKEN}\" -x $THREADS -s $THREADS -k 1M -c \"$url\"" fi [[ -n "$EXCLUDE_PATTERN" && $file == *"$EXCLUDE_PATTERN"* ]] && printf "# %s\n" "$download_cmd" && continue printf "%s\n" "$download_cmd" urls+=("$url") done for url in "${urls[@]}"; do if [[ "$TOOL" == "wget" ]]; then [[ -n "$HF_TOKEN" ]] && wget --header="Authorization: Bearer ${HF_TOKEN}" -c "$url" || wget -c "$url" else [[ -n "$HF_TOKEN" ]] && aria2c --header="Authorization: Bearer ${HF_TOKEN}" -x $THREADS -s $THREADS -k 1M -c "$url" || aria2c -x $THREADS -s $THREADS -k 1M -c "$url" fi [[ $? -eq 0 ]] && printf "Downloaded %s successfully.\n" "$url" || { printf "Failed to download %s.\n" "$url"; exit 1; } done
-
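In this revision the download command is written out twice: once as an escaped string (`download_cmd`) for the printed preview, and again inline in the download loop. A bash array can hold the arguments once, be previewed with `printf '%q '`, and be run as `"${CMD[@]}"` without `eval` or nested quoting. This is a minimal sketch only, assuming a hypothetical `build_cmd` helper that fills a global `CMD` array; the `wget`/`aria2c` flags are exactly the ones the script already passes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: collect the download command's arguments in the
# global array CMD instead of an escaped string. Flags match hfd.sh.
build_cmd() {
  local tool=$1 url=$2 threads=$3 token=$4
  if [[ $tool == wget ]]; then
    CMD=(wget -c)
  else
    # -x: connections per server, -s: pieces per file, -k: minimum piece size
    CMD=(aria2c -x "$threads" -s "$threads" -k 1M -c)
  fi
  # Add the bearer header only when a token was given; it stays one argument.
  if [[ -n $token ]]; then
    CMD+=(--header="Authorization: Bearer $token")
  fi
  CMD+=("$url")
}

build_cmd aria2c "https://huggingface.co/bigscience/bloom-560m/resolve/main/flax_model.msgpack" 4 ""
printf '%q ' "${CMD[@]}"; echo   # shell-quoted preview of the command
# "${CMD[@]}"                     # would execute it, one argument per element
```

The same array then serves both the preview and the real call, so the two can no longer drift apart the way the string and the inline command can.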
padeoe revised this gist
Oct 25, 2023 . 1 changed file with 3 additions and 1 deletion.
@@ -1,5 +1,7 @@ # 🤗Huggingface Model Downloader ***Update***: We recommend the official [`huggingface-cli`](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli) tool! ~This command-line tool avoids the complexity and frequent disruptions often faced with [`snapshot_download`](https://huggingface.co/docs/huggingface_hub/v0.17.3/en/package_reference/file_download#huggingface_hub.snapshot_download) and `git clone` when fetching large models, like LLM. It smartly utilizes wget(which supports resuming) for Git LFS files and git clone for the rest.~ ## Features - 🚀 **Resume from breakpoint**: You can re-run it or Ctrl+C anytime.
-
padeoe revised this gist
Sep 28, 2023 . 1 changed file with 2 additions and 2 deletions.
@@ -1,5 +1,5 @@ # 🤗Huggingface Model Downloader This command-line tool avoids the complexity and frequent disruptions often faced with [`snapshot_download`](https://huggingface.co/docs/huggingface_hub/v0.17.3/en/package_reference/file_download#huggingface_hub.snapshot_download) and `git clone` when fetching large models, like LLM. It smartly utilizes wget(which supports resuming) for Git LFS files and git clone for the rest. ## Features - 🚀 **Resume from breakpoint**: You can re-run it or Ctrl+C anytime.
-
padeoe revised this gist
Sep 27, 2023 . 1 changed file with 6 additions and 2 deletions.
@@ -46,11 +46,15 @@ if [ -d "$MODEL_DIR/.git" ]; then else REPO_URL="https://huggingface.co/$MODEL_ID" OUTPUT=$(GIT_TERMINAL_PROMPT=0 git ls-remote "$REPO_URL" 2>&1) GIT_EXIT_CODE=$? if [[ $OUTPUT == *"could not read Username"* ]]; then if [[ -z "$HF_USERNAME" || -z "$HF_TOKEN" ]]; then printf "The repository requires authentication, but --hf_username and --hf_token is not passed.\nPlease get token from https://huggingface.co/settings/tokens.\nExiting.\n" exit 1 fi REPO_URL="https://$HF_USERNAME:$HF_TOKEN@huggingface.co/$MODEL_ID" elif [ $GIT_EXIT_CODE -ne 0 ]; then echo "$OUTPUT"; exit 1 fi
-
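This change, which saves `GIT_EXIT_CODE=$?` immediately after `git ls-remote`, fixes a bug in the preceding Sep 27 rewrite. There, `elif [ $? -ne 0 ]` is evaluated after the `[[ $OUTPUT == *"could not read Username"* ]]` test has already failed, so `$?` holds that test's status (1) rather than the exit code of `git ls-remote`. For a public repository that needs no login, the script therefore printed the `ls-remote` output and exited. The sketch below reproduces both variants with an arbitrary command standing in for `git ls-remote`; `classify_buggy` and `classify_fixed` are illustrative names only.

```shell
#!/usr/bin/env bash
# Before the fix: `elif [ $? -ne 0 ]` reads the status of the failed [[ ]]
# test (always 1 at that point), not the status of the command.
classify_buggy() {
  local output
  output=$("$@" 2>&1)
  if [[ $output == *"could not read Username"* ]]; then
    echo auth
  elif [ $? -ne 0 ]; then
    echo error
  else
    echo ok
  fi
}

# After the fix: save the command's exit status right after it runs.
classify_fixed() {
  local output status
  output=$("$@" 2>&1)
  status=$?
  if [[ $output == *"could not read Username"* ]]; then
    echo auth
  elif [ "$status" -ne 0 ]; then
    echo error
  else
    echo ok
  fi
}

# `true` stands in for a successful `git ls-remote` on a public repo.
classify_buggy true   # prints "error"
classify_fixed true   # prints "ok"
```

Note that `local output` and the assignment are kept on separate lines. `local output=$(...)` would make `$?` report the status of `local` itself, which is another way to lose the exit code.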
padeoe revised this gist
Sep 27, 2023 . 1 changed file with 1 addition and 1 deletion.
@@ -1,5 +1,5 @@ # 🤗Huggingface Model Downloader 🚀 This command-line tool avoids the complexity and frequent disruptions often faced with `hf_hub_download` and `git clone` when fetching large models, like LLM. It smartly utilizes wget(which supports resuming) for Git LFS files and git clone for the rest. ## Features - 🚀 **Resume from breakpoint**: You can re-run it or Ctrl+C anytime.
-
padeoe revised this gist
Sep 27, 2023 . No changes.
-
padeoe renamed this gist
Sep 27, 2023 . 1 changed file with 3 additions and 7 deletions.
@@ -1,16 +1,12 @@ # 🤗Huggingface Model Downloader 🚀 This tool avoids the frequent disruptions often faced with `hf_hub_download` and `git clone` when fetching large models, like LLM. It smartly utilizes `wget`(which supports resuming) for Git LFS files and `git clone` for the rest. ## Features - 🚀 **Resume from breakpoint**: You can re-run it or Ctrl+C anytime. - 🚫 **File Exclusion**: Use `--exclude` to skip specific files, save time for models with **duplicate formats** (e.g., .bin and .safetensors). - 🔐 **Auth Support**: For gated models that require Huggingface login, use `--hf_username` and `--hf_token` to authenticate. - 🌍 **Proxy Support**: Set up with `HTTPS_PROXY` environment variable. - 📦 **Simple**: No dependencies & No installation required. ## Usage First, Download [`hfd.sh`](#file-hfd-sh) from this repo.
@@ -55,4 +51,4 @@ wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/onnx/decoder_m For easier access, you can create an alias for the script: ```bash alias hfd="$PWD/hfd.sh" ```
-
padeoe revised this gist
Sep 27, 2023 . 1 changed file with 10 additions and 3 deletions.
@@ -7,30 +7,37 @@ Bypass the common **network interruptions** faced with `hf_hub_download` and `gi ## Features - 🚀 **Resume from breakpoint**: You can re-run it or Ctrl+C anytime. - 🚫 **File Exclusion**: Use `--exclude` to skip specific files, save time for models with **duplicate formats** (e.g., .bin and .safetensors). - 🔐 **Auth Support**: For gated models that require Huggingface login, use `--hf_username` and `--hf_token` to authenticate. - 🌍 **Proxy Support**: Set up with `HTTPS_PROXY` environment variable. - 📦 **Simple**: No dependency & simple codes. ## Usage First, Download [`hfd.sh`](#file-hfd-sh) from this repo. ``` $ ./hfd.sh -h Usage: hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token] ``` **Download a model:** ``` ./hdf.sh bigscience/bloom-560m ``` **Download a model need login** Get huggingface token from [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens), then ```bash hfd meta-llama/Llama-2-7b --hf_username YOUR_HF_USERNAME --hf_token YOUR_HF_TOKEN ``` **Download a model and exclude certain files (e.g., .safetensors):** ```bash ./hdf.sh bigscience/bloom-560m --exclude safetensors ``` *Output*: During the download, the file URLs will be displayed: ```console
-
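In every revision shown here, `--exclude` is a plain substring test. A file is skipped when `$file == *"$EXCLUDE_PATTERN"*` matches, so `--exclude safetensors` skips `model.safetensors`. Because the pattern is quoted inside the `[[ ]]` glob, any `*` or `?` it contains is matched literally rather than as a wildcard. The following is a minimal sketch of that test, wrapped in a hypothetical `is_excluded` function that is not part of the script.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the exclusion test used in hfd.sh.
# The quoted "$pattern" is matched literally; only the outer * are wildcards.
is_excluded() {
  local file=$1 pattern=$2
  [[ -n $pattern && $file == *"$pattern"* ]]
}

for f in model.safetensors pytorch_model.bin onnx/decoder_model.onnx; do
  if is_excluded "$f" safetensors; then
    echo "# skip $f"
  else
    echo "get $f"
  fi
done
```

Under this test, a glob-style argument such as `'*.safetensors'` matches nothing, because no file name contains a literal `*`.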
padeoe revised this gist
Sep 27, 2023 . 1 changed file with 47 additions and 65 deletions.
@@ -1,96 +1,78 @@ #!/bin/bash trap 'printf "\nDownload interrupted. If you re-run the command, you can resume the download from the breakpoint.\n"; exit 1' INT display_help() { cat << EOF Usage: hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token] Description: Downloads a model from Hugging Face using the provided model ID. Parameters: model_id The Hugging Face model ID in the format 'repo/model_name'. --exclude (Optional) Flag to specify a string pattern to exclude files from downloading. exclude_pattern The pattern to match against filenames for exclusion. --hf_username (Optional) Hugging Face username for authentication. --hf_token (Optional) Hugging Face token for authentication. Example: hfd bigscience/bloom-560m --exclude safetensors hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken EOF exit 1 } MODEL_ID=$1 shift while [[ $# -gt 0 ]]; do case $1 in --exclude) EXCLUDE_PATTERN="$2"; shift 2 ;; --hf_username) HF_USERNAME="$2"; shift 2 ;; --hf_token) HF_TOKEN="$2"; shift 2 ;; *) shift ;; esac done [[ -z "$MODEL_ID" || "$MODEL_ID" =~ ^-h ]] && display_help MODEL_DIR="${MODEL_ID#*/}" if [ -d "$MODEL_DIR/.git" ]; then printf "%s exists, Skip Clone.\n" "$MODEL_DIR" cd "$MODEL_DIR" && GIT_LFS_SKIP_SMUDGE=1 git pull || { printf "Git pull failed.\n"; exit 1; } else REPO_URL="https://huggingface.co/$MODEL_ID" OUTPUT=$(GIT_TERMINAL_PROMPT=0 git ls-remote "$REPO_URL" 2>&1) if [[ $OUTPUT == *"could not read Username"* ]]; then [[ -z "$HF_USERNAME" || -z "$HF_TOKEN" ]] && printf "The repository requires authentication, but --hf_username and --hf_token is not passed.\nPlease get token from https://huggingface.co/settings/tokens.\nExiting.\n" && exit 1 REPO_URL="https://$HF_USERNAME:$HF_TOKEN@huggingface.co/$MODEL_ID" elif [ $? -ne 0 ]; then echo "$OUTPUT"; exit 1 fi GIT_LFS_SKIP_SMUDGE=1 git clone "$REPO_URL" && cd "$MODEL_DIR" || { printf "Git clone failed.\n"; exit 1; } fi printf "\nStart Downloading lfs files, bash script:\n" files=$(git lfs ls-files | awk '{print $3}') declare -a urls for file in $files; do url="https://huggingface.co/$MODEL_ID/resolve/main/$file" wget_cmd="wget -c \"$url\"" [[ -n "$HF_TOKEN" ]] && wget_cmd="wget --header=\"Authorization: Bearer ${HF_TOKEN}\" -c \"$url\"" [[ -n "$EXCLUDE_PATTERN" && $file == *"$EXCLUDE_PATTERN"* ]] && printf "# %s\n" "$wget_cmd" && continue printf "%s\n" "$wget_cmd" urls+=("$url") done for url in "${urls[@]}"; do [[ -n "$HF_TOKEN" ]] && wget --header="Authorization: Bearer ${HF_TOKEN}" -c "$url" || wget -c "$url" [[ $? -eq 0 ]] && printf "Downloaded %s successfully.\n" "$url" || { printf "Failed to download %s.\n" "$url"; exit 1; } done printf "Download completed successfully.\n"
-
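The download loop in this rewrite uses `[[ -n "$HF_TOKEN" ]] && wget --header=... || wget -c "$url"`. `A && B || C` is not an if/else. When a token is set but the authenticated download fails, the `||` branch also runs and retries the same URL without credentials. The same shape survives into the later `wget`/`aria2c` branches. The sketch below substitutes a hypothetical `fake_fetch` stub for `wget` and counts how many attempts each form makes.

```shell
#!/usr/bin/env bash
# Hypothetical stub standing in for wget: records each call and fails
# when its first argument is "fail".
CALLS=()
fake_fetch() {
  CALLS+=("$*")
  [[ $1 != fail ]]
}

# Pattern from the script: A && B || C
fetch_chain() {
  local token=$1 mode=$2
  [[ -n $token ]] && fake_fetch "$mode" --with-auth || fake_fetch "$mode" --anonymous
}

# Explicit if/else: exactly one attempt, and its status is what gets returned.
fetch_if() {
  local token=$1 mode=$2
  if [[ -n $token ]]; then
    fake_fetch "$mode" --with-auth
  else
    fake_fetch "$mode" --anonymous
  fi
}

CALLS=(); fetch_chain tok fail || true; echo "chain: ${#CALLS[@]} calls"  # chain: 2 calls
CALLS=(); fetch_if tok fail || true;    echo "if: ${#CALLS[@]} calls"     # if: 1 calls
```

With the if/else form, the `[[ $? -eq 0 ]]` check that follows in the script would also report the status of the authenticated attempt, not the anonymous fallback.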
padeoe revised this gist
Sep 27, 2023 . No changes.
-
padeoe revised this gist
Sep 26, 2023 . 1 changed file with 1 addition and 1 deletion.
@@ -44,7 +44,7 @@ wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/onnx/decoder_m ... ``` ### Create an Alias for Convenience For easier access, you can create an alias for the script: ```bash alias hfd="$PWD/hfd.sh"
-
padeoe revised this gist
Sep 26, 2023 . 1 changed file with 1 addition and 1 deletion.
@@ -12,7 +12,7 @@ Bypass the common **network interruptions** faced with `hf_hub_download` and `gi - 📦 **Simple**: No dependency & simple codes. ## Usage First, Download [`hfd.sh`](#file-hfd-sh) from this repo. ``` $ ./hfd.sh -h Usage:
-
padeoe created this gist
Sep 26, 2023
@@ -0,0 +1,51 @@ # 🤗Huggingface Model Downloader 🚀 📦 Download large Huggingface models effortlessly with the power and simplicity of **`wget`**! Bypass the common **network interruptions** faced with `hf_hub_download` and `git clone` for **large** models. This simple script leverages wget for Git LFS files and git clone for others. ## Features - 🚀 **Resume from breakpoint**: You can re-run it or Ctrl+C anytime. - 🌍 **Proxy Support**: Set up with `HTTPS_PROXY` environment variable. - 🚫 **File Exclusion**: Use `--exclude` to skip specific files, handy for repos with **duplicate model formats** (e.g., .bin and .safetensors). - 📦 **Simple**: No dependency & simple codes. ## Usage First, Download [`hfd.sh`](#hfd-sh) from this repo. ``` $ ./hfd.sh -h Usage: hfd <model_id> [--exclude exclude_pattern] ``` **Download a model:** ``` ./hdf.sh bigscience/bloom-560m ``` **Download a model and exclude certain files (e.g., .safetensors):** ```bash ./hdf.sh bigscience/bloom-560m --exclude safetensors ``` **Output:** During the download, the file URLs will be displayed: ```console $ ./hdf.sh bigscience/bloom-560m --exclude safetensors ... Start Downloading lfs files, bash script: wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/flax_model.msgpack # wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/model.safetensors wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/onnx/decoder_model.onnx ... ``` ### 3. Create an Alias for Convenience For easier access, you can create an alias for the script: ```bash alias hfd="$PWD/hfd.sh" ```

@@ -0,0 +1,96 @@ #!/bin/bash # Trap the INT signal to handle Ctrl+C trap ctrl_c INT # Function to handle Ctrl+C ctrl_c() { printf "\nDownload interrupted. If you re-run the command, you can resume the download from the breakpoint.\n" exit 1 } # Display help information display_help() { printf "Usage:\n" printf " hfd <model_id> [--exclude exclude_pattern]\n\n" printf "Description:\n" printf " Downloads a model from Hugging Face using the provided model ID.\n\n" printf "Parameters:\n" printf " model_id The Hugging Face model ID in the format 'repo/model_name'.\n" printf " --exclude (Optional) Flag to specify a string pattern to exclude files from downloading.\n" printf " exclude_pattern The pattern to match against filenames for exclusion.\n\n" printf "Example:\n" printf " hfd bigscience/bloom-560m --exclude safetensors\n" exit 1 } MODEL_ID=$1 EXCLUDE_PATTERN='' # Parse arguments for --exclude option shift # Move to the next argument while [[ $# -gt 0 ]]; do key="$1" case $key in --exclude) EXCLUDE_PATTERN="$2" shift # past argument shift # past value ;; *) # unknown option shift # past argument ;; esac done # Check if no model_id is provided or -h/--help is provided if [[ -z "$MODEL_ID" ]] || [[ "$MODEL_ID" == "-h" ]] || [[ "$MODEL_ID" == "--help" ]]; then display_help fi MODEL_DIR=$(echo "$MODEL_ID" | awk -F'/' '{print $2}') # Check if the model directory exists and contains a .git directory if [ -d "$MODEL_DIR" ] && [ -d "$MODEL_DIR/.git" ]; then printf "%s exists, Skip Clone.\n" "$MODEL_DIR" cd "$MODEL_DIR" if GIT_LFS_SKIP_SMUDGE=1 git pull; then printf "Git pull successful.\n" else printf "Git pull failed.\n" exit 1 fi else printf "Start clone without lfs.\n" if GIT_LFS_SKIP_SMUDGE=1 git clone "https://huggingface.co/$MODEL_ID"; then cd "$MODEL_DIR" else printf "Git clone failed.\n" exit 1 fi fi printf "\nStart Downloading lfs files, bash script:\n" files=$(git lfs ls-files | awk '{print $3}') declare -a urls for file in $files; do url="https://huggingface.co/$MODEL_ID/resolve/main/$file" if [[ -n "$EXCLUDE_PATTERN" && $file == *"$EXCLUDE_PATTERN"* ]]; then printf "# wget -c $url\n" continue fi printf "wget -c $url\n" urls+=("$url") done for url in $urls; do if wget -c $url; then printf "Downloaded %s successfully.\n" "$url" else printf "Failed to download %s.\n" "$url" exit 1 fi done printf "Download completed successfully.\n"
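In this first version the download loop is `for url in $urls`. On an indexed bash array, `$urls` expands only to the first element, `${urls[0]}`, so only the first non-excluded LFS file would actually be fetched. Later revisions iterate over `"${urls[@]}"` instead. The following is a small sketch of the difference; the URLs are placeholders and the two counting functions are illustrative only.

```shell
#!/usr/bin/env bash
# Placeholder URLs; only the number of loop iterations matters here.
urls=("https://example.com/a" "https://example.com/b" "https://example.com/c")

count_unquoted() {   # loop form from the first revision
  local n=0 url
  for url in $urls; do n=$((n + 1)); done
  echo "$n"
}

count_all() {        # loop form from later revisions
  local n=0 url
  for url in "${urls[@]}"; do n=$((n + 1)); done
  echo "$n"
}

count_unquoted   # prints 1
count_all        # prints 3
```

Quoting `"${urls[@]}"` also keeps each element as one word, which the unquoted `wget -c $url` in this version does not guarantee for URLs containing spaces.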