@AlphaNext
Forked from padeoe/README_hfd.md
Created March 25, 2024 10:03

Revisions

  1. padeoe revised this gist Mar 22, 2024. 1 changed file with 10 additions and 10 deletions.
    20 changes: 10 additions & 10 deletions README_hfd.md
    Original file line number Diff line number Diff line change
    @@ -16,6 +16,12 @@ First, Download [`hfd.sh`](#file-hfd-sh) or clone this repo, and then grant exec
    ```bash
    chmod a+x hfd.sh
    ```

    You can create an alias for convenience:
    ```bash
    alias hfd="$PWD/hfd.sh"
    ```

    **Usage Instructions:**
    ```
    $ ./hfd.sh -h
    @@ -44,7 +50,7 @@ Example:
    ```
    **Download a model:**
    ```
    ./hfd.sh bigscience/bloom-560m
    hfd bigscience/bloom-560m
    ```

    **Download a model that requires login:**
    @@ -57,19 +63,19 @@ hfd meta-llama/Llama-2-7b --hf_username YOUR_HF_USERNAME_NOT_EMAIL --hf_token YO


    ```bash
    ./hfd.sh bigscience/bloom-560m --exclude *.safetensors
    hfd bigscience/bloom-560m --exclude *.safetensors
    ```

    **Download with aria2c and multiple threads:**
    ```bash
    ./hfd.sh bigscience/bloom-560m
    hfd bigscience/bloom-560m
    ```

    *Output*:
    During the download, the file URLs will be displayed:

    ```console
    $ ./hfd.sh bigscience/bloom-560m --tool wget --exclude *.safetensors
    $ hfd bigscience/bloom-560m --tool wget --exclude *.safetensors
    ...
    Start Downloading lfs files, bash script:

    @@ -78,9 +84,3 @@ wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/flax_model.msg
    wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/onnx/decoder_model.onnx
    ...
    ```

    ### Create an Alias for Convenience
    For easier access, you can create an alias for the script:
    ```bash
    alias hfd="$PWD/hfd.sh"
    ```
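    The alias above uses double quotes, so `$PWD` is expanded when the alias is defined and the alias keeps pointing at the script's absolute path after you change directories. It only lasts for the current shell session, though. A minimal sketch of one way to persist it, assuming bash and a `~/.bashrc` startup file; the helper name `hfd_alias_line` is hypothetical and not part of `hfd.sh`:

    ```bash
    # Hypothetical helper: print the alias definition for a given install directory.
    hfd_alias_line() {
        local script_dir="$1"   # directory that contains hfd.sh
        printf 'alias hfd="%s/hfd.sh"\n' "$script_dir"
    }

    # Show the line that would be added for the current directory.
    hfd_alias_line "$PWD"
    ```

    To make the alias permanent under these assumptions, append the printed line to the startup file (`hfd_alias_line "$PWD" >> ~/.bashrc`) and reload it with `source ~/.bashrc`.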
  2. padeoe revised this gist Mar 22, 2024. 2 changed files with 2 additions and 2 deletions.
    2 changes: 1 addition & 1 deletion README_hfd.md
    @@ -35,7 +35,7 @@ Parameters:
    --tool (Optional) Download tool to use. Can be aria2c (default) or wget.
    -x (Optional) Number of download threads for aria2c. Defaults to 4.
    --dataset (Optional) Flag to indicate downloading a dataset.
    --local-dir (Optional) Local directory path where the model or dataset will be stored.
    --local-dir (Optional) Local directory path where the model or dataset will be stored.
    Example:
    hfd bigscience/bloom-560m --exclude *.safetensors
    2 changes: 1 addition & 1 deletion hfd.sh
    @@ -25,7 +25,7 @@ Parameters:
    --tool (Optional) Download tool to use. Can be aria2c (default) or wget.
    -x (Optional) Number of download threads for aria2c. Defaults to 4.
    --dataset (Optional) Flag to indicate downloading a dataset.
    --local-dir (Optional) Local directory path where the model or dataset will be stored.
    --local-dir (Optional) Local directory path where the model or dataset will be stored.
    Example:
    hfd bigscience/bloom-560m --exclude *.safetensors
  3. padeoe revised this gist Mar 22, 2024. 2 changed files with 19 additions and 14 deletions.
    7 changes: 4 additions & 3 deletions README_hfd.md
    @@ -20,7 +20,7 @@ chmod a+x hfd.sh
    ```
    $ ./hfd.sh -h
    Usage:
    hfd <repo_id> [--include include_pattern] [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool aria2c|wget] [-x threads] [--dataset]
    hfd <repo_id> [--include include_pattern] [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool aria2c|wget] [-x threads] [--dataset] [--local-dir path]
    Description:
    Downloads a model or dataset from Hugging Face using the provided repo ID.
    @@ -29,16 +29,17 @@ Parameters:
    repo_id The Hugging Face repo ID in the format 'org/repo_name'.
    --include (Optional) Flag to specify a string pattern to include files for downloading.
    --exclude (Optional) Flag to specify a string pattern to exclude files from downloading.
    include/exclude_pattern The pattern to match against filenames, supports wildcard characters. e.g., '--exclude *.safetensor', '--include vae/*'
    include/exclude_pattern The pattern to match against filenames, supports wildcard characters. e.g., '--exclude *.safetensor', '--include vae/*'.
    --hf_username (Optional) Hugging Face username for authentication. **NOT EMAIL**.
    --hf_token (Optional) Hugging Face token for authentication.
    --tool (Optional) Download tool to use. Can be aria2c (default) or wget.
    -x (Optional) Number of download threads for aria2c. Defaults to 4.
    --dataset (Optional) Flag to indicate downloading a dataset.
    --local-dir (Optional) Local directory path where the model or dataset will be stored.
    Example:
    hfd bigscience/bloom-560m --exclude *.safetensors
    hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken --tool aria2c -x 4
    hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken -x 4
    hfd lavita/medical-qa-shared-task-v1-toy --dataset
    ```
    **Download a model:**
    26 changes: 15 additions & 11 deletions hfd.sh
    @@ -10,7 +10,7 @@ trap 'printf "${YELLOW}\nDownload interrupted. If you re-run the command, you ca
    display_help() {
    cat << EOF
    Usage:
    hfd <repo_id> [--include include_pattern] [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool aria2c|wget] [-x threads] [--dataset]
    hfd <repo_id> [--include include_pattern] [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool aria2c|wget] [-x threads] [--dataset] [--local-dir path]
    Description:
    Downloads a model or dataset from Hugging Face using the provided repo ID.
    @@ -19,12 +19,13 @@ Parameters:
    repo_id The Hugging Face repo ID in the format 'org/repo_name'.
    --include (Optional) Flag to specify a string pattern to include files for downloading.
    --exclude (Optional) Flag to specify a string pattern to exclude files from downloading.
    include/exclude_pattern The pattern to match against filenames, supports wildcard characters. e.g., '--exclude *.safetensor'.
    include/exclude_pattern The pattern to match against filenames, supports wildcard characters. e.g., '--exclude *.safetensor', '--include vae/*'.
    --hf_username (Optional) Hugging Face username for authentication. **NOT EMAIL**.
    --hf_token (Optional) Hugging Face token for authentication.
    --tool (Optional) Download tool to use. Can be wget (default) or aria2c.
    --tool (Optional) Download tool to use. Can be aria2c (default) or wget.
    -x (Optional) Number of download threads for aria2c. Defaults to 4.
    --dataset (Optional) Flag to indicate downloading a dataset.
    --local-dir (Optional) Local directory path where the model or dataset will be stored.
    Example:
    hfd bigscience/bloom-560m --exclude *.safetensors
    @@ -51,6 +52,7 @@ while [[ $# -gt 0 ]]; do
    --tool) TOOL="$2"; shift 2 ;;
    -x) THREADS="$2"; shift 2 ;;
    --dataset) DATASET=1; shift ;;
    --local-dir) LOCAL_DIR="$2"; shift 2 ;;
    *) shift ;;
    esac
    done
    @@ -69,16 +71,18 @@ check_command curl; check_command git; check_command git-lfs

    [[ -z "$MODEL_ID" || "$MODEL_ID" =~ ^-h ]] && display_help

    MODEL_DIR="${MODEL_ID#*/}"
    if [[ -z "$LOCAL_DIR" ]]; then
    LOCAL_DIR="${MODEL_ID#*/}"
    fi

    if [[ "$DATASET" == 1 ]]; then
    MODEL_ID="datasets/$MODEL_ID"
    fi
    echo "Downloading to ./$MODEL_DIR"
    echo "Downloading to $LOCAL_DIR"

    if [ -d "$MODEL_DIR/.git" ]; then
    printf "${YELLOW}%s exists, Skip Clone.\n${NC}" "$MODEL_DIR"
    cd "$MODEL_DIR" && GIT_LFS_SKIP_SMUDGE=1 git pull || { printf "${RED}Git pull failed.${NC}\n"; exit 1; }
    if [ -d "$LOCAL_DIR/.git" ]; then
    printf "${YELLOW}%s exists, Skip Clone.\n${NC}" "$LOCAL_DIR"
    cd "$LOCAL_DIR" && GIT_LFS_SKIP_SMUDGE=1 git pull || { printf "${RED}Git pull failed.${NC}\n"; exit 1; }
    else
    REPO_URL="$HF_ENDPOINT/$MODEL_ID"
    GIT_REFS_URL="${REPO_URL}/info/refs?service=git-upload-pack"
    @@ -95,15 +99,15 @@ else
    printf "${YELLOW}Executing debug command: curl -v %s\nOutput:${NC}\n" "$GIT_REFS_URL"
    curl -v "$GIT_REFS_URL"; printf "\n${RED}Git clone failed.\n${NC}"; exit 1
    fi
    echo "git clone $REPO_URL"
    echo "git clone $REPO_URL $LOCAL_DIR"

    GIT_LFS_SKIP_SMUDGE=1 git clone "$REPO_URL" && cd "$MODEL_DIR" || { printf "${RED}Git clone failed.\n${NC}"; exit 1; }
    GIT_LFS_SKIP_SMUDGE=1 git clone $REPO_URL $LOCAL_DIR && cd "$LOCAL_DIR" || { printf "${RED}Git clone failed.\n${NC}"; exit 1; }
    for file in $(git lfs ls-files | awk '{print $3}'); do
    truncate -s 0 "$file"
    done
    fi

    printf "\nStart Downloading lfs files, bash script:\n"
    printf "\nStart Downloading lfs files, bash script:\ncd $LOCAL_DIR\n"
    files=$(git lfs ls-files | awk '{print $3}')
    declare -a urls
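
    The `--local-dir` handling in this revision falls back to the repository name when no directory is given, using the `${MODEL_ID#*/}` expansion to strip everything up to the first slash. A minimal sketch of that fallback in isolation; the function name `resolve_local_dir` is hypothetical, and unlike the `git clone` line in the diff it quotes its expansions, since an unquoted `$LOCAL_DIR` containing spaces would likely be split into several arguments:

    ```bash
    resolve_local_dir() {
        local repo_id="$1"     # e.g. bigscience/bloom-560m
        local local_dir="$2"   # value of --local-dir, possibly empty
        if [[ -z "$local_dir" ]]; then
            # Remove the shortest prefix matching "*/", i.e. the org name.
            local_dir="${repo_id#*/}"
        fi
        printf '%s\n' "$local_dir"
    }

    resolve_local_dir "bigscience/bloom-560m" ""                 # prints bloom-560m
    resolve_local_dir "bigscience/bloom-560m" "/data/my models"  # prints /data/my models
    ```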

  4. padeoe revised this gist Mar 22, 2024. 2 changed files with 17 additions and 17 deletions.
    20 changes: 10 additions & 10 deletions README_hfd.md
    @@ -29,16 +29,16 @@ Parameters:
    repo_id The Hugging Face repo ID in the format 'org/repo_name'.
    --include (Optional) Flag to specify a string pattern to include files for downloading.
    --exclude (Optional) Flag to specify a string pattern to exclude files from downloading.
    include/exclude_pattern The pattern to match against filenames, supports wildcard characters. e.g., '--exclude *.safetensor'.
    --hf_username (Optional) Hugging Face username for authentication.
    include/exclude_pattern The pattern to match against filenames, supports wildcard characters. e.g., '--exclude *.safetensor', '--include vae/*'
    --hf_username (Optional) Hugging Face username for authentication. **NOT EMAIL**.
    --hf_token (Optional) Hugging Face token for authentication.
    --tool (Optional) Download tool to use. Can be wget (default) or aria2c.
    -x (Optional) Number of download threads for aria2c.
    --tool (Optional) Download tool to use. Can be aria2c (default) or wget.
    -x (Optional) Number of download threads for aria2c. Defaults to 4.
    --dataset (Optional) Flag to indicate downloading a dataset.
    Example:
    hfd bigscience/bloom-560m --exclude safetensors
    hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken --tool aria2c -x 8
    hfd bigscience/bloom-560m --exclude *.safetensors
    hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken --tool aria2c -x 4
    hfd lavita/medical-qa-shared-task-v1-toy --dataset
    ```
    **Download a model:**
    @@ -50,25 +50,25 @@ Example:

    Get a Hugging Face token from [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens), then run:
    ```bash
    hfd meta-llama/Llama-2-7b --hf_username YOUR_HF_USERNAME --hf_token YOUR_HF_TOKEN
    hfd meta-llama/Llama-2-7b --hf_username YOUR_HF_USERNAME_NOT_EMAIL --hf_token YOUR_HF_TOKEN
    ```
    **Download a model and exclude certain files (e.g., .safetensors):**


    ```bash
    ./hfd.sh bigscience/bloom-560m --exclude safetensors
    ./hfd.sh bigscience/bloom-560m --exclude *.safetensors
    ```

    **Download with aria2c and multiple threads:**
    ```bash
    ./hfd.sh bigscience/bloom-560m --tool aria2c -x 4
    ./hfd.sh bigscience/bloom-560m
    ```

    *Output*:
    During the download, the file URLs will be displayed:

    ```console
    $ ./hfd.sh bigscience/bloom-560m --exclude safetensors
    $ ./hfd.sh bigscience/bloom-560m --tool wget --exclude *.safetensors
    ...
    Start Downloading lfs files, bash script:

    14 changes: 7 additions & 7 deletions hfd.sh
    @@ -20,15 +20,15 @@ Parameters:
    --include (Optional) Flag to specify a string pattern to include files for downloading.
    --exclude (Optional) Flag to specify a string pattern to exclude files from downloading.
    include/exclude_pattern The pattern to match against filenames, supports wildcard characters. e.g., '--exclude *.safetensor'.
    --hf_username (Optional) Hugging Face username for authentication.
    --hf_username (Optional) Hugging Face username for authentication. **NOT EMAIL**.
    --hf_token (Optional) Hugging Face token for authentication.
    --tool (Optional) Download tool to use. Can be wget (default) or aria2c.
    -x (Optional) Number of download threads for aria2c.
    -x (Optional) Number of download threads for aria2c. Defaults to 4.
    --dataset (Optional) Flag to indicate downloading a dataset.
    Example:
    hfd bigscience/bloom-560m --exclude safetensors
    hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken --tool aria2c -x 8
    hfd bigscience/bloom-560m --exclude *.safetensors
    hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken -x 4
    hfd lavita/medical-qa-shared-task-v1-toy --dataset
    EOF
    exit 1
    @@ -39,7 +39,7 @@ shift

    # Default values
    TOOL="aria2c"
    THREADS=1
    THREADS=4
    HF_ENDPOINT=${HF_ENDPOINT:-"https://huggingface.co"}

    while [[ $# -gt 0 ]]; do
    @@ -118,8 +118,8 @@ for file in $files; do
    download_cmd="aria2c --console-log-level=error -x $THREADS -s $THREADS -k 1M -c \"$url\" -d \"$file_dir\" -o \"$(basename "$file")\""
    [[ -n "$HF_TOKEN" ]] && download_cmd="aria2c --header=\"Authorization: Bearer ${HF_TOKEN}\" --console-log-level=error -x $THREADS -s $THREADS -k 1M -c \"$url\" -d \"$file_dir\" -o \"$(basename "$file")\""
    fi
    [[ -n "$INCLUDE_PATTERN" && $file != *"$INCLUDE_PATTERN"* ]] && printf "# %s\n" "$download_cmd" && continue
    [[ -n "$EXCLUDE_PATTERN" && $file == *"$EXCLUDE_PATTERN"* ]] && printf "# %s\n" "$download_cmd" && continue
    [[ -n "$INCLUDE_PATTERN" && ! "$file" == $INCLUDE_PATTERN ]] && printf "# %s\n" "$download_cmd" && continue
    [[ -n "$EXCLUDE_PATTERN" && "$file" == $EXCLUDE_PATTERN ]] && printf "# %s\n" "$download_cmd" && continue
    printf "%s\n" "$download_cmd"
    urls+=("$url|$file")
    done
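
    The last hunk changes the include/exclude test from a substring match (`*"$PATTERN"*`) to a glob match (unquoted `$PATTERN`), which is why the examples move from `--exclude safetensors` to `--exclude *.safetensors`. A minimal sketch of the difference, assuming bash; the function names are hypothetical and not part of `hfd.sh`:

    ```bash
    # Old behaviour: the quoted pattern is literal text that may appear anywhere.
    matches_substring() {
        local file="$1" pattern="$2"
        [[ "$file" == *"$pattern"* ]]
    }

    # New behaviour: the unquoted right-hand side is interpreted as a glob.
    matches_glob() {
        local file="$1" pattern="$2"
        [[ "$file" == $pattern ]]
    }

    for file in model.safetensors pytorch_model.bin vae/config.json; do
        if matches_glob "$file" '*.safetensors'; then
            printf '# skipped: %s\n' "$file"
        else
            printf 'download: %s\n' "$file"
        fi
    done
    ```

    With glob matching, a bare `safetensors` no longer matches `model.safetensors`, because the whole path has to match the pattern. It is probably also safer to quote the pattern on the command line (`--exclude '*.safetensors'`) so the calling shell does not expand it against files in the current directory first.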
  5. padeoe revised this gist Mar 20, 2024. 2 changed files with 21 additions and 17 deletions.
    13 changes: 7 additions & 6 deletions README_hfd.md
    @@ -5,11 +5,11 @@ Considering the lack of multi-threaded download support in the official [`huggin
    ## Features
    - ⏯️ **Resume from breakpoint**: You can re-run it or Ctrl+C anytime.
    - 🚀 **Multi-threaded Download**: Utilize multiple threads to speed up the download process.
    - 🚫 **File Exclusion**: Use `--exclude` or `--include` to skip or specify files, save time for models with **duplicate formats** (e.g., .bin and .safetensors).
    - 🚫 **File Exclusion**: Use `--exclude` or `--include` to skip or specify files, save time for models with **duplicate formats** (e.g., `*.bin` or `*.safetensors`).
    - 🔐 **Auth Support**: For gated models that require Huggingface login, use `--hf_username` and `--hf_token` to authenticate.
    - 🪞 **Mirror Site Support**: Set up with `HF_ENDPOINT` environment variable.
    - 🌍 **Proxy Support**: Set up with `HTTPS_PROXY` environment variable.
    - 📦 **Simple**: No dependencies & No installation required.
    - 📦 **Simple**: Only depend on `git`, `aria2c/wget`.

    ## Usage
    First, Download [`hfd.sh`](#file-hfd-sh) or clone this repo, and then grant execution permission to the script.
    @@ -20,15 +20,16 @@ chmod a+x hfd.sh
    ```
    $ ./hfd.sh -h
    Usage:
    hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool wget|aria2c] [-x threads] [--dataset]
    hfd <repo_id> [--include include_pattern] [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool aria2c|wget] [-x threads] [--dataset]
    Description:
    Downloads a model or dataset from Hugging Face using the provided model ID.
    Downloads a model or dataset from Hugging Face using the provided repo ID.
    Parameters:
    model_id The Hugging Face model ID in the format 'repo/model_name'.
    repo_id The Hugging Face repo ID in the format 'org/repo_name'.
    --include (Optional) Flag to specify a string pattern to include files for downloading.
    --exclude (Optional) Flag to specify a string pattern to exclude files from downloading.
    exclude_pattern The pattern to match against filenames for exclusion.
    include/exclude_pattern The pattern to match against filenames, supports wildcard characters. e.g., '--exclude *.safetensor'.
    --hf_username (Optional) Hugging Face username for authentication.
    --hf_token (Optional) Hugging Face token for authentication.
    --tool (Optional) Download tool to use. Can be wget (default) or aria2c.
    25 changes: 14 additions & 11 deletions hfd.sh
    @@ -10,16 +10,16 @@ trap 'printf "${YELLOW}\nDownload interrupted. If you re-run the command, you ca
    display_help() {
    cat << EOF
    Usage:
    hfd <model_id> [--include include_pattern] [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool wget|aria2c] [-x threads] [--dataset]
    hfd <repo_id> [--include include_pattern] [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool aria2c|wget] [-x threads] [--dataset]
    Description:
    Downloads a model or dataset from Hugging Face using the provided model ID.
    Downloads a model or dataset from Hugging Face using the provided repo ID.
    Parameters:
    model_id The Hugging Face model ID in the format 'repo/model_name'.
    repo_id The Hugging Face repo ID in the format 'org/repo_name'.
    --include (Optional) Flag to specify a string pattern to include files for downloading.
    --exclude (Optional) Flag to specify a string pattern to exclude files from downloading.
    exclude_pattern The pattern to match against filenames for exclusion.
    include/exclude_pattern The pattern to match against filenames, supports wildcard characters. e.g., '--exclude *.safetensor'.
    --hf_username (Optional) Hugging Face username for authentication.
    --hf_token (Optional) Hugging Face token for authentication.
    --tool (Optional) Download tool to use. Can be wget (default) or aria2c.
    @@ -38,7 +38,7 @@ MODEL_ID=$1
    shift

    # Default values
    TOOL="wget"
    TOOL="aria2c"
    THREADS=1
    HF_ENDPOINT=${HF_ENDPOINT:-"https://huggingface.co"}

    @@ -78,11 +78,11 @@ echo "Downloading to ./$MODEL_DIR"

    if [ -d "$MODEL_DIR/.git" ]; then
    printf "${YELLOW}%s exists, Skip Clone.\n${NC}" "$MODEL_DIR"
    cd "$MODEL_DIR" && GIT_LFS_SKIP_SMUDGE=1 git pull || { printf "Git pull failed.\n"; exit 1; }
    cd "$MODEL_DIR" && GIT_LFS_SKIP_SMUDGE=1 git pull || { printf "${RED}Git pull failed.${NC}\n"; exit 1; }
    else
    REPO_URL="$HF_ENDPOINT/$MODEL_ID"
    GIT_REFS_URL="${REPO_URL}/info/refs?service=git-upload-pack"
    echo "Test GIT_REFS_URL: $GIT_REFS_URL"
    echo "Testing GIT_REFS_URL: $GIT_REFS_URL"
    response=$(curl -s -o /dev/null -w "%{http_code}" "$GIT_REFS_URL")
    if [ "$response" == "401" ] || [ "$response" == "403" ]; then
    if [[ -z "$HF_USERNAME" || -z "$HF_TOKEN" ]]; then
    @@ -91,7 +91,9 @@ else
    fi
    REPO_URL="https://$HF_USERNAME:$HF_TOKEN@${HF_ENDPOINT#https://}/$MODEL_ID"
    elif [ "$response" != "200" ]; then
    echo -e "${RED}Unexpected HTTP Status Code: $response.\nExiting.\n${NC}"; exit 1
    printf "${RED}Unexpected HTTP Status Code: $response\n${NC}"
    printf "${YELLOW}Executing debug command: curl -v %s\nOutput:${NC}\n" "$GIT_REFS_URL"
    curl -v "$GIT_REFS_URL"; printf "\n${RED}Git clone failed.\n${NC}"; exit 1
    fi
    echo "git clone $REPO_URL"

    @@ -113,8 +115,8 @@ for file in $files; do
    download_cmd="wget -c \"$url\" -O \"$file\""
    [[ -n "$HF_TOKEN" ]] && download_cmd="wget --header=\"Authorization: Bearer ${HF_TOKEN}\" -c \"$url\" -O \"$file\""
    else
    download_cmd="aria2c -x $THREADS -s $THREADS -k 1M -c \"$url\" -d \"$file_dir\" -o \"$(basename "$file")\""
    [[ -n "$HF_TOKEN" ]] && download_cmd="aria2c --header=\"Authorization: Bearer ${HF_TOKEN}\" -x $THREADS -s $THREADS -k 1M -c \"$url\" -d \"$file_dir\" -o \"$(basename "$file")\""
    download_cmd="aria2c --console-log-level=error -x $THREADS -s $THREADS -k 1M -c \"$url\" -d \"$file_dir\" -o \"$(basename "$file")\""
    [[ -n "$HF_TOKEN" ]] && download_cmd="aria2c --header=\"Authorization: Bearer ${HF_TOKEN}\" --console-log-level=error -x $THREADS -s $THREADS -k 1M -c \"$url\" -d \"$file_dir\" -o \"$(basename "$file")\""
    fi
    [[ -n "$INCLUDE_PATTERN" && $file != *"$INCLUDE_PATTERN"* ]] && printf "# %s\n" "$download_cmd" && continue
    [[ -n "$EXCLUDE_PATTERN" && $file == *"$EXCLUDE_PATTERN"* ]] && printf "# %s\n" "$download_cmd" && continue
    @@ -124,11 +126,12 @@ done

    for url_file in "${urls[@]}"; do
    IFS='|' read -r url file <<< "$url_file"
    printf "${YELLOW}Start downloading ${file}.\n${NC}"
    file_dir=$(dirname "$file")
    if [[ "$TOOL" == "wget" ]]; then
    [[ -n "$HF_TOKEN" ]] && wget --header="Authorization: Bearer ${HF_TOKEN}" -c "$url" -O "$file" || wget -c "$url" -O "$file"
    else
    [[ -n "$HF_TOKEN" ]] && aria2c --header="Authorization: Bearer ${HF_TOKEN}" -x $THREADS -s $THREADS -k 1M -c "$url" -d "$file_dir" -o "$(basename "$file")" || aria2c -x $THREADS -s $THREADS -k 1M -c "$url" -d "$file_dir" -o "$(basename "$file")"
    [[ -n "$HF_TOKEN" ]] && aria2c --header="Authorization: Bearer ${HF_TOKEN}" --console-log-level=error -x $THREADS -s $THREADS -k 1M -c "$url" -d "$file_dir" -o "$(basename "$file")" || aria2c --console-log-level=error -x $THREADS -s $THREADS -k 1M -c "$url" -d "$file_dir" -o "$(basename "$file")"
    fi
    [[ $? -eq 0 ]] && printf "Downloaded %s successfully.\n" "$url" || { printf "${RED}Failed to download %s.\n${NC}" "$url"; exit 1; }
    done
  6. padeoe revised this gist Jan 17, 2024. No changes.
  7. padeoe revised this gist Dec 26, 2023. No changes.
  8. padeoe revised this gist Dec 25, 2023. 2 changed files with 5 additions and 3 deletions.
    3 changes: 1 addition & 2 deletions README_hfd.md
    @@ -1,12 +1,11 @@
    # 🤗Huggingface Model Downloader
    ***Update: The previous version has a bug. When resuming from a breakpoint, there may be an issue causing incomplete files. Please update to the latest version!!!***

    Considering the lack of multi-threaded download support in the official [`huggingface-cli`](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli), and the inadequate error handling in [`hf_transfer`](https://github.com/huggingface/hf_transfer), this command-line tool smartly utilizes `wget` or `aria2` for LFS files and `git clone` for the rest.

    ## Features
    - ⏯️ **Resume from breakpoint**: You can re-run it or Ctrl+C anytime.
    - 🚀 **Multi-threaded Download**: Utilize multiple threads to speed up the download process.
    - 🚫 **File Exclusion**: Use `--exclude` to skip specific files, save time for models with **duplicate formats** (e.g., .bin and .safetensors).
    - 🚫 **File Exclusion**: Use `--exclude` or `--include` to skip or specify files, save time for models with **duplicate formats** (e.g., .bin and .safetensors).
    - 🔐 **Auth Support**: For gated models that require Huggingface login, use `--hf_username` and `--hf_token` to authenticate.
    - 🪞 **Mirror Site Support**: Set up with `HF_ENDPOINT` environment variable.
    - 🌍 **Proxy Support**: Set up with `HTTPS_PROXY` environment variable.
    5 changes: 4 additions & 1 deletion hfd.sh
    @@ -10,13 +10,14 @@ trap 'printf "${YELLOW}\nDownload interrupted. If you re-run the command, you ca
    display_help() {
    cat << EOF
    Usage:
    hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool wget|aria2c] [-x threads] [--dataset]
    hfd <model_id> [--include include_pattern] [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool wget|aria2c] [-x threads] [--dataset]
    Description:
    Downloads a model or dataset from Hugging Face using the provided model ID.
    Parameters:
    model_id The Hugging Face model ID in the format 'repo/model_name'.
    --include (Optional) Flag to specify a string pattern to include files for downloading.
    --exclude (Optional) Flag to specify a string pattern to exclude files from downloading.
    exclude_pattern The pattern to match against filenames for exclusion.
    --hf_username (Optional) Hugging Face username for authentication.
    @@ -43,6 +44,7 @@ HF_ENDPOINT=${HF_ENDPOINT:-"https://huggingface.co"}

    while [[ $# -gt 0 ]]; do
    case $1 in
    --include) INCLUDE_PATTERN="$2"; shift 2 ;;
    --exclude) EXCLUDE_PATTERN="$2"; shift 2 ;;
    --hf_username) HF_USERNAME="$2"; shift 2 ;;
    --hf_token) HF_TOKEN="$2"; shift 2 ;;
    @@ -114,6 +116,7 @@ for file in $files; do
    download_cmd="aria2c -x $THREADS -s $THREADS -k 1M -c \"$url\" -d \"$file_dir\" -o \"$(basename "$file")\""
    [[ -n "$HF_TOKEN" ]] && download_cmd="aria2c --header=\"Authorization: Bearer ${HF_TOKEN}\" -x $THREADS -s $THREADS -k 1M -c \"$url\" -d \"$file_dir\" -o \"$(basename "$file")\""
    fi
    [[ -n "$INCLUDE_PATTERN" && $file != *"$INCLUDE_PATTERN"* ]] && printf "# %s\n" "$download_cmd" && continue
    [[ -n "$EXCLUDE_PATTERN" && $file == *"$EXCLUDE_PATTERN"* ]] && printf "# %s\n" "$download_cmd" && continue
    printf "%s\n" "$download_cmd"
    urls+=("$url|$file")
  9. padeoe revised this gist Dec 25, 2023. 2 changed files with 31 additions and 23 deletions.
    6 changes: 3 additions & 3 deletions README_hfd.md
    @@ -43,7 +43,7 @@ Example:
    ```
    **Download a model:**
    ```
    ./hdf.sh bigscience/bloom-560m
    ./hfd.sh bigscience/bloom-560m
    ```

    **Download a model that requires login:**
    @@ -56,7 +56,7 @@ hfd meta-llama/Llama-2-7b --hf_username YOUR_HF_USERNAME --hf_token YOUR_HF_TOKE


    ```bash
    ./hdf.sh bigscience/bloom-560m --exclude safetensors
    ./hfd.sh bigscience/bloom-560m --exclude safetensors
    ```

    **Download with aria2c and multiple threads:**
    @@ -68,7 +68,7 @@ hfd meta-llama/Llama-2-7b --hf_username YOUR_HF_USERNAME --hf_token YOUR_HF_TOKE
    During the download, the file URLs will be displayed:

    ```console
    $ ./hdf.sh bigscience/bloom-560m --exclude safetensors
    $ ./hfd.sh bigscience/bloom-560m --exclude safetensors
    ...
    Start Downloading lfs files, bash script:

    48 changes: 28 additions & 20 deletions hfd.sh
    @@ -1,6 +1,11 @@
    #!/bin/bash
    #!/usr/bin/env bash
    # Color definitions
    RED='\033[0;31m'
    GREEN='\033[0;32m'
    YELLOW='\033[1;33m'
    NC='\033[0m' # No Color

    trap 'printf "\nDownload interrupted. If you re-run the command, you can resume the download from the breakpoint.\n"; exit 1' INT
    trap 'printf "${YELLOW}\nDownload interrupted. If you re-run the command, you can resume the download from the breakpoint.\n${NC}"; exit 1' INT

    display_help() {
    cat << EOF
    @@ -48,13 +53,17 @@ while [[ $# -gt 0 ]]; do
    esac
    done

    # Check if aria2c is installed
    if [[ "$TOOL" == "aria2c" ]]; then
    if ! command -v aria2c &>/dev/null; then
    echo "aria2c is not installed. Installing it..."
    sudo apt update && sudo apt install -y aria2 || { echo "Failed to install aria2c. Exiting."; exit 1; }
    # Check if aria2, wget, curl, git, and git-lfs are installed
    check_command() {
    if ! command -v $1 &>/dev/null; then
    echo -e "${RED}$1 is not installed. Please install it first.${NC}"
    exit 1
    fi
    fi
    }

    [[ "$TOOL" == "aria2c" ]] && check_command aria2c
    [[ "$TOOL" == "wget" ]] && check_command wget
    check_command curl; check_command git; check_command git-lfs

    [[ -z "$MODEL_ID" || "$MODEL_ID" =~ ^-h ]] && display_help

    @@ -66,26 +75,25 @@ fi
    echo "Downloading to ./$MODEL_DIR"

    if [ -d "$MODEL_DIR/.git" ]; then
    printf "%s exists, Skip Clone.\n" "$MODEL_DIR"
    printf "${YELLOW}%s exists, Skip Clone.\n${NC}" "$MODEL_DIR"
    cd "$MODEL_DIR" && GIT_LFS_SKIP_SMUDGE=1 git pull || { printf "Git pull failed.\n"; exit 1; }
    else
    REPO_URL="$HF_ENDPOINT/$MODEL_ID"
    OUTPUT=$(GIT_TERMINAL_PROMPT=0 GIT_ASKPASS="" git ls-remote "$REPO_URL" 2>&1)
    GIT_EXIT_CODE=$?

    if [[ $OUTPUT == *"could not read Username"* ]]; then
    GIT_REFS_URL="${REPO_URL}/info/refs?service=git-upload-pack"
    echo "Test GIT_REFS_URL: $GIT_REFS_URL"
    response=$(curl -s -o /dev/null -w "%{http_code}" "$GIT_REFS_URL")
    if [ "$response" == "401" ] || [ "$response" == "403" ]; then
    if [[ -z "$HF_USERNAME" || -z "$HF_TOKEN" ]]; then
    printf "The repository requires authentication, but --hf_username and --hf_token is not passed.\nPlease get token from https://huggingface.co/settings/tokens.\nExiting.\n"
    echo $OUTPUT
    printf "${RED}HTTP Status Code: $response.\nThe repository requires authentication, but --hf_username and --hf_token is not passed. Please get token from https://huggingface.co/settings/tokens.\nExiting.\n${NC}"
    exit 1
    fi
    REPO_URL="https://$HF_USERNAME:$HF_TOKEN@${HF_ENDPOINT#https://}/$MODEL_ID"
    elif [ $GIT_EXIT_CODE -ne 0 ]; then
    echo "$OUTPUT"; exit 1
    elif [ "$response" != "200" ]; then
    echo -e "${RED}Unexpected HTTP Status Code: $response.\nExiting.\n${NC}"; exit 1
    fi
    echo "git clone $REPO_URL"

    GIT_LFS_SKIP_SMUDGE=1 git clone "$REPO_URL" && cd "$MODEL_DIR" || { printf "Git clone failed.\n"; exit 1; }
    GIT_LFS_SKIP_SMUDGE=1 git clone "$REPO_URL" && cd "$MODEL_DIR" || { printf "${RED}Git clone failed.\n${NC}"; exit 1; }
    for file in $(git lfs ls-files | awk '{print $3}'); do
    truncate -s 0 "$file"
    done
    @@ -119,7 +127,7 @@ for url_file in "${urls[@]}"; do
    else
    [[ -n "$HF_TOKEN" ]] && aria2c --header="Authorization: Bearer ${HF_TOKEN}" -x $THREADS -s $THREADS -k 1M -c "$url" -d "$file_dir" -o "$(basename "$file")" || aria2c -x $THREADS -s $THREADS -k 1M -c "$url" -d "$file_dir" -o "$(basename "$file")"
    fi
    [[ $? -eq 0 ]] && printf "Downloaded %s successfully.\n" "$url" || { printf "Failed to download %s.\n" "$url"; exit 1; }
    [[ $? -eq 0 ]] && printf "Downloaded %s successfully.\n" "$url" || { printf "${RED}Failed to download %s.\n${NC}" "$url"; exit 1; }
    done

    printf "Download completed successfully.\n"
    printf "${GREEN}Download completed successfully.\n${NC}"
  10. padeoe revised this gist Nov 21, 2023. 2 changed files with 5 additions and 6 deletions.
    2 changes: 1 addition & 1 deletion README_hfd.md
    @@ -1,7 +1,7 @@
    # 🤗Huggingface Model Downloader
    ***Update: The previous version has a bug. When resuming from a breakpoint, there may be an issue causing incomplete files. Please update to the latest version!!!***

    Considering the lack of multi-threaded download support in the official [`huggingface-cli`](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli), and the inadequate error handling in [`hf_transfer`](https://github.com/huggingface/hf_transfer) this command-line tool smartly utilizes `wget` or `aria2` for LFS files and `git clone` for the rest.
    Considering the lack of multi-threaded download support in the official [`huggingface-cli`](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli), and the inadequate error handling in [`hf_transfer`](https://github.com/huggingface/hf_transfer), this command-line tool smartly utilizes `wget` or `aria2` for LFS files and `git clone` for the rest.

    ## Features
    - ⏯️ **Resume from breakpoint**: You can re-run it or Ctrl+C anytime.
    9 changes: 4 additions & 5 deletions hfd.sh
    100644 → 100755
    @@ -60,21 +60,19 @@ fi

    MODEL_DIR="${MODEL_ID#*/}"

    echo $DATASET
    if [[ "$DATASET" == 1 ]]; then
    MODEL_ID="datasets/$MODEL_ID"
    fi
    echo $MODEL_DIR
    echo "Downloading to ./$MODEL_DIR"

    if [ -d "$MODEL_DIR/.git" ]; then
    printf "%s exists, Skip Clone.\n" "$MODEL_DIR"
    cd "$MODEL_DIR" && GIT_LFS_SKIP_SMUDGE=1 git pull || { printf "Git pull failed.\n"; exit 1; }
    else
    REPO_URL="$HF_ENDPOINT/$MODEL_ID"
    echo $REPO_URL
    OUTPUT=$(GIT_TERMINAL_PROMPT=0 git ls-remote "$REPO_URL" 2>&1)
    OUTPUT=$(GIT_TERMINAL_PROMPT=0 GIT_ASKPASS="" git ls-remote "$REPO_URL" 2>&1)
    GIT_EXIT_CODE=$?

    if [[ $OUTPUT == *"could not read Username"* ]]; then
    if [[ -z "$HF_USERNAME" || -z "$HF_TOKEN" ]]; then
    printf "The repository requires authentication, but --hf_username and --hf_token is not passed.\nPlease get token from https://huggingface.co/settings/tokens.\nExiting.\n"
    @@ -85,6 +83,7 @@ else
    elif [ $GIT_EXIT_CODE -ne 0 ]; then
    echo "$OUTPUT"; exit 1
    fi
    echo "git clone $REPO_URL"

    GIT_LFS_SKIP_SMUDGE=1 git clone "$REPO_URL" && cd "$MODEL_DIR" || { printf "Git clone failed.\n"; exit 1; }
    for file in $(git lfs ls-files | awk '{print $3}'); do
  11. @padeoe padeoe revised this gist Nov 8, 2023. 2 changed files with 6 additions and 2 deletions.
    6 changes: 5 additions & 1 deletion README_hfd.md
    @@ -13,7 +13,11 @@ Considering the lack of multi-threaded download support in the official [`huggin
    - 📦 **Simple**: No dependencies & No installation required.

    ## Usage
    First, Download [`hfd.sh`](#file-hfd-sh) from this repo.
    First, Download [`hfd.sh`](#file-hfd-sh) or clone this repo, and then grant execution permission to the script.
    ```bash
    chmod a+x hfd.sh
    ```
    **Usage Instructions:**
    ```
    $ ./hfd.sh -h
    Usage:
    2 changes: 1 addition & 1 deletion hfd.sh
    @@ -99,7 +99,7 @@ declare -a urls
    for file in $files; do
    url="$HF_ENDPOINT/$MODEL_ID/resolve/main/$file"
    file_dir=$(dirname "$file")
    mkdir -p "$file_dir" # create the necessary directories
    mkdir -p "$file_dir"
    if [[ "$TOOL" == "wget" ]]; then
    download_cmd="wget -c \"$url\" -O \"$file\""
    [[ -n "$HF_TOKEN" ]] && download_cmd="wget --header=\"Authorization: Bearer ${HF_TOKEN}\" -c \"$url\" -O \"$file\""
  12. @padeoe padeoe revised this gist Nov 1, 2023. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions README_hfd.md
    @@ -1,4 +1,6 @@
    # 🤗Huggingface Model Downloader
    ***Update: The previous version had a bug: resuming from a breakpoint could leave files incomplete. Please update to the latest version!***

    Considering the lack of multi-threaded download support in the official [`huggingface-cli`](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli), and the inadequate error handling in [`hf_transfer`](https://github.com/huggingface/hf_transfer), this command-line tool smartly utilizes `wget` or `aria2` for LFS files and `git clone` for the rest.

    ## Features
  13. @padeoe padeoe revised this gist Nov 1, 2023. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion hfd.sh
    @@ -87,14 +87,16 @@ else
    fi

    GIT_LFS_SKIP_SMUDGE=1 git clone "$REPO_URL" && cd "$MODEL_DIR" || { printf "Git clone failed.\n"; exit 1; }
    for file in $(git lfs ls-files | awk '{print $3}'); do
    truncate -s 0 "$file"
    done
    fi

    printf "\nStart Downloading lfs files, bash script:\n"
    files=$(git lfs ls-files | awk '{print $3}')
    declare -a urls

    for file in $files; do
    truncate -s 0 "$file"
    url="$HF_ENDPOINT/$MODEL_ID/resolve/main/$file"
    file_dir=$(dirname "$file")
    mkdir -p "$file_dir" # create the necessary directories
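This revision moves `truncate -s 0` out of the download loop and into the branch that runs only after a fresh `git clone`. Presumably the point is that the LFS pointer stubs are emptied once, while a re-run leaves partially downloaded files alone so that `wget -c` or `aria2c -c` can continue them. A minimal sketch of that step in isolation, using a hypothetical `empty_pointer_files` helper that reads file names from standard input:

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of hfd.sh: empty every regular file named on stdin.
# ': > file' has the same effect as 'truncate -s 0 file'.
empty_pointer_files() {
    local file
    while IFS= read -r file; do
        [[ -f "$file" ]] && : > "$file"
    done
    return 0
}

# Demo in a scratch directory: a fake LFS pointer stub is emptied once.
demo_dir="$(mktemp -d)"
printf 'version https://git-lfs.github.com/spec/v1\n' > "$demo_dir/model.bin"
printf '%s\n' "$demo_dir/model.bin" | empty_pointer_files
wc -c < "$demo_dir/model.bin"
rm -rf "$demo_dir"
```

In the script itself the input would come from `git lfs ls-files | awk '{print $3}'`, as in the diff; reading names line by line also copes with spaces in paths, which the `for file in $(...)` form does not.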
  14. @padeoe padeoe revised this gist Nov 1, 2023. 2 changed files with 33 additions and 15 deletions.
    10 changes: 6 additions & 4 deletions README_hfd.md
    @@ -2,8 +2,8 @@
    Considering the lack of multi-threaded download support in the official [`huggingface-cli`](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli), and the inadequate error handling in [`hf_transfer`](https://github.com/huggingface/hf_transfer), this command-line tool smartly utilizes `wget` or `aria2` for LFS files and `git clone` for the rest.

    ## Features
    - 🚀 **Resume from breakpoint**: You can re-run it or Ctrl+C anytime.
    - ⏯️ **Multi-threaded Download**: Utilize multiple threads to speed up the download process.
    - ⏯️ **Resume from breakpoint**: You can re-run it or Ctrl+C anytime.
    - 🚀 **Multi-threaded Download**: Utilize multiple threads to speed up the download process.
    - 🚫 **File Exclusion**: Use `--exclude` to skip specific files, save time for models with **duplicate formats** (e.g., .bin and .safetensors).
    - 🔐 **Auth Support**: For gated models that require Huggingface login, use `--hf_username` and `--hf_token` to authenticate.
    - 🪞 **Mirror Site Support**: Set up with `HF_ENDPOINT` environment variable.
    @@ -15,10 +15,10 @@ First, Download [`hfd.sh`](#file-hfd-sh) from this repo.
    ```
    $ ./hfd.sh -h
    Usage:
    hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool wget|aria2c] [-x threads]
    hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool wget|aria2c] [-x threads] [--dataset]
    Description:
    Downloads a model from Hugging Face using the provided model ID.
    Downloads a model or dataset from Hugging Face using the provided model ID.
    Parameters:
    model_id The Hugging Face model ID in the format 'repo/model_name'.
    @@ -28,10 +28,12 @@ Parameters:
    --hf_token (Optional) Hugging Face token for authentication.
    --tool (Optional) Download tool to use. Can be wget (default) or aria2c.
    -x (Optional) Number of download threads for aria2c.
    --dataset (Optional) Flag to indicate downloading a dataset.
    Example:
    hfd bigscience/bloom-560m --exclude safetensors
    hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken --tool aria2c -x 8
    hfd lavita/medical-qa-shared-task-v1-toy --dataset
    ```
    **Download a model:**
    ```
    38 changes: 27 additions & 11 deletions hfd.sh
    @@ -5,10 +5,10 @@ trap 'printf "\nDownload interrupted. If you re-run the command, you can resume
    display_help() {
    cat << EOF
    Usage:
    hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool wget|aria2c] [-x threads]
    hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool wget|aria2c] [-x threads] [--dataset]
    Description:
    Downloads a model from Hugging Face using the provided model ID.
    Downloads a model or dataset from Hugging Face using the provided model ID.
    Parameters:
    model_id The Hugging Face model ID in the format 'repo/model_name'.
    @@ -18,10 +18,12 @@ Parameters:
    --hf_token (Optional) Hugging Face token for authentication.
    --tool (Optional) Download tool to use. Can be wget (default) or aria2c.
    -x (Optional) Number of download threads for aria2c.
    --dataset (Optional) Flag to indicate downloading a dataset.
    Example:
    hfd bigscience/bloom-560m --exclude safetensors
    hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken --tool aria2c -x 8
    hfd lavita/medical-qa-shared-task-v1-toy --dataset
    EOF
    exit 1
    }
    @@ -41,6 +43,7 @@ while [[ $# -gt 0 ]]; do
    --hf_token) HF_TOKEN="$2"; shift 2 ;;
    --tool) TOOL="$2"; shift 2 ;;
    -x) THREADS="$2"; shift 2 ;;
    --dataset) DATASET=1; shift ;;
    *) shift ;;
    esac
    done
    @@ -57,20 +60,28 @@ fi

    MODEL_DIR="${MODEL_ID#*/}"

    echo $DATASET
    if [[ "$DATASET" == 1 ]]; then
    MODEL_ID="datasets/$MODEL_ID"
    fi
    echo $MODEL_DIR

    if [ -d "$MODEL_DIR/.git" ]; then
    printf "%s exists, Skip Clone.\n" "$MODEL_DIR"
    cd "$MODEL_DIR" && GIT_LFS_SKIP_SMUDGE=1 git pull || { printf "Git pull failed.\n"; exit 1; }
    else
    REPO_URL="$HF_ENDPOINT/$MODEL_ID"
    echo $REPO_URL
    OUTPUT=$(GIT_TERMINAL_PROMPT=0 git ls-remote "$REPO_URL" 2>&1)
    GIT_EXIT_CODE=$?

    if [[ $OUTPUT == *"could not read Username"* ]]; then
    if [[ -z "$HF_USERNAME" || -z "$HF_TOKEN" ]]; then
    printf "The repository requires authentication, but --hf_username and --hf_token is not passed.\nPlease get token from https://huggingface.co/settings/tokens.\nExiting.\n"
    echo $OUTPUT
    exit 1
    fi
    REPO_URL="https://$HF_USERNAME:$HF_TOKEN@$HF_ENDPOINT/$MODEL_ID"
    REPO_URL="https://$HF_USERNAME:$HF_TOKEN@${HF_ENDPOINT#https://}/$MODEL_ID"
    elif [ $GIT_EXIT_CODE -ne 0 ]; then
    echo "$OUTPUT"; exit 1
    fi
    @@ -83,24 +94,29 @@ files=$(git lfs ls-files | awk '{print $3}')
    declare -a urls

    for file in $files; do
    truncate -s 0 "$file"
    url="$HF_ENDPOINT/$MODEL_ID/resolve/main/$file"
    file_dir=$(dirname "$file")
    mkdir -p "$file_dir" # create the necessary directories
    if [[ "$TOOL" == "wget" ]]; then
    download_cmd="wget -c \"$url\""
    [[ -n "$HF_TOKEN" ]] && download_cmd="wget --header=\"Authorization: Bearer ${HF_TOKEN}\" -c \"$url\""
    download_cmd="wget -c \"$url\" -O \"$file\""
    [[ -n "$HF_TOKEN" ]] && download_cmd="wget --header=\"Authorization: Bearer ${HF_TOKEN}\" -c \"$url\" -O \"$file\""
    else
    download_cmd="aria2c -x $THREADS -s $THREADS -k 1M -c \"$url\""
    [[ -n "$HF_TOKEN" ]] && download_cmd="aria2c --header=\"Authorization: Bearer ${HF_TOKEN}\" -x $THREADS -s $THREADS -k 1M -c \"$url\""
    download_cmd="aria2c -x $THREADS -s $THREADS -k 1M -c \"$url\" -d \"$file_dir\" -o \"$(basename "$file")\""
    [[ -n "$HF_TOKEN" ]] && download_cmd="aria2c --header=\"Authorization: Bearer ${HF_TOKEN}\" -x $THREADS -s $THREADS -k 1M -c \"$url\" -d \"$file_dir\" -o \"$(basename "$file")\""
    fi
    [[ -n "$EXCLUDE_PATTERN" && $file == *"$EXCLUDE_PATTERN"* ]] && printf "# %s\n" "$download_cmd" && continue
    printf "%s\n" "$download_cmd"
    urls+=("$url")
    urls+=("$url|$file")
    done

    for url in "${urls[@]}"; do
    for url_file in "${urls[@]}"; do
    IFS='|' read -r url file <<< "$url_file"
    file_dir=$(dirname "$file")
    if [[ "$TOOL" == "wget" ]]; then
    [[ -n "$HF_TOKEN" ]] && wget --header="Authorization: Bearer ${HF_TOKEN}" -c "$url" || wget -c "$url"
    [[ -n "$HF_TOKEN" ]] && wget --header="Authorization: Bearer ${HF_TOKEN}" -c "$url" -O "$file" || wget -c "$url" -O "$file"
    else
    [[ -n "$HF_TOKEN" ]] && aria2c --header="Authorization: Bearer ${HF_TOKEN}" -x $THREADS -s $THREADS -k 1M -c "$url" || aria2c -x $THREADS -s $THREADS -k 1M -c "$url"
    [[ -n "$HF_TOKEN" ]] && aria2c --header="Authorization: Bearer ${HF_TOKEN}" -x $THREADS -s $THREADS -k 1M -c "$url" -d "$file_dir" -o "$(basename "$file")" || aria2c -x $THREADS -s $THREADS -k 1M -c "$url" -d "$file_dir" -o "$(basename "$file")"
    fi
    [[ $? -eq 0 ]] && printf "Downloaded %s successfully.\n" "$url" || { printf "Failed to download %s.\n" "$url"; exit 1; }
    done
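The second loop in this revision needs both the URL and the local path, so each array element is stored as `url|file` and split again with `IFS='|' read`. A small sketch of that pairing on its own, assuming neither value contains a literal `|`; `add_pair` and `split_pair` are hypothetical names used only for illustration:

```bash
#!/usr/bin/env bash
# Sketch of the "url|file" pairing used in the download loop.
# Assumption: neither the URL nor the path contains a literal '|'.
declare -a pairs=()

# Hypothetical helper: pack a URL and a relative path into one array element.
add_pair() {
    pairs+=("$1|$2")
}

# Hypothetical helper: unpack one element into the globals 'url' and 'file'.
split_pair() {
    IFS='|' read -r url file <<< "$1"
}

add_pair "https://huggingface.co/bigscience/bloom-560m/resolve/main/onnx/decoder_model.onnx" \
         "onnx/decoder_model.onnx"

for pair in "${pairs[@]}"; do
    split_pair "$pair"
    printf '%s -> %s (dir: %s)\n' "$url" "$file" "$(dirname "$file")"
done
```

Because `IFS` is set only for the `read` command, spaces inside the path stay part of `file`.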
  15. @padeoe padeoe revised this gist Oct 27, 2023. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion README_hfd.md
    @@ -53,7 +53,7 @@ hfd meta-llama/Llama-2-7b --hf_username YOUR_HF_USERNAME --hf_token YOUR_HF_TOKE

    **Download with aria2c and multiple threads:**
    ```bash
    ./hfd.sh bigscience/bloom-560m --download_tool aria2c -x 4
    ./hfd.sh bigscience/bloom-560m --tool aria2c -x 4
    ```

    *Output*:
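The `--download_tool` typo corrected here would not have produced an error: the script's argument loop ends in `*) shift ;;`, so unknown words are skipped and `TOOL` keeps its default of `wget`. A hedged sketch of a stricter parser that rejects unknown options and missing values; `parse_args` is a hypothetical function, and only three of the script's options are handled to keep the example short:

```bash
#!/usr/bin/env bash
# Sketch of a stricter parser than the script's catch-all '*) shift ;;'.
# parse_args is a hypothetical name; the option names come from the help text.
parse_args() {
    TOOL="wget"; THREADS=1; EXCLUDE_PATTERN=""
    while [[ $# -gt 0 ]]; do
        case "$1" in
            --exclude|--tool|-x)
                # Each of these options needs a value after it.
                [[ $# -ge 2 ]] || { printf 'Missing value for %s\n' "$1" >&2; return 2; }
                case "$1" in
                    --exclude) EXCLUDE_PATTERN="$2" ;;
                    --tool)    TOOL="$2" ;;
                    -x)        THREADS="$2" ;;
                esac
                shift 2 ;;
            *)
                printf 'Unknown option: %s\n' "$1" >&2
                return 2 ;;
        esac
    done
}

parse_args --tool aria2c -x 4 && echo "tool=$TOOL threads=$THREADS"
parse_args --download_tool aria2c -x 4 || echo "rejected"
```

Failing early on an unknown flag is one possible design; the original script's permissive loop is shorter but hides typos like this one.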
  16. @padeoe padeoe revised this gist Oct 27, 2023. 2 changed files with 61 additions and 14 deletions.
    29 changes: 25 additions & 4 deletions README_hfd.md
    @@ -1,12 +1,12 @@
    # 🤗Huggingface Model Downloader
    ***Update***: We recommend the official [`huggingface-cli`](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli) tool!

    ~This command-line tool avoids the complexity and frequent disruptions often faced with [`snapshot_download`](https://huggingface.co/docs/huggingface_hub/v0.17.3/en/package_reference/file_download#huggingface_hub.snapshot_download) and `git clone` when fetching large models, like LLM. It smartly utilizes wget(which supports resuming) for Git LFS files and git clone for the rest.~
    Considering the lack of multi-threaded download support in the official [`huggingface-cli`](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli), and the inadequate error handling in [`hf_transfer`](https://github.com/huggingface/hf_transfer), this command-line tool smartly utilizes `wget` or `aria2` for LFS files and `git clone` for the rest.

    ## Features
    - 🚀 **Resume from breakpoint**: You can re-run it or Ctrl+C anytime.
    - ⏯️ **Multi-threaded Download**: Utilize multiple threads to speed up the download process.
    - 🚫 **File Exclusion**: Use `--exclude` to skip specific files, save time for models with **duplicate formats** (e.g., .bin and .safetensors).
    - 🔐 **Auth Support**: For gated models that require Huggingface login, use `--hf_username` and `--hf_token` to authenticate.
    - 🪞 **Mirror Site Support**: Set up with `HF_ENDPOINT` environment variable.
    - 🌍 **Proxy Support**: Set up with `HTTPS_PROXY` environment variable.
    - 📦 **Simple**: No dependencies & No installation required.

    @@ -15,7 +15,23 @@ First, Download [`hfd.sh`](#file-hfd-sh) from this repo.
    ```
    $ ./hfd.sh -h
    Usage:
    hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token]
    hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool wget|aria2c] [-x threads]
    Description:
    Downloads a model from Hugging Face using the provided model ID.
    Parameters:
    model_id The Hugging Face model ID in the format 'repo/model_name'.
    --exclude (Optional) Flag to specify a string pattern to exclude files from downloading.
    exclude_pattern The pattern to match against filenames for exclusion.
    --hf_username (Optional) Hugging Face username for authentication.
    --hf_token (Optional) Hugging Face token for authentication.
    --tool (Optional) Download tool to use. Can be wget (default) or aria2c.
    -x (Optional) Number of download threads for aria2c.
    Example:
    hfd bigscience/bloom-560m --exclude safetensors
    hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken --tool aria2c -x 8
    ```
    **Download a model:**
    ```
    @@ -35,6 +51,11 @@ hfd meta-llama/Llama-2-7b --hf_username YOUR_HF_USERNAME --hf_token YOUR_HF_TOKE
    ./hfd.sh bigscience/bloom-560m --exclude safetensors
    ```

    **Download with aria2c and multiple threads:**
    ```bash
    ./hfd.sh bigscience/bloom-560m --download_tool aria2c -x 4
    ```

    *Output*:
    During the download, the file URLs will be displayed:

    46 changes: 36 additions & 10 deletions hfd.sh
    @@ -5,7 +5,7 @@ trap 'printf "\nDownload interrupted. If you re-run the command, you can resume
    display_help() {
    cat << EOF
    Usage:
    hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token]
    hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool wget|aria2c] [-x threads]
    Description:
    Downloads a model from Hugging Face using the provided model ID.
    @@ -16,26 +16,43 @@ Parameters:
    exclude_pattern The pattern to match against filenames for exclusion.
    --hf_username (Optional) Hugging Face username for authentication.
    --hf_token (Optional) Hugging Face token for authentication.
    --tool (Optional) Download tool to use. Can be wget (default) or aria2c.
    -x (Optional) Number of download threads for aria2c.
    Example:
    hfd bigscience/bloom-560m --exclude safetensors
    hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken
    hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken --tool aria2c -x 8
    EOF
    exit 1
    }

    MODEL_ID=$1
    shift

    # Default values
    TOOL="wget"
    THREADS=1
    HF_ENDPOINT=${HF_ENDPOINT:-"https://huggingface.co"}

    while [[ $# -gt 0 ]]; do
    case $1 in
    --exclude) EXCLUDE_PATTERN="$2"; shift 2 ;;
    --hf_username) HF_USERNAME="$2"; shift 2 ;;
    --hf_token) HF_TOKEN="$2"; shift 2 ;;
    --tool) TOOL="$2"; shift 2 ;;
    -x) THREADS="$2"; shift 2 ;;
    *) shift ;;
    esac
    done

    # Check if aria2c is installed
    if [[ "$TOOL" == "aria2c" ]]; then
    if ! command -v aria2c &>/dev/null; then
    echo "aria2c is not installed. Installing it..."
    sudo apt update && sudo apt install -y aria2 || { echo "Failed to install aria2c. Exiting."; exit 1; }
    fi
    fi

    [[ -z "$MODEL_ID" || "$MODEL_ID" =~ ^-h ]] && display_help

    MODEL_DIR="${MODEL_ID#*/}"
    @@ -44,7 +61,7 @@ if [ -d "$MODEL_DIR/.git" ]; then
    printf "%s exists, Skip Clone.\n" "$MODEL_DIR"
    cd "$MODEL_DIR" && GIT_LFS_SKIP_SMUDGE=1 git pull || { printf "Git pull failed.\n"; exit 1; }
    else
    REPO_URL="https://huggingface.co/$MODEL_ID"
    REPO_URL="$HF_ENDPOINT/$MODEL_ID"
    OUTPUT=$(GIT_TERMINAL_PROMPT=0 git ls-remote "$REPO_URL" 2>&1)
    GIT_EXIT_CODE=$?

    @@ -53,7 +70,7 @@ else
    printf "The repository requires authentication, but --hf_username and --hf_token is not passed.\nPlease get token from https://huggingface.co/settings/tokens.\nExiting.\n"
    exit 1
    fi
    REPO_URL="https://$HF_USERNAME:$HF_TOKEN@huggingface.co/$MODEL_ID"
    REPO_URL="https://$HF_USERNAME:$HF_TOKEN@$HF_ENDPOINT/$MODEL_ID"
    elif [ $GIT_EXIT_CODE -ne 0 ]; then
    echo "$OUTPUT"; exit 1
    fi
    @@ -66,16 +83,25 @@ files=$(git lfs ls-files | awk '{print $3}')
    declare -a urls

    for file in $files; do
    url="https://huggingface.co/$MODEL_ID/resolve/main/$file"
    wget_cmd="wget -c \"$url\""
    [[ -n "$HF_TOKEN" ]] && wget_cmd="wget --header=\"Authorization: Bearer ${HF_TOKEN}\" -c \"$url\""
    [[ -n "$EXCLUDE_PATTERN" && $file == *"$EXCLUDE_PATTERN"* ]] && printf "# %s\n" "$wget_cmd" && continue
    printf "%s\n" "$wget_cmd"
    url="$HF_ENDPOINT/$MODEL_ID/resolve/main/$file"
    if [[ "$TOOL" == "wget" ]]; then
    download_cmd="wget -c \"$url\""
    [[ -n "$HF_TOKEN" ]] && download_cmd="wget --header=\"Authorization: Bearer ${HF_TOKEN}\" -c \"$url\""
    else
    download_cmd="aria2c -x $THREADS -s $THREADS -k 1M -c \"$url\""
    [[ -n "$HF_TOKEN" ]] && download_cmd="aria2c --header=\"Authorization: Bearer ${HF_TOKEN}\" -x $THREADS -s $THREADS -k 1M -c \"$url\""
    fi
    [[ -n "$EXCLUDE_PATTERN" && $file == *"$EXCLUDE_PATTERN"* ]] && printf "# %s\n" "$download_cmd" && continue
    printf "%s\n" "$download_cmd"
    urls+=("$url")
    done

    for url in "${urls[@]}"; do
    [[ -n "$HF_TOKEN" ]] && wget --header="Authorization: Bearer ${HF_TOKEN}" -c "$url" || wget -c "$url"
    if [[ "$TOOL" == "wget" ]]; then
    [[ -n "$HF_TOKEN" ]] && wget --header="Authorization: Bearer ${HF_TOKEN}" -c "$url" || wget -c "$url"
    else
    [[ -n "$HF_TOKEN" ]] && aria2c --header="Authorization: Bearer ${HF_TOKEN}" -x $THREADS -s $THREADS -k 1M -c "$url" || aria2c -x $THREADS -s $THREADS -k 1M -c "$url"
    fi
    [[ $? -eq 0 ]] && printf "Downloaded %s successfully.\n" "$url" || { printf "Failed to download %s.\n" "$url"; exit 1; }
    done
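In this revision the authenticated `REPO_URL` places the credentials in front of `$HF_ENDPOINT`, which by default already starts with `https://`, so the scheme appears twice. The newer revision listed earlier on this page strips it with `${HF_ENDPOINT#https://}`. A sketch of a slightly more general variant that keeps whatever scheme the endpoint uses; `auth_repo_url` is a hypothetical helper, and it assumes the endpoint has the form `scheme://host[/path]` and that the credentials need no URL-encoding:

```bash
#!/usr/bin/env bash
# Hypothetical helper: insert credentials into an endpoint URL, keeping its scheme.
# Assumes the endpoint looks like scheme://host[/path] and that the
# credentials contain no characters that need URL-encoding.
auth_repo_url() {
    local endpoint="$1" user="$2" token="$3" repo_id="$4"
    local scheme="${endpoint%%://*}"   # text before the first "://"
    local rest="${endpoint#*://}"      # text after the first "://"
    printf '%s://%s:%s@%s/%s\n' "$scheme" "$user" "$token" "$rest" "$repo_id"
}

auth_repo_url "https://huggingface.co" "myuser" "mytoken" "meta-llama/Llama-2-7b"
```

This matters mainly for mirror endpoints set through `HF_ENDPOINT` that are served over plain `http://`.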

  17. @padeoe padeoe revised this gist Oct 25, 2023. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion README_hfd.md
    @@ -1,5 +1,7 @@
    # 🤗Huggingface Model Downloader
    This command-line tool avoids the complexity and frequent disruptions often faced with [`snapshot_download`](https://huggingface.co/docs/huggingface_hub/v0.17.3/en/package_reference/file_download#huggingface_hub.snapshot_download) and `git clone` when fetching large models, like LLM. It smartly utilizes wget(which supports resuming) for Git LFS files and git clone for the rest.
    ***Update***: We recommend the official [`huggingface-cli`](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli) tool!

    ~This command-line tool avoids the complexity and frequent disruptions often faced with [`snapshot_download`](https://huggingface.co/docs/huggingface_hub/v0.17.3/en/package_reference/file_download#huggingface_hub.snapshot_download) and `git clone` when fetching large models, like LLM. It smartly utilizes wget(which supports resuming) for Git LFS files and git clone for the rest.~

    ## Features
    - 🚀 **Resume from breakpoint**: You can re-run it or Ctrl+C anytime.
  18. @padeoe padeoe revised this gist Sep 28, 2023. 1 changed file with 2 additions and 2 deletions.
    4 changes: 2 additions & 2 deletions README_hfd.md
    @@ -1,5 +1,5 @@
    # 🤗Huggingface Model Downloader 🚀
    This command-line tool avoids the complexity and frequent disruptions often faced with `hf_hub_download` and `git clone` when fetching large models, like LLM. It smartly utilizes wget(which supports resuming) for Git LFS files and git clone for the rest.
    # 🤗Huggingface Model Downloader
    This command-line tool avoids the complexity and frequent disruptions often faced with [`snapshot_download`](https://huggingface.co/docs/huggingface_hub/v0.17.3/en/package_reference/file_download#huggingface_hub.snapshot_download) and `git clone` when fetching large models, like LLM. It smartly utilizes wget(which supports resuming) for Git LFS files and git clone for the rest.

    ## Features
    - 🚀 **Resume from breakpoint**: You can re-run it or Ctrl+C anytime.
  19. @padeoe padeoe revised this gist Sep 27, 2023. 1 changed file with 6 additions and 2 deletions.
    8 changes: 6 additions & 2 deletions hfd.sh
    @@ -46,11 +46,15 @@ if [ -d "$MODEL_DIR/.git" ]; then
    else
    REPO_URL="https://huggingface.co/$MODEL_ID"
    OUTPUT=$(GIT_TERMINAL_PROMPT=0 git ls-remote "$REPO_URL" 2>&1)
    GIT_EXIT_CODE=$?

    if [[ $OUTPUT == *"could not read Username"* ]]; then
    [[ -z "$HF_USERNAME" || -z "$HF_TOKEN" ]] && printf "The repository requires authentication, but --hf_username and --hf_token is not passed.\nPlease get token from https://huggingface.co/settings/tokens.\nExiting.\n" && exit 1
    if [[ -z "$HF_USERNAME" || -z "$HF_TOKEN" ]]; then
    printf "The repository requires authentication, but --hf_username and --hf_token is not passed.\nPlease get token from https://huggingface.co/settings/tokens.\nExiting.\n"
    exit 1
    fi
    REPO_URL="https://$HF_USERNAME:$HF_TOKEN@huggingface.co/$MODEL_ID"
    elif [ $? -ne 0 ]; then
    elif [ $GIT_EXIT_CODE -ne 0 ]; then
    echo "$OUTPUT"; exit 1
    fi
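The change to `GIT_EXIT_CODE` fixes a subtle problem: inside `elif [ $? -ne 0 ]`, `$?` holds the status of the preceding `[[ ... ]]` test, which is non-zero whenever the pattern did not match, so the branch fires even when `git ls-remote` succeeded. The sketch below reproduces both shapes with stand-in probe functions instead of a real `git` call; all function names are hypothetical:

```bash
#!/usr/bin/env bash
# Stand-ins for 'git ls-remote': one succeeds, one fails.
ok_probe()   { echo "abc123 refs/heads/main"; return 0; }
fail_probe() { echo "fatal: repository not found"; return 128; }

# Buggy shape: $? in the elif is the status of the [[ ... ]] test, not of the probe.
check_buggy() {
    local output
    output=$("$1" 2>&1)
    if [[ $output == *"could not read Username"* ]]; then
        echo "auth"
    elif [ $? -ne 0 ]; then
        echo "error"
    else
        echo "ok"
    fi
}

# Fixed shape: the probe's status is saved before any other command runs.
check_fixed() {
    local output exit_code
    output=$("$1" 2>&1)
    exit_code=$?
    if [[ $output == *"could not read Username"* ]]; then
        echo "auth"
    elif [ "$exit_code" -ne 0 ]; then
        echo "error"
    else
        echo "ok"
    fi
}

check_buggy ok_probe     # prints "error" although the probe succeeded
check_fixed ok_probe     # prints "ok"
check_fixed fail_probe   # prints "error"
```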

  20. @padeoe padeoe revised this gist Sep 27, 2023. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion README_hfd.md
    @@ -1,5 +1,5 @@
    # 🤗Huggingface Model Downloader 🚀
    This tool avoids the frequent disruptions often faced with `hf_hub_download` and `git clone` when fetching large models, like LLM. It smartly utilizes `wget`(which supports resuming) for Git LFS files and `git clone` for the rest.
    This command-line tool avoids the complexity and frequent disruptions often faced with `hf_hub_download` and `git clone` when fetching large models, like LLM. It smartly utilizes wget(which supports resuming) for Git LFS files and git clone for the rest.

    ## Features
    - 🚀 **Resume from breakpoint**: You can re-run it or Ctrl+C anytime.
  21. @padeoe padeoe revised this gist Sep 27, 2023. No changes.
  22. padeoe renamed this gist Sep 27, 2023. 1 changed file with 3 additions and 7 deletions.
    10 changes: 3 additions & 7 deletions README_huggingface_model_downloader.md → README_hfd.md
    @@ -1,16 +1,12 @@
    # 🤗Huggingface Model Downloader 🚀
    📦 Download large Huggingface models effortlessly with the power and simplicity of **`wget`**!

    Bypass the common **network interruptions** faced with `hf_hub_download` and `git clone` for **large** models. This simple script leverages wget for Git LFS files and git clone for others.


    This tool avoids the frequent disruptions often faced with `hf_hub_download` and `git clone` when fetching large models, like LLM. It smartly utilizes `wget`(which supports resuming) for Git LFS files and `git clone` for the rest.

    ## Features
    - 🚀 **Resume from breakpoint**: You can re-run it or Ctrl+C anytime.
    - 🚫 **File Exclusion**: Use `--exclude` to skip specific files, save time for models with **duplicate formats** (e.g., .bin and .safetensors).
    - 🔐 **Auth Support**: For gated models that require Huggingface login, use `--hf_username` and `--hf_token` to authenticate.
    - 🌍 **Proxy Support**: Set up with `HTTPS_PROXY` environment variable.
    - 📦 **Simple**: No dependency & simple codes.
    - 📦 **Simple**: No dependencies & No installation required.

    ## Usage
    First, Download [`hfd.sh`](#file-hfd-sh) from this repo.
    @@ -55,4 +51,4 @@ wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/onnx/decoder_m
    For easier access, you can create an alias for the script:
    ```bash
    alias hfd="$PWD/hfd.sh"
    ```
    ```
  23. @padeoe padeoe revised this gist Sep 27, 2023. 1 changed file with 10 additions and 3 deletions.
    13 changes: 10 additions & 3 deletions README_huggingface_model_downloader.md
    @@ -7,30 +7,37 @@ Bypass the common **network interruptions** faced with `hf_hub_download` and `gi

    ## Features
    - 🚀 **Resume from breakpoint**: You can re-run it or Ctrl+C anytime.
    - 🚫 **File Exclusion**: Use `--exclude` to skip specific files, save time for models with **duplicate formats** (e.g., .bin and .safetensors).
    - 🔐 **Auth Support**: For gated models that require Huggingface login, use `--hf_username` and `--hf_token` to authenticate.
    - 🌍 **Proxy Support**: Set up with `HTTPS_PROXY` environment variable.
    - 🚫 **File Exclusion**: Use `--exclude` to skip specific files, handy for repos with **duplicate model formats** (e.g., .bin and .safetensors).
    - 📦 **Simple**: No dependency & simple codes.

    ## Usage
    First, Download [`hfd.sh`](#file-hfd-sh) from this repo.
    ```
    $ ./hfd.sh -h
    Usage:
    hfd <model_id> [--exclude exclude_pattern]
    hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token]
    ```
    **Download a model:**
    ```
    ./hfd.sh bigscience/bloom-560m
    ```

    **Download a model that requires login:**

    Get a Hugging Face token from [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens), then:
    ```bash
    hfd meta-llama/Llama-2-7b --hf_username YOUR_HF_USERNAME --hf_token YOUR_HF_TOKEN
    ```
    **Download a model and exclude certain files (e.g., .safetensors):**


    ```bash
    ./hfd.sh bigscience/bloom-560m --exclude safetensors
    ```

    **Output:**
    *Output*:
    During the download, the file URLs will be displayed:

    ```console
  24. @padeoe padeoe revised this gist Sep 27, 2023. 1 changed file with 47 additions and 65 deletions.
    112 changes: 47 additions & 65 deletions hfd.sh
    @@ -1,96 +1,78 @@
    #!/bin/bash

    # Trap the INT signal to handle Ctrl+C
    trap ctrl_c INT
    trap 'printf "\nDownload interrupted. If you re-run the command, you can resume the download from the breakpoint.\n"; exit 1' INT

    # Function to handle Ctrl+C
    ctrl_c() {
    printf "\nDownload interrupted. If you re-run the command, you can resume the download from the breakpoint.\n"
    exit 1
    }

    # Display help information
    display_help() {
    printf "Usage:\n"
    printf " hfd <model_id> [--exclude exclude_pattern]\n\n"
    printf "Description:\n"
    printf " Downloads a model from Hugging Face using the provided model ID.\n\n"
    printf "Parameters:\n"
    printf " model_id The Hugging Face model ID in the format 'repo/model_name'.\n"
    printf " --exclude (Optional) Flag to specify a string pattern to exclude files from downloading.\n"
    printf " exclude_pattern The pattern to match against filenames for exclusion.\n\n"
    printf "Example:\n"
    printf " hfd bigscience/bloom-560m --exclude safetensors\n"
    cat << EOF
    Usage:
    hfd <model_id> [--exclude exclude_pattern] [--hf_username username] [--hf_token token]
    Description:
    Downloads a model from Hugging Face using the provided model ID.
    Parameters:
    model_id The Hugging Face model ID in the format 'repo/model_name'.
    --exclude (Optional) Flag to specify a string pattern to exclude files from downloading.
    exclude_pattern The pattern to match against filenames for exclusion.
    --hf_username (Optional) Hugging Face username for authentication.
    --hf_token (Optional) Hugging Face token for authentication.
    Example:
    hfd bigscience/bloom-560m --exclude safetensors
    hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken
    EOF
    exit 1
    }

    MODEL_ID=$1
    EXCLUDE_PATTERN=''
    shift

    # Parse arguments for --exclude option
    shift # Move to the next argument
    while [[ $# -gt 0 ]]; do
    key="$1"
    case $key in
    --exclude)
    EXCLUDE_PATTERN="$2"
    shift # past argument
    shift # past value
    ;;
    *)
    # unknown option
    shift # past argument
    ;;
    case $1 in
    --exclude) EXCLUDE_PATTERN="$2"; shift 2 ;;
    --hf_username) HF_USERNAME="$2"; shift 2 ;;
    --hf_token) HF_TOKEN="$2"; shift 2 ;;
    *) shift ;;
    esac
    done

    # Check if no model_id is provided or -h/--help is provided
    if [[ -z "$MODEL_ID" ]] || [[ "$MODEL_ID" == "-h" ]] || [[ "$MODEL_ID" == "--help" ]]; then
    display_help
    fi
    [[ -z "$MODEL_ID" || "$MODEL_ID" =~ ^-h ]] && display_help

    MODEL_DIR=$(echo "$MODEL_ID" | awk -F'/' '{print $2}')
    MODEL_DIR="${MODEL_ID#*/}"

    # Check if the model directory exists and contains a .git directory
    if [ -d "$MODEL_DIR" ] && [ -d "$MODEL_DIR/.git" ]; then
    if [ -d "$MODEL_DIR/.git" ]; then
    printf "%s exists, Skip Clone.\n" "$MODEL_DIR"
    cd "$MODEL_DIR"
    if GIT_LFS_SKIP_SMUDGE=1 git pull; then
    printf "Git pull successful.\n"
    else
    printf "Git pull failed.\n"
    exit 1
    fi
    cd "$MODEL_DIR" && GIT_LFS_SKIP_SMUDGE=1 git pull || { printf "Git pull failed.\n"; exit 1; }
    else
    printf "Start clone without lfs.\n"
    if GIT_LFS_SKIP_SMUDGE=1 git clone "https://huggingface.co/$MODEL_ID"; then
    cd "$MODEL_DIR"
    else
    printf "Git clone failed.\n"
    exit 1
    REPO_URL="https://huggingface.co/$MODEL_ID"
    OUTPUT=$(GIT_TERMINAL_PROMPT=0 git ls-remote "$REPO_URL" 2>&1)
    LS_REMOTE_STATUS=$? # Save the exit status now; the [[ ]] test below overwrites $?

    if [[ $OUTPUT == *"could not read Username"* ]]; then
    [[ -z "$HF_USERNAME" || -z "$HF_TOKEN" ]] && printf "The repository requires authentication, but --hf_username and --hf_token were not passed.\nPlease get a token from https://huggingface.co/settings/tokens.\nExiting.\n" && exit 1
    REPO_URL="https://$HF_USERNAME:$HF_TOKEN@huggingface.co/$MODEL_ID"
    elif [ "$LS_REMOTE_STATUS" -ne 0 ]; then
    echo "$OUTPUT"; exit 1
    fi

    GIT_LFS_SKIP_SMUDGE=1 git clone "$REPO_URL" && cd "$MODEL_DIR" || { printf "Git clone failed.\n"; exit 1; }
    fi

    printf "\nStart Downloading lfs files, bash script:\n"
    files=$(git lfs ls-files | awk '{print $3}')
    declare -a urls

    for file in $files; do
    url="https://huggingface.co/$MODEL_ID/resolve/main/$file"
    if [[ -n "$EXCLUDE_PATTERN" && $file == *"$EXCLUDE_PATTERN"* ]]; then
    printf "# wget -c $url\n"
    continue
    fi
    printf "wget -c $url\n"
    wget_cmd="wget -c \"$url\""
    [[ -n "$HF_TOKEN" ]] && wget_cmd="wget --header=\"Authorization: Bearer ${HF_TOKEN}\" -c \"$url\""
    [[ -n "$EXCLUDE_PATTERN" && $file == *"$EXCLUDE_PATTERN"* ]] && printf "# %s\n" "$wget_cmd" && continue
    printf "%s\n" "$wget_cmd"
    urls+=("$url")
    done

    for url in $urls; do
    if wget -c $url; then
    printf "Downloaded %s successfully.\n" "$url"
    else
    printf "Failed to download %s.\n" "$url"
    exit 1
    fi
    for url in "${urls[@]}"; do
    if [[ -n "$HF_TOKEN" ]]; then wget --header="Authorization: Bearer ${HF_TOKEN}" -c "$url"; else wget -c "$url"; fi
    [[ $? -eq 0 ]] && printf "Downloaded %s successfully.\n" "$url" || { printf "Failed to download %s.\n" "$url"; exit 1; }
    done

    printf "Download completed successfully.\n"
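
The authenticated download step in this revision can also be sketched with an argument array, so that the `Authorization` header is added only when a token is supplied and a failed authenticated request is not retried without it. This is a minimal sketch, not the script's actual code: `build_wget_args` and `WGET_ARGS` are hypothetical names, and the URL and token are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of hfd.sh): collect wget arguments in an array
# so the Authorization header is added only when a token is supplied.
build_wget_args() {
    local url="$1" token="${2:-}"
    WGET_ARGS=(-c)  # -c resumes a partially downloaded file
    if [[ -n "$token" ]]; then
        WGET_ARGS+=("--header=Authorization: Bearer $token")
    fi
    WGET_ARGS+=("$url")
}

# Placeholder URL and token; print the arguments instead of downloading.
build_wget_args "https://huggingface.co/org/model/resolve/main/file.bin" "hf_placeholder"
printf '%s\n' "${WGET_ARGS[@]}"
```

A real download would then run `wget "${WGET_ARGS[@]}"` and check its exit status once.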
  25. @padeoe padeoe revised this gist Sep 27, 2023. No changes.
  26. @padeoe padeoe revised this gist Sep 26, 2023. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion README_huggingface_model_downloader.md
    @@ -44,7 +44,7 @@ wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/onnx/decoder_m
    ...
    ```

    ### 3. Create an Alias for Convenience
    ### Create an Alias for Convenience
    For easier access, you can create an alias for the script:
    ```bash
    alias hfd="$PWD/hfd.sh"
  27. @padeoe padeoe revised this gist Sep 26, 2023. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion README_huggingface_model_downloader.md
    @@ -12,7 +12,7 @@ Bypass the common **network interruptions** faced with `hf_hub_download` and `gi
    - 📦 **Simple**: No extra dependencies and simple code.

    ## Usage
    First, Download [`hfd.sh`](#hfd-sh) from this repo.
    First, Download [`hfd.sh`](#file-hfd-sh) from this repo.
    ```
    $ ./hfd.sh -h
    Usage:
  28. @padeoe padeoe created this gist Sep 26, 2023.
    51 changes: 51 additions & 0 deletions README_huggingface_model_downloader.md
    @@ -0,0 +1,51 @@
    # 🤗Huggingface Model Downloader 🚀
    📦 Download large Huggingface models effortlessly with the power and simplicity of **`wget`**!

    Bypass the common **network interruptions** faced with `hf_hub_download` and `git clone` for **large** models. This simple script leverages wget for Git LFS files and git clone for others.



    ## Features
    - 🚀 **Resume from breakpoint**: Press Ctrl+C at any time, then re-run the command to resume the download.
    - 🌍 **Proxy Support**: Set up with `HTTPS_PROXY` environment variable.
    - 🚫 **File Exclusion**: Use `--exclude` to skip specific files, handy for repos with **duplicate model formats** (e.g., .bin and .safetensors).
    - 📦 **Simple**: No extra dependencies and simple code.

    ## Usage
    First, Download [`hfd.sh`](#hfd-sh) from this repo.
    ```
    $ ./hfd.sh -h
    Usage:
    hfd <model_id> [--exclude exclude_pattern]
    ```
    **Download a model:**
    ```
    ./hfd.sh bigscience/bloom-560m
    ```

    **Download a model and exclude certain files (e.g., .safetensors):**


    ```bash
    ./hfd.sh bigscience/bloom-560m --exclude safetensors
    ```

    **Output:**
    During the download, the file URLs will be displayed:

    ```console
    $ ./hfd.sh bigscience/bloom-560m --exclude safetensors
    ...
    Start Downloading lfs files, bash script:

    wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/flax_model.msgpack
    # wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/model.safetensors
    wget -c https://huggingface.co/bigscience/bloom-560m/resolve/main/onnx/decoder_model.onnx
    ...
    ```
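
In this revision the exclude pattern is treated as a literal substring of the file path, not as a glob, which is why `--exclude safetensors` comments out `model.safetensors` in the output above. A minimal sketch of that check follows; `is_excluded` is a hypothetical helper, written with `case` rather than the script's `[[ ... ]]` test so that it also runs under a POSIX shell.

```bash
# Hypothetical helper mirroring the script's test:
#   [[ -n "$EXCLUDE_PATTERN" && $file == *"$EXCLUDE_PATTERN"* ]]
# The quoted pattern is matched literally, as a substring of the file path.
is_excluded() {
    [ -n "$2" ] || return 1        # an empty pattern excludes nothing
    case "$1" in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# File names taken from the example output above.
for f in flax_model.msgpack model.safetensors onnx/decoder_model.onnx; do
    if is_excluded "$f" "safetensors"; then
        printf '# skipped: %s\n' "$f"
    else
        printf 'download: %s\n' "$f"
    fi
done
```

Because the match is literal, a pattern such as `*.bin` would look for the characters `*.bin` in the path and match nothing in this sketch.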

    ### 3. Create an Alias for Convenience
    For easier access, you can create an alias for the script:
    ```bash
    alias hfd="$PWD/hfd.sh"
    ```
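
The alias above lasts only for the current shell session. One way to keep it, sketched under the assumption that your shell reads an rc file such as `~/.bashrc`, is to append the alias line once. `add_hfd_alias` is a hypothetical helper, and the demo writes to a temporary file rather than a real rc file.

```bash
# Hypothetical helper: append the alias line to an rc file only if it is missing.
add_hfd_alias() {
    script_path="$1"
    rc_file="$2"
    line="alias hfd=\"$script_path\""
    # -x matches the whole line, -F treats the line as a fixed string.
    grep -qxF -- "$line" "$rc_file" 2>/dev/null || printf '%s\n' "$line" >> "$rc_file"
}

# Demo against a temporary file; pass ~/.bashrc (an assumption) for real use.
demo_rc=$(mktemp)
add_hfd_alias "$PWD/hfd.sh" "$demo_rc"
add_hfd_alias "$PWD/hfd.sh" "$demo_rc"   # second call adds nothing
cat "$demo_rc"
rm -f "$demo_rc"
```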
    96 changes: 96 additions & 0 deletions hfd.sh
    @@ -0,0 +1,96 @@
    #!/bin/bash

    # Trap the INT signal to handle Ctrl+C
    trap ctrl_c INT

    # Function to handle Ctrl+C
    ctrl_c() {
    printf "\nDownload interrupted. If you re-run the command, you can resume the download from the breakpoint.\n"
    exit 1
    }

    # Display help information
    display_help() {
    printf "Usage:\n"
    printf " hfd <model_id> [--exclude exclude_pattern]\n\n"
    printf "Description:\n"
    printf " Downloads a model from Hugging Face using the provided model ID.\n\n"
    printf "Parameters:\n"
    printf " model_id The Hugging Face model ID in the format 'repo/model_name'.\n"
    printf " --exclude (Optional) Flag to specify a string pattern to exclude files from downloading.\n"
    printf " exclude_pattern The pattern to match against filenames for exclusion.\n\n"
    printf "Example:\n"
    printf " hfd bigscience/bloom-560m --exclude safetensors\n"
    exit 1
    }

    MODEL_ID=$1
    EXCLUDE_PATTERN=''

    # Parse arguments for --exclude option
    shift # Move to the next argument
    while [[ $# -gt 0 ]]; do
    key="$1"
    case $key in
    --exclude)
    EXCLUDE_PATTERN="$2"
    shift # past argument
    shift # past value
    ;;
    *)
    # unknown option
    shift # past argument
    ;;
    esac
    done

    # Check if no model_id is provided or -h/--help is provided
    if [[ -z "$MODEL_ID" ]] || [[ "$MODEL_ID" == "-h" ]] || [[ "$MODEL_ID" == "--help" ]]; then
    display_help
    fi

    MODEL_DIR=$(echo "$MODEL_ID" | awk -F'/' '{print $2}')

    # Check if the model directory exists and contains a .git directory
    if [ -d "$MODEL_DIR" ] && [ -d "$MODEL_DIR/.git" ]; then
    printf "%s exists, Skip Clone.\n" "$MODEL_DIR"
    cd "$MODEL_DIR" || exit 1
    if GIT_LFS_SKIP_SMUDGE=1 git pull; then
    printf "Git pull successful.\n"
    else
    printf "Git pull failed.\n"
    exit 1
    fi
    else
    printf "Start clone without lfs.\n"
    if GIT_LFS_SKIP_SMUDGE=1 git clone "https://huggingface.co/$MODEL_ID"; then
    cd "$MODEL_DIR"
    else
    printf "Git clone failed.\n"
    exit 1
    fi
    fi

    printf "\nStart Downloading lfs files, bash script:\n"
    files=$(git lfs ls-files | awk '{print $3}')
    declare -a urls
    for file in $files; do
    url="https://huggingface.co/$MODEL_ID/resolve/main/$file"
    if [[ -n "$EXCLUDE_PATTERN" && $file == *"$EXCLUDE_PATTERN"* ]]; then
    printf "# wget -c %s\n" "$url" # pass the URL as an argument, not as the format string
    continue
    fi
    printf "wget -c %s\n" "$url"
    urls+=("$url")
    done

    for url in "${urls[@]}"; do # iterate over every URL; plain $urls is only the first element
    if wget -c "$url"; then
    printf "Downloaded %s successfully.\n" "$url"
    else
    printf "Failed to download %s.\n" "$url"
    exit 1
    fi
    done

    printf "Download completed successfully.\n"
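
When a bash array such as `urls` is looped over, `$urls` expands to the first element only, whereas `"${urls[@]}"` expands to every element as a separate word. A minimal sketch of the difference, using placeholder URLs and a hypothetical `count_words` helper:

```bash
#!/usr/bin/env bash
# Placeholder URLs for illustration only.
declare -a urls
urls+=("https://example.com/a.bin")
urls+=("https://example.com/b.bin")

# Hypothetical helper: print how many arguments it received.
count_words() { printf '%s\n' "$#"; }

first_only=$(count_words $urls)          # $urls is shorthand for ${urls[0]}
all_items=$(count_words "${urls[@]}")    # one word per array element

printf 'unquoted $urls: %s item(s)\n' "$first_only"
printf '"${urls[@]}": %s item(s)\n' "$all_items"
```

Quoting `"$url"` inside the loop body likewise keeps a URL with special characters as a single argument to `wget`.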