Skip to content

Instantly share code, notes, and snippets.

@pressdarling
Forked from electblake/GitHubFileDownloader.sh
Last active January 31, 2025 07:42
Show Gist options
  • Save pressdarling/ef691eb09202df0b739adec688a72d9a to your computer and use it in GitHub Desktop.
Save pressdarling/ef691eb09202df0b739adec688a72d9a to your computer and use it in GitHub Desktop.

Revisions

  1. pressdarling revised this gist Jan 31, 2025. 1 changed file with 157 additions and 44 deletions.
    201 changes: 157 additions & 44 deletions GitHubFileDownloader.sh
    Original file line number Diff line number Diff line change
    @@ -1,13 +1,13 @@
    #!/bin/bash

    # Script: GitHubFileDownloader.sh
    # Author: electblake <https://github.com/electblake>
    # Version: 1.0
    # Description: This script converts a GitHub repository file URL to its GitHub API URL for file contents, checks if the user is logged into GitHub CLI, and downloads the file.
    # Author: electblake <https://github.com/electblake>, pressdarling <https://github.com/pressdarling>
    # Version: 1.1
    # Description: This script converts one or more GitHub repository file URLs to its GitHub API URL for file contents, checks if the user is logged into GitHub CLI, and downloads the file.
    # Source: https://gist.github.com/electblake/7ef3a63e20b3c8db67d9d66f7021d727
    # Credits:
    # - Inspired by answers on: https://stackoverflow.com/questions/9159894/download-specific-files-from-github-in-command-line-not-clone-the-entire-repo
    # - Used "Bash Script" GPT by Widenex for script creation assistance.
    # - v1 used "Bash Script" GPT by Widenex for script creation assistance.
    #
    # MIT License
    # Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted.
    @@ -16,38 +16,98 @@
    #
    # Requires: jq, curl, and GitHub CLI (gh) installed and configured.
    #
    # **Installation:**
    # 1. Download this file using the old fashioned way: Click the download icon on github.com
    # 2. Make script executable: chmod +x ./GitHubFileDownloader.sh
    # 3. Place this script in a directory included in your $PATH, e.g., /usr/local/bin (macOS El Capitan or newer) or $HOME/bin.
    # ```bash
    # # assuming macOS, if you're using Linux then it's probably an `apt` job
    # brew install jq curl gh
    # ```
    #
    # *Bonus Alternative* step is to rename to command `gh-dl` and place in $PATH like:
    # **Installation:**
    # 1. Download this file however you like (Click the download icon on github.com, or https://gist.github.com/ef691eb09202df0b739adec688a72d9a.git...)
    # 2. Make script executable: `chmod +x ./GitHubFileDownloader.sh`
    # 3. Move or link it into your path, bonus points if you rename to `gh-dl` for easy typing, e.g.:
    #
    # ```bash
    # mv ./GithubFileDownloader.sh $HOME/bin/gh-dl
    #
    # mv -i $(PWD)/GithubFileDownloader.sh /usr/local/bin/gh-dl
    #
    # # or for macOS users:
    # mv ./GithubFileDownloader.sh /usr/local/bin/gh-dl
    # ln -s $(PWD)/GithubFileDownloader.sh /usr/local/bin/gh-dl
    # ````
    #
    # **Example Usage**
    # - You must be logged into the GitHub CLI. If not, the script will initiate the login flow.
    #
    # ```console
    #
    # $ GithubFileDownloader.sh https://github.com/github/docs/blob/main/README.md
    #
    # GithubFileDownloader.sh https://github.com/github/docs/blob/main/README.md
    # File downloaded successfully: README.md
    #
    # **NEW: Process Multiple Files**
    # - If you can get a whole bunch of `https://github.com/[user]/[repo]/blob/[branch]/[path/to/file]` URLs in, say, a text file:
    #
    # ```bash
    # gh-dl $(</path/to/file/with/URLs.txt)
    # Processing: https://github.com/[user]/[repo]/blob/[branch]/[path/to/file-1.txt]
    # File downloaded successfully: [path/to/file-1.txt]
    # Processing: https://github.com/[user]/[repo]/blob/[branch]/[path/to/file-2.txt]
    # File downloaded successfully: [path/to/file-2.txt]
    # ```
    #
    # You can also specify an output directory and the number of parallel downloads:
    #
    # ```bash
    # gh-dl --output /path/to/output/dir --parallel 4 $(</path/to/file/with/URLs.txt)
    # ```
    #
    # Currently the script is set to download 4 files in parallel. You can adjust this number by changing the `parallel_jobs` variable in the script..
    # The script will create the output directory if it doesn't exist.
    #
    # There's currently no error handling for invalid URLs, so if you have a typo in your URL list, the script will just skip it.
    # The debug log will write an awful log to stderr. You can remove the `>&2` from the `printf` statements to disable this.'
    #
    # TODO: --quiet flag to suppress output, --verbose flag to toggle the >&2 debug log.
    #
    # Additional Behaviors:
    #
    # 1. Exit Conditions:
    # - The script will exit if jq or curl are not installed
    # - The script will exit if GitHub CLI authentication fails
    # - The script will exit if no URLs are provided
    #
    # 2. Authentication:
    # - Uses web-based GitHub authentication flow if not already authenticated
    # - Requires GitHub CLI (gh) to be configured before use
    #
    # 3. URL Processing:
    # - Only accepts GitHub URLs in the format: https://github.com/[user]/[repo]/blob/[branch]/[path/to/file]
    # - Invalid URL formats will be skipped with an error message
    # - URLs are processed in parallel (default 4 concurrent downloads)
    #
    # 4. Output Handling:
    # - Debug messages are sent to stderr (>&2)
    # - Download progress is shown for each file
    # - Creates output directory if it doesn't exist
    # - Preserves original filename structure in output directory
    #
    # 5. Command Line Options:
    # --output <dir> : Specify output directory (default: current directory)
    # --parallel <num> : Specify number of parallel downloads (default: 4)
    #
    # Example with all options:
    # ./GitHubFileDownloader.sh --output ./downloads --parallel 8 https://github.com/user/repo/blob/main/file1.txt https://github.com/user/repo/blob/main/file2.txt
    #
    # Note: The script uses xargs for parallel processing, which may behave differently
    # on different Unix-like systems (BSD vs GNU)
    #
    # $ cat README.md | head -2
    # # GitHub Docs <!-- omit in toc -->
    # [![Build GitHub Docs On Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new/?repo=github)
    #
    #
    # ```
    #


    # Function to check if the user is logged into the GitHub CLI
    check_gh_cli_login() {
    printf "Checking GitHub CLI login status...\n"
    if ! gh auth status &> /dev/null; then
    printf "You are not logged into the GitHub CLI. Starting login flow...\n"
    if ! gh auth login --web; then
    @@ -57,70 +117,123 @@ check_gh_cli_login() {
    fi
    }

    # Function to convert a GitHub file URL to its corresponding GitHub API URL
    convert_url_to_api() {
    local input_url="$1"
    printf "Converting URL to API format: %s\n" "$input_url" >&2 # Debug to stderr
    local regex='https://github\.com/([^/]+)/([^/]+)/blob/([^/]+)/(.+)'

    if [[ $input_url =~ $regex ]]; then
    local user="${BASH_REMATCH[1]}"
    local repo="${BASH_REMATCH[2]}"
    local branch="${BASH_REMATCH[3]}"
    local path="${BASH_REMATCH[4]}"
    printf "https://api.github.com/repos/%s/%s/contents/%s?ref=%s\n" "$user" "$repo" "$path" "$branch"
    local generated_api_url="https://api.github.com/repos/$user/$repo/contents/$path?ref=$branch"
    printf "Generated API URL: %s\n" "$generated_api_url" >&2 # Debug to stderr
    printf "%s" "$generated_api_url" # Output clean API URL
    else
    printf "Invalid URL format.\n" >&2
    printf "Invalid URL format: %s\n" "$input_url" >&2
    return 1
    fi
    }

    # Function to download the file using GitHub API URL
    # download_file_using_api() {
    # local api_url=$(convert_url_to_api "$url")
    # ...
    download_file_using_api() {
    printf "Downloading file using API URL: %s\n" "$1"
    local api_url="$1"
    local original_file_name=$(basename "$2")

    # Ensure jq and curl are installed
    if ! command -v jq &> /dev/null || ! command -v curl &> /dev/null; then
    printf "Error: jq and curl are required.\n" >&2
    local output_dir="$3"
    local output_path="${output_dir}/${original_file_name}"

    printf "Ensuring dependencies are installed...\n"
    if ! command -v jq &> /dev/null; then
    printf "Error: 'jq' is required but not installed.\n" >&2
    return 1
    fi

    # Fetch the download URL using GitHub CLI and jq
    local download_url
    if ! download_url=$(gh api "$api_url" --jq .download_url); then
    printf "Failed to fetch download URL.\n" >&2

    if ! command -v curl &> /dev/null; then
    printf "Error: 'curl' is required but not installed.\n" >&2
    return 1
    fi

    # Download the file
    if ! curl -sL "$download_url" -o "$original_file_name"; then
    printf "Failed to download the file.\n" >&2

    printf "Fetching download URL from GitHub API...\n"
    local download_url
    printf "Fetching from GitHub API: %s\n" "$api_url" >&2 # Debugging
    if ! download_url=$(gh api "$api_url" --jq .download_url 2>&1); then
    printf "GitHub API request failed: %s\n" "$download_url" >&2
    return 1
    fi

    printf "Downloading file: %s\n" "$output_path"
    if ! curl -sL "$download_url" -o "$output_path"; then
    printf "Failed to download the file: %s\n" "$output_path" >&2
    return 1
    fi
    printf "File downloaded successfully: %s\n" "$original_file_name"

    printf "File downloaded successfully: %s\n" "$output_path"
    }



    main() {
    local url="$1"
    printf "Starting script with arguments: %s\n" "$*"
    local output_dir="."
    local urls=()
    local -i parallel_jobs=4 # Number of parallel downloads (adjust as needed)

    while [[ $# -gt 0 ]]; do
    case "$1" in
    --output)
    output_dir="$2"
    shift 2
    ;;
    --parallel)
    parallel_jobs="$2"
    shift 2
    ;;
    *)
    urls+=("$1")
    shift
    ;;
    esac
    done

    if [[ -z $url ]]; then
    printf "Usage: $0 <github file URL>\nExample: $0 https://github.com/github/docs/blob/main/README.md\n" >&2
    if [[ ${#urls[@]} -eq 0 ]]; then
    printf "Usage: $0 [--output dir] [--parallel N] <github file URLs...>\n" >&2
    return 1
    fi

    # Check if the user is logged into GitHub CLI

    printf "Ensuring output directory exists: %s\n" "$output_dir"
    mkdir -p "$output_dir"

    printf "Checking GitHub authentication...\n"
    if ! check_gh_cli_login; then
    return 1
    fi

    local api_url
    if ! api_url=$(convert_url_to_api "$url"); then
    printf "Failed to convert URL to API URL.\n" >&2
    printf "Checking dependencies...\n"
    if ! command -v jq &> /dev/null || ! command -v curl &> /dev/null; then
    printf "Error: 'jq' and 'curl' are required but not installed.\n" >&2
    return 1
    fi

    download_file_using_api "$api_url" "$url"
    export -f convert_url_to_api
    export -f download_file_using_api
    export output_dir

    printf "Processing %d files with %d parallel jobs...\n" "${#urls[@]}" "$parallel_jobs"

    # Use `xargs -n 1` to ensure only one URL is passed per execution
    printf "%s\n" "${urls[@]}" | xargs -n 1 -P "$parallel_jobs" bash -c '
    url="$0"
    printf "Processing URL: %s\n" "$url"
    api_url=$(convert_url_to_api "$url") || { printf "Failed to convert URL: %s\n" "$url" >&2; exit 1; }
    download_file_using_api "$api_url" "$url" "'"$output_dir"'"
    '

    printf "All downloads complete.\n"
    }

    main "$@"
  2. @electblake electblake revised this gist Mar 7, 2024. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion GitHubFileDownloader.sh
    Original file line number Diff line number Diff line change
    @@ -5,7 +5,9 @@
    # Version: 1.0
    # Description: This script converts a GitHub repository file URL to its GitHub API URL for file contents, checks if the user is logged into GitHub CLI, and downloads the file.
    # Source: https://gist.github.com/electblake/7ef3a63e20b3c8db67d9d66f7021d727
    # Credits: Utilized OpenAI "Bash Script" GPT by Widenex for script creation assistance.
    # Credits:
    # - Inspired by answers on: https://stackoverflow.com/questions/9159894/download-specific-files-from-github-in-command-line-not-clone-the-entire-repo
    # - Used "Bash Script" GPT by Widenex for script creation assistance.
    #
    # MIT License
    # Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted.
  3. @electblake electblake revised this gist Mar 7, 2024. 1 changed file with 3 additions and 0 deletions.
    3 changes: 3 additions & 0 deletions GitHubFileDownloader.sh
    Original file line number Diff line number Diff line change
    @@ -30,13 +30,16 @@
    #
    # **Example Usage**
    # - You must be logged into the GitHub CLI. If not, the script will initiate the login flow.
    #
    # ```console
    #
    # $ GithubFileDownloader.sh https://github.com/github/docs/blob/main/README.md
    # File downloaded successfully: README.md
    #
    # $ cat README.md | head -2
    # # GitHub Docs <!-- omit in toc -->
    # [![Build GitHub Docs On Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new/?repo=github)
    #
    # ```
    #

  4. @electblake electblake revised this gist Mar 7, 2024. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion GitHubFileDownloader.sh
    Original file line number Diff line number Diff line change
    @@ -38,6 +38,8 @@
    # # GitHub Docs <!-- omit in toc -->
    # [![Build GitHub Docs On Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new/?repo=github)
    # ```
    #


    # Function to check if the user is logged into the GitHub CLI
    check_gh_cli_login() {
    @@ -98,7 +100,7 @@ main() {
    local url="$1"

    if [[ -z $url ]]; then
    printf "Usage: $0 <github file URL>\nExample: $0 https://github.com/kepano/obsidian-minimal/blob/master/src/scss/features/checklist-icons.scss\n" >&2
    printf "Usage: $0 <github file URL>\nExample: $0 https://github.com/github/docs/blob/main/README.md\n" >&2
    return 1
    fi

  5. @electblake electblake revised this gist Mar 7, 2024. 1 changed file with 26 additions and 11 deletions.
    37 changes: 26 additions & 11 deletions GitHubFileDownloader.sh
    Original file line number Diff line number Diff line change
    @@ -11,18 +11,33 @@
    # Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted.
    # THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
    # IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

    #
    # Requires: jq, curl, and GitHub CLI (gh) installed and configured.

    # Installation:
    # 1. Place this script in a directory included in your $PATH, e.g., /usr/local/bin or ~/bin.
    # 2. Ensure it is executable: chmod +x GitHubFileDownloader.sh
    # 3. You must be logged into the GitHub CLI. If not, the script will initiate the login flow.

    # Usage:
    # ./GitHubFileDownloader.sh <github file URL>
    # Example:
    # ./GitHubFileDownloader.sh https://github.com/kepano/obsidian-minimal/blob/master/src/scss/features/checklist-icons.scss
    #
    # **Installation:**
    # 1. Download this file using the old fashioned way: Click the download icon on github.com
    # 2. Make script executable: chmod +x ./GitHubFileDownloader.sh
    # 3. Place this script in a directory included in your $PATH, e.g., /usr/local/bin (macOS El Capitan or newer) or $HOME/bin.
    #
    # *Bonus Alternative* step is to rename to command `gh-dl` and place in $PATH like:
    #
    # ```bash
    # mv ./GithubFileDownloader.sh $HOME/bin/gh-dl
    #
    # # or for macOS users:
    # mv ./GithubFileDownloader.sh /usr/local/bin/gh-dl
    # ````
    #
    # **Example Usage**
    # - You must be logged into the GitHub CLI. If not, the script will initiate the login flow.
    # ```console
    # $ GithubFileDownloader.sh https://github.com/github/docs/blob/main/README.md
    # File downloaded successfully: README.md
    #
    # $ cat README.md | head -2
    # # GitHub Docs <!-- omit in toc -->
    # [![Build GitHub Docs On Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new/?repo=github)
    # ```

    # Function to check if the user is logged into the GitHub CLI
    check_gh_cli_login() {
  6. @electblake electblake revised this gist Mar 7, 2024. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion GitHubFileDownloader.sh
    Original file line number Diff line number Diff line change
    @@ -4,7 +4,7 @@
    # Author: electblake <https://github.com/electblake>
    # Version: 1.0
    # Description: This script converts a GitHub repository file URL to its GitHub API URL for file contents, checks if the user is logged into GitHub CLI, and downloads the file.
    # Source: <URL to your gist on GitHub>
    # Source: https://gist.github.com/electblake/7ef3a63e20b3c8db67d9d66f7021d727
    # Credits: Utilized OpenAI "Bash Script" GPT by Widenex for script creation assistance.
    #
    # MIT License
  7. @electblake electblake created this gist Mar 7, 2024.
    104 changes: 104 additions & 0 deletions GitHubFileDownloader.sh
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,104 @@
    #!/bin/bash

    # Script: GitHubFileDownloader.sh
    # Author: electblake <https://github.com/electblake>
    # Version: 1.0
    # Description: This script converts a GitHub repository file URL to its GitHub API URL for file contents, checks if the user is logged into GitHub CLI, and downloads the file.
    # Source: <URL to your gist on GitHub>
    # Credits: Utilized OpenAI "Bash Script" GPT by Widenex for script creation assistance.
    #
    # MIT License
    # Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted.
    # THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
    # IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

    # Requires: jq, curl, and GitHub CLI (gh) installed and configured.

    # Installation:
    # 1. Place this script in a directory included in your $PATH, e.g., /usr/local/bin or ~/bin.
    # 2. Ensure it is executable: chmod +x GitHubFileDownloader.sh
    # 3. You must be logged into the GitHub CLI. If not, the script will initiate the login flow.

    # Usage:
    # ./GitHubFileDownloader.sh <github file URL>
    # Example:
    # ./GitHubFileDownloader.sh https://github.com/kepano/obsidian-minimal/blob/master/src/scss/features/checklist-icons.scss

    # Function to check if the user is logged into the GitHub CLI
    check_gh_cli_login() {
    if ! gh auth status &> /dev/null; then
    printf "You are not logged into the GitHub CLI. Starting login flow...\n"
    if ! gh auth login --web; then
    printf "GitHub CLI login failed.\n" >&2
    return 1
    fi
    fi
    }

    # Function to convert a GitHub file URL to its corresponding GitHub API URL
    convert_url_to_api() {
    local input_url="$1"
    local regex='https://github\.com/([^/]+)/([^/]+)/blob/([^/]+)/(.+)'

    if [[ $input_url =~ $regex ]]; then
    local user="${BASH_REMATCH[1]}"
    local repo="${BASH_REMATCH[2]}"
    local branch="${BASH_REMATCH[3]}"
    local path="${BASH_REMATCH[4]}"
    printf "https://api.github.com/repos/%s/%s/contents/%s?ref=%s\n" "$user" "$repo" "$path" "$branch"
    else
    printf "Invalid URL format.\n" >&2
    return 1
    fi
    }

    # Function to download the file using GitHub API URL
    download_file_using_api() {
    local api_url="$1"
    local original_file_name=$(basename "$2")

    # Ensure jq and curl are installed
    if ! command -v jq &> /dev/null || ! command -v curl &> /dev/null; then
    printf "Error: jq and curl are required.\n" >&2
    return 1
    fi

    # Fetch the download URL using GitHub CLI and jq
    local download_url
    if ! download_url=$(gh api "$api_url" --jq .download_url); then
    printf "Failed to fetch download URL.\n" >&2
    return 1
    fi

    # Download the file
    if ! curl -sL "$download_url" -o "$original_file_name"; then
    printf "Failed to download the file.\n" >&2
    return 1
    fi

    printf "File downloaded successfully: %s\n" "$original_file_name"
    }

    main() {
    local url="$1"

    if [[ -z $url ]]; then
    printf "Usage: $0 <github file URL>\nExample: $0 https://github.com/kepano/obsidian-minimal/blob/master/src/scss/features/checklist-icons.scss\n" >&2
    return 1
    fi

    # Check if the user is logged into GitHub CLI
    if ! check_gh_cli_login; then
    return 1
    fi

    local api_url
    if ! api_url=$(convert_url_to_api "$url"); then
    printf "Failed to convert URL to API URL.\n" >&2
    return 1
    fi

    download_file_using_api "$api_url" "$url"
    }

    main "$@"