@shedd
Created February 13, 2025 18:34
Revisions

  1. shedd created this gist Feb 13, 2025.
    3 changes: 3 additions & 0 deletions Gemfile
    @@ -0,0 +1,3 @@
    source "https://rubygems.org"

    gem 'dropbox-sign'
    98 changes: 98 additions & 0 deletions README.md
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,98 @@
    # Dropbox Sign PDF Download Script

    Dropbox Sign (formerly HelloSign) requires a premium plan to export all of your completed documents to a cloud storage account.

    This Ruby script uses the [Dropbox Sign API](https://developers.hellosign.com/docs/) to:

    1. Fetch all completed Signature Requests for an account in a paginated manner (up to the max of 100 per page).
    2. Cache the list of Signature Requests locally.
    3. Download each Signature Request's PDF file in batches of 25.
    4. Cache results of successfully downloaded and failed downloads to make it easy to resume a long-running process.
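
    The pagination in step 1 can be sketched without touching the real API. The `FakeListApi` class and the `fetch_all` helper below are hypothetical stand-ins, not part of the SDK or the script; they only mimic the two things the script relies on, a page of results and a total page count.

    ```ruby
    # Hypothetical stand-in for the SDK client: serves `items` in pages.
    class FakeListApi
      Page = Struct.new(:items, :num_pages)

      def initialize(items)
        @items = items
      end

      # Returns the requested 1-based page plus the total page count.
      def list(page:, page_size:)
        num_pages = [(@items.size / page_size.to_f).ceil, 1].max
        slice = @items.each_slice(page_size).to_a[page - 1] || []
        Page.new(slice, num_pages)
      end
    end

    # Walks every page, reusing the first response to learn the page count.
    def fetch_all(api, page_size: 100)
      first = api.list(page: 1, page_size: page_size)
      results = first.items.dup
      (2..first.num_pages).each do |page|
        results.concat(api.list(page: page, page_size: page_size).items)
      end
      results
    end

    api = FakeListApi.new((1..250).map { |i| "request-#{i}" })
    all_requests = fetch_all(api)
    puts all_requests.size # prints 250 (pages of 100, 100, and 50)
    ```

    Reusing the first response avoids spending an extra rate-limited call just to learn how many pages exist.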

    ---

    ## Requirements

    - Ruby 2.6+ (tested on Ruby 3.x, but any reasonably recent Ruby version should work).
    - The `dropbox-sign` gem (the official Dropbox Sign Ruby SDK).
    - The `fileutils` and `json` standard libraries (already part of the default Ruby installation).

    ---

    ## Getting Started

    1. **Clone or download this repository** to your local machine.
    2. **Install Dependencies**:
       ```bash
       bundle install
       ```
    3. **Set Your Dropbox Sign API Key**:
       - In the script, locate the following block and replace `"API KEY"` with your actual Dropbox Sign API Key:
         ```ruby
         Dropbox::Sign.configure do |config|
           config.username = "API KEY" # <--- Replace this
         end
         ```
       - Optionally, you can manage this via environment variables (if you prefer not to hardcode the key) by modifying the code:
         ```ruby
         config.username = ENV['DROPBOX_SIGN_API_KEY']
         ```
    4. **Run the Script**:
       ```bash
       bundle exec ruby download_dropbox_sign_pdfs.rb
       ```
       - By default, it will fetch and store up to 100 requests per page, iterating through all pages.
       - It downloads PDFs into the `./files` directory.

    ---

    ## What the Script Does

    1. **Retrieve Signature Requests**
       - Calls the API via `SignatureRequestApi#signature_request_list` using pagination.
       - Fetches only completed Signature Requests (those with at least one signed document), because the script passes `complete: true`.
       - If a local cache (`signature_requests_cache.json`) exists, it is used instead of refetching everything from the API.

    2. **Download Each PDF**
       - Processes downloads in batches of 25 requests to respect rate limits.
       - Saves each PDF in the `./files` directory with the filename pattern `<requester_email>_<recipient_email>_<title>_<signature_request_id>.pdf`. Characters other than letters, digits, dots, hyphens, and underscores are replaced with underscores.
       - If a PDF fails to download (e.g., rate limit errors after multiple retries), it is recorded in `failed_downloads.json`.

    3. **Progress & Resuming**
       - The script logs every successful download in `completed_downloads.json`.
       - If you stop or restart the script, it will skip already completed downloads and only attempt the remaining ones.
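
    As an illustration of the filename pattern, here is a minimal sketch. `sanitize_filename` mirrors the helper in the script; `pdf_path_for` and the sample record are made up for this example.

    ```ruby
    # Mirrors the script's helper: anything outside [0-9A-Za-z.-_] becomes "_".
    def sanitize_filename(str)
      str.gsub(/[^0-9A-Za-z.\-_]/, '_')
    end

    # Hypothetical helper: builds the output path from one cached request
    # record (a Hash with string keys, as stored in the JSON cache).
    def pdf_path_for(req, dir: "./files")
      parts = [
        req["requester_email_address"] || "unknown",
        req["recipient_email_address"] || "unknown",
        req["title"] || "untitled"
      ].map { |part| sanitize_filename(part) }
      File.join(dir, "#{parts.join('_')}_#{req['signature_request_id']}.pdf")
    end

    sample = {
      "signature_request_id" => "abc123",
      "title" => "NDA (2024)",
      "requester_email_address" => "me@example.com",
      "recipient_email_address" => "you@example.com"
    }
    puts pdf_path_for(sample)
    # prints ./files/me_example.com_you_example.com_NDA__2024__abc123.pdf
    ```

    Appending the `signature_request_id` keeps filenames unique even when two requests share the same title and participants.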

    ---

    ## Configuration Files

    - **`signature_requests_cache.json`**
      Caches the complete list of Signature Requests.
      - If present, the script won’t call the API again for signature requests (unless you delete or rename this file).
      - This allows you to resume a failed execution.

    - **`completed_downloads.json`**
      Keeps track of Signature Request IDs that have been successfully downloaded.
      - The script won’t redownload existing entries in this file.
      - This allows you to resume a failed execution.

    - **`failed_downloads.json`**
      Keeps track of any Signature Requests that couldn’t be downloaded (after all retries).
      - This lets you investigate or retry them later.

    You can safely delete or rename these files if you want to force a full re-fetch or re-download process.
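
    A minimal sketch of how the resume behavior falls out of these files, using a temporary directory instead of the real paths. The helper names `load_json_list` and `pending_requests` are illustrative and do not appear in the script.

    ```ruby
    require 'json'
    require 'tmpdir'

    # Loads a JSON array from `path`, or returns [] when the file does not exist yet.
    def load_json_list(path)
      File.exist?(path) ? JSON.parse(File.read(path)) : []
    end

    # Returns the requests whose IDs are present and not yet recorded as completed.
    def pending_requests(requests, completed_ids)
      requests.reject do |req|
        id = req["signature_request_id"].to_s.strip
        id.empty? || completed_ids.include?(id)
      end
    end

    Dir.mktmpdir do |dir|
      completed_file = File.join(dir, "completed_downloads.json")
      requests = %w[a b c].map { |id| { "signature_request_id" => id } }

      # First run: nothing recorded yet, so all three requests are pending.
      first_run = pending_requests(requests, load_json_list(completed_file))

      # Simulate an interruption after "a" and "b" were saved.
      File.write(completed_file, JSON.pretty_generate(%w[a b]))

      # Second run: only "c" is left.
      second_run = pending_requests(requests, load_json_list(completed_file))
      puts "first run: #{first_run.size}, second run: #{second_run.size}"
    end
    ```

    Deleting the completed-downloads file in this sketch would make every request pending again, which matches the "force a full re-download" behavior described above.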

    ---

    ## Rate Limiting & Retries

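    - The Dropbox Sign API enforces rate limits; the script assumes a limit of 25 requests per minute for the file download endpoint.
    - The script handles:
      - **429 (Too Many Requests)** responses by sleeping for 20 seconds and retrying (up to 3 retries per list page, and up to 3 attempts in total per download).
      - A forced sleep at the end of each 25-request batch (except the last) that pads the batch out to 60 seconds, so that no more than 25 downloads are started per minute.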
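
    The two mechanisms can be sketched in isolation. This is an illustration under assumptions, not the script's code: `RateLimited` is a hypothetical error class standing in for a 429 response, and the `sleeper` parameter exists only so the example runs without real delays.

    ```ruby
    # Hypothetical error standing in for an HTTP 429 response.
    class RateLimited < StandardError; end

    # Calls the block, retrying on RateLimited after a fixed wait.
    # Gives up (re-raises) once `max_attempts` attempts have been made.
    def with_retry(max_attempts: 3, wait: 20, sleeper: ->(s) { sleep(s) })
      attempt = 1
      begin
        yield attempt
      rescue RateLimited
        raise if attempt >= max_attempts
        sleeper.call(wait)
        attempt += 1
        retry
      end
    end

    # Seconds left in the current window after `elapsed` seconds of work.
    def pacing_delay(elapsed, window: 60)
      [window - elapsed, 0].max
    end

    waits = []
    result = with_retry(sleeper: ->(s) { waits << s }) do |attempt|
      raise RateLimited if attempt < 3
      "ok on attempt #{attempt}"
    end
    puts result              # prints ok on attempt 3
    puts waits.inspect       # prints [20, 20]
    puts pacing_delay(12.5)  # prints 47.5
    ```

    A fixed wait keeps the behavior predictable; retried calls still count against the limit, so a batch that hits several 429s may take longer than one minute.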

    ---

    ## License

    This script is provided as-is under the MIT License.
    254 changes: 254 additions & 0 deletions download_dropbox_sign_pdfs.rb
    @@ -0,0 +1,254 @@
    require 'dropbox-sign'
    require 'fileutils'
    require 'json'

    # File paths for caching and tracking progress.
    CACHE_FILE = "./signature_requests_cache.json"
    COMPLETED_FILE = "./completed_downloads.json"
    FAILED_FILE = "./failed_downloads.json"

    # Configure your Dropbox Sign API credentials.
    Dropbox::Sign.configure do |config|
      # Configure HTTP basic authorization: api_key
      config.username = "API KEY"
    end


    # Helper method to sanitize strings for file names.
    def sanitize_filename(str)
      # Replace any character that is not alphanumeric, dot, underscore, or hyphen with an underscore.
      str.gsub(/[^0-9A-Za-z.\-_]/, '_')
    end


    # Helper method to fetch a page with retries on errors (especially 429 errors).
    def fetch_signature_request_page(signature_request_api, page_size, page_number, retries = 3)
      result = signature_request_api.signature_request_list({
        account_id: 'all',
        page_size: page_size,
        page: page_number,
        complete: true
      })
      # Debug: print the page info from the API response if available.
      if result.list_info.respond_to?(:page)
        puts "DEBUG: Received page #{result.list_info.page} of #{result.list_info.num_pages}."
      else
        puts "DEBUG: Fetched page #{page_number} (API did not return explicit page info)."
      end
      result
    rescue StandardError => e
      # Retry only when the error message mentions a 429 and retries remain.
      if e.message.include?("429") && retries > 0
        wait_time = 20
        puts "Received 429 error on page #{page_number}. Retrying in #{wait_time} seconds..."
        sleep(wait_time)
        fetch_signature_request_page(signature_request_api, page_size, page_number, retries - 1)
      else
        puts "Error fetching page #{page_number}: #{e.message}"
        raise
      end
    end


    # Retrieves all complete signature requests using pagination.
    # By including complete: true in the query params, only requests with a signed document are returned.
    def get_all_signature_requests
      if File.exist?(CACHE_FILE)
        puts "Loading cached signature requests from #{CACHE_FILE}..."
        data = JSON.parse(File.read(CACHE_FILE))
        puts "Loaded #{data.size} signature request(s) from cache."
        return data
      end

      signature_request_api = Dropbox::Sign::SignatureRequestApi.new
      page_size = 100
      signature_requests = []

      # Initial call to get pagination info; its result is reused as page 1 below.
      initial_result = fetch_signature_request_page(signature_request_api, page_size, 1)
      total_pages = initial_result.list_info.num_pages
      puts "Total pages to process: #{total_pages}"

      (1..total_pages).each do |page_number|
        puts "\nProcessing page #{page_number} of #{total_pages}..."
        # Avoid spending a second rate-limited call on page 1.
        result = if page_number == 1
          initial_result
        else
          fetch_signature_request_page(signature_request_api, page_size, page_number)
        end
        current_page_count = result.signature_requests.size
        puts "DEBUG: Processing page #{page_number}, received #{current_page_count} requests."

        result.signature_requests.each do |req|
          has_signatures = req.respond_to?(:signatures) && req.signatures && !req.signatures.empty?
          current_req = {
            "signature_request_id" => req.signature_request_id,
            "title" => req.title,
            "requester_email_address" => req.requester_email_address,
            "recipient_email_address" => has_signatures ? req.signatures.first.signer_email_address : "unknown"
          }

          # Deduplicate by signature_request_id (one file per unique signature request).
          # Remove this check if you want to process every record, even duplicates.
          if signature_requests.any? { |r| r["signature_request_id"] == current_req["signature_request_id"] }
            puts "DEBUG: Duplicate encountered on page #{page_number}: #{current_req["signature_request_id"]} (title: #{current_req["title"]}, requester: #{current_req["requester_email_address"]}, recipient: #{current_req["recipient_email_address"]})"
          else
            signature_requests << current_req
          end
        end

        puts " Retrieved #{current_page_count} signature request(s) from page #{page_number}."
        puts " Running total (unique complete requests so far): #{signature_requests.size}"
      end

      puts "\nFinished retrieving signature requests."
      puts "Total unique complete signature requests retrieved: #{signature_requests.size}"
      File.write(CACHE_FILE, JSON.pretty_generate(signature_requests))
      puts "Cached signature requests to #{CACHE_FILE}."
      signature_requests
    end


    # Helper function to download a file with retry logic.
    #
    # This method attempts to download the PDF for a signature request.
    # If a 429 error (Too Many Requests) is encountered, it will wait (20 seconds)
    # and retry up to max_attempts.
    #
    # After copying the file from the temporary location to the destination,
    # it explicitly closes and removes the temporary file.
    def download_with_retry(signature_request_api, req, file_path, attempt = 1, max_attempts = 3)
      file_bin = signature_request_api.signature_request_files(req["signature_request_id"], { file_type: "pdf" })

      # Copy the file from the temporary location to the destination.
      FileUtils.cp(file_bin.path, file_path)

      # Explicitly close and remove the temporary file.
      file_bin.close if file_bin.respond_to?(:close)
      if file_bin.respond_to?(:unlink)
        file_bin.unlink
      elsif file_bin.respond_to?(:delete)
        file_bin.delete
      end

      true
    rescue StandardError => e
      if e.message.include?("HTTP status code: 429")
        if attempt < max_attempts
          wait_time = 20 # Fixed wait (in seconds) before retrying after a 429.
          puts "Received 429 for request #{req["signature_request_id"]} (attempt #{attempt} of #{max_attempts}). Retrying in #{wait_time} seconds..."
          sleep(wait_time)
          download_with_retry(signature_request_api, req, file_path, attempt + 1, max_attempts)
        else
          puts "Exceeded maximum retry attempts for request #{req["signature_request_id"]}."
          false
        end
      else
        # Any other error is not retried; the caller records it as a failure.
        puts "Error downloading request #{req["signature_request_id"]}: #{e.message}"
        false
      end
    end


    # Downloads the PDF files for each signature request in batches of 25.
    # Tracks and caches completed downloads so that the process can be resumed.
    def download_requests(requests)
      signature_request_api = Dropbox::Sign::SignatureRequestApi.new
      batch_size = 25

      # Ensure the output directory exists.
      FileUtils.mkdir_p('./files')

      # Load the list of completed downloads.
      completed = if File.exist?(COMPLETED_FILE)
        JSON.parse(File.read(COMPLETED_FILE))
      else
        []
      end

      # Load the list of previous failures (so we don't add duplicates).
      failures = if File.exist?(FAILED_FILE)
        JSON.parse(File.read(FAILED_FILE))
      else
        []
      end

      # Filter out requests that have already been processed OR are missing a valid signature_request_id.
      requests_to_download = requests.reject do |req|
        id = req["signature_request_id"].to_s.strip
        id.empty? || completed.include?(id)
      end

      total_requests = requests_to_download.size
      total_batches = (total_requests / batch_size.to_f).ceil
      processed_files_count = 0

      puts "\nStarting downloads..."
      puts "Total requests to download (skipping completed ones): #{total_requests}"

      requests_to_download.each_slice(batch_size).with_index do |batch, batch_index|
        current_batch_number = batch_index + 1
        start_time = Time.now

        puts "\nProcessing batch #{current_batch_number} of #{total_batches}..."

        batch.each do |req|
          # Build a descriptive file name using the sender's email, recipient's email, and request title.
          sender_email = req["requester_email_address"] || "unknown"
          recipient_email = req["recipient_email_address"] || "unknown"
          request_title = req["title"] || "untitled"
          sanitized_sender = sanitize_filename(sender_email)
          sanitized_recipient = sanitize_filename(recipient_email)
          sanitized_title = sanitize_filename(request_title)
          file_path = "./files/#{sanitized_sender}_#{sanitized_recipient}_#{sanitized_title}_#{req["signature_request_id"]}.pdf"

          if download_with_retry(signature_request_api, req, file_path)
            processed_files_count += 1
            puts " Successfully downloaded request #{req["signature_request_id"]} to:"
            puts " #{file_path} (Total downloaded in this run: #{processed_files_count})"
            # Persist progress after every file so an interrupted run can resume.
            completed << req["signature_request_id"]
            File.write(COMPLETED_FILE, JSON.pretty_generate(completed))
          else
            puts " Failed to download request #{req["signature_request_id"]} after retries."

            # Check if we already have this failure recorded to avoid duplicates.
            unless failures.any? { |f| f["signature_request_id"] == req["signature_request_id"] }
              failures << {
                "signature_request_id" => req["signature_request_id"],
                "title" => req["title"],
                "requester_email_address" => req["requester_email_address"],
                "recipient_email_address" => req["recipient_email_address"]
              }
              File.write(FAILED_FILE, JSON.pretty_generate(failures))
            end
          end
        end

        files_remaining = total_requests - processed_files_count
        batches_remaining = total_batches - current_batch_number
        puts "Completed batch #{current_batch_number}/#{total_batches}: #{batch.size} file(s) processed in this batch."
        puts "Total files downloaded so far: #{processed_files_count} of #{total_requests}."
        puts "Batches remaining: #{batches_remaining} | Files remaining: #{files_remaining}"

        # Enforce the rate limit for high-tier endpoints (25 requests per minute).
        elapsed = Time.now - start_time
        sleep_time = 60 - elapsed
        if sleep_time > 0 && current_batch_number < total_batches
          puts "Sleeping for #{sleep_time.round} seconds to respect rate limit..."
          sleep(sleep_time)
        end
      end

      puts "\nDownload process completed."
      puts "Failed requests are logged in '#{FAILED_FILE}' (#{failures.size} failures recorded)."
    end

    # Main execution flow.
    # Load (or retrieve) signature requests and then download the pending ones.
    requests = get_all_signature_requests
    download_requests(requests)