@jamiew
Created July 13, 2011 17:46

Revisions

  1. jamiew created this gist Jul 13, 2011.
    53 changes: 53 additions & 0 deletions tumblr-photo-ripper.rb
    @@ -0,0 +1,53 @@
    # Usage:
    #   [sudo] gem install mechanize
    #   ruby tumblr-photo-ripper.rb

    require 'rubygems'
    require 'fileutils' # FileUtils.mkdir_p is used below
    require 'mechanize'

    # Your Tumblr subdomain, e.g. "jamiew" for "jamiew.tumblr.com"
    site = "doctorwho"


    FileUtils.mkdir_p(site)

    concurrency = 8
    num = 50
    start = 0

    loop do
      puts "start=#{start}"

      # Fetch one page of photo posts from the blog's XML API
      url = "http://#{site}.tumblr.com/api/read?type=photo&num=#{num}&start=#{start}"
      page = Mechanize.new.get(url)
      doc = Nokogiri::XML.parse(page.body)

      # Keep only the 1280px-wide rendition of each photo
      images = (doc/'post photo-url').select { |x| x['max-width'].to_i == 1280 }
      image_urls = images.map { |x| x.content }

      # Download in batches of `concurrency` threads; wait for each batch to finish
      image_urls.each_slice(concurrency) do |group|
        threads = group.map do |image_url|
          # Pass the URL in as a thread argument so each thread gets its own copy
          Thread.new(image_url) do |photo_url|
            puts "Saving photo #{photo_url}"
            begin
              file = Mechanize.new.get(photo_url)
              # Strip any query string before taking the basename
              filename = File.basename(file.uri.to_s.split('?')[0])
              file.save_as("#{site}/#{filename}")
            rescue Mechanize::ResponseCodeError => e
              puts "Error getting file, #{e}"
            end
          end
        end
        threads.each(&:join)
      end

      puts "#{images.count} images found (num=#{num})"
      # Fewer images than requested means this was the last page
      if images.count < num
        puts "our work here is done"
        break
      else
        start += num
      end
    end
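
The two reusable ideas in the script, deriving a local filename from a photo URL and downloading in fixed-size batches of threads, can be tried without touching the network. The following is a minimal sketch using only the standard library; `local_filename` and `map_in_batches` are hypothetical helper names that do not appear in the gist, and the `example.com` URLs are placeholders.

```ruby
require 'uri'

# Hypothetical helper (not in the gist): local filename for a photo URL,
# ignoring any query string.
def local_filename(url)
  File.basename(URI.parse(url).path)
end

# Hypothetical helper (not in the gist): call the block once per item,
# running at most `concurrency` threads at a time, and return the block
# results in the same order as the input.
def map_in_batches(items, concurrency, &block)
  results = []
  items.each_slice(concurrency) do |group|
    # Each item is passed in as a thread argument, so every thread
    # works on its own value.
    threads = group.map { |item| Thread.new(item, &block) }
    # Thread#value waits for the thread to finish and returns the
    # block's result.
    results.concat(threads.map(&:value))
  end
  results
end

# Placeholder URLs, for illustration only.
urls = [
  "http://example.com/media/a_1280.jpg?v=1",
  "http://example.com/media/b_1280.png",
  "http://example.com/media/c_1280.gif?size=large"
]

names = map_in_batches(urls, 2) { |u| local_filename(u) }
puts names.inspect
```

In the script itself the block would fetch and save the file instead of just computing a name. Collecting results through `Thread#value` also re-raises any exception from a worker in the calling thread, which differs from the script's per-thread `rescue`.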