@pugson
Forked from schmich/psiupuxa.rb
Last active August 29, 2015 14:13
require 'nokogiri'
require 'open-uri'
require 'openssl'
require 'uri'

# Uncomment next line if you don't have the Ruby SSL cert bundle installed.
# Warning: this disables certificate verification for the whole process.
# OpenSSL::SSL::VERIFY_PEER = OpenSSL::SSL::VERIFY_NONE

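pages = ['http://psiupuxa.com/index']
images = []

puts 'Scraping pages for image links.'
while (page = pages.shift)
  puts "Scraping #{page}."
  # URI.open comes from open-uri; Kernel#open no longer opens URLs as of Ruby 3.0.
  doc = Nokogiri::HTML(URI.open(page))

  # Collect every link inside a post whose href contains "desktop".
  doc.css('.post a[href*="desktop"]').each do |e|
    images << e['href'].to_s
  end

  # Queue the pagination link that follows the active page, if there is one.
  next_link = doc.css('.pages-link-active + a.pages-link').first
  if next_link
    next_page = next_link['href'].to_s
    pages << URI.join(page, next_page).to_s
  end
end
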
puts "Found #{images.length} images."

puts 'Downloading images.'
images.each_with_index do |url, index|
  # Use the last path segment of the URL as the local file name.
  local_file = url.split('/').last
  print "[#{index + 1}/#{images.length}] "

  if File.exist?(local_file)
    puts "Skipping #{local_file}, file exists."
    next
  end

  puts "Downloading #{local_file}."
  # Read the remote image and write it to the local file in binary mode.
  URI.open(url, 'rb') do |image|
    File.open(local_file, 'wb') do |file|
      file.write(image.read)
    end
  end
end

puts 'Fin.'
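
The script derives each local file name with `url.split('/').last`, which would carry any query string or fragment into the file name. A minimal sketch of one alternative follows, assuming the image links are absolute URLs; `local_name` and the `example.com` URL are hypothetical and not part of the original script.

```ruby
require 'uri'

# Hypothetical helper (not in the original script): derive a local file name
# from the path component of a URL, ignoring the query string and fragment.
def local_name(url)
  name = File.basename(URI.parse(url).path)
  # An empty or root path has no usable file name, so fall back to a default.
  name.empty? || name == '/' ? 'index' : name
end

puts local_name('http://example.com/wallpapers/desktop.jpg?dl=1')
```

Whether this matters depends on how the site formats its links; if they never carry a query string, the original one-liner behaves the same.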