@Alejandro-MartinG
Created March 12, 2016 23:51
Scraping the website "http://it-ebooks.info/" with Ruby, for personal, private use.
require 'tweakphoeus'
require 'nokogiri'

# Downloads book PDFs from it-ebooks.info by walking the numeric book IDs.
class Ebooks
  URL = "https://it-ebooks.info"
  BOOK_URL = "#{URL}/book"
  N_BOOKS = 6972 # highest book ID to try

  def initialize
    @http = Tweakphoeus::Client.new
  end

  # Visits each book page, extracts the download link, and fetches the file.
  def get_urls
    (1..N_BOOKS).each do |number|
      begin
        puts "Book number: #{number}/#{N_BOOKS}"
        response = @http.get("#{BOOK_URL}/#{number}")
        page = Nokogiri::HTML(response.body)
        table = page.css('.justify > table:nth-child(3)')
        # Row index 10 of the details table is expected to hold the download link.
        link = table.css('tr')[10].css('a')
        name = link.text
        url = link.attr('href').value
        get_book url, name
      rescue StandardError => e
        # StandardError (not Exception) so that Ctrl-C still stops the run.
        puts e.message
        puts e.backtrace
        # Append one failed URL per line so the log stays readable.
        File.open("errors_books.csv", "a") do |f|
          f.puts "ERROR IN ==> #{BOOK_URL}/#{number}"
        end
        puts "ERROR IN ==> #{BOOK_URL}/#{number}"
        sleep 1
      end
    end
  end
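
  # Downloads the file at +url+ and saves it as "<name>.pdf" in the current directory.
  def get_book url, name
    puts "url: #{url}, name: #{name}"
    response = @http.get url
    # Binary mode keeps the PDF bytes intact on every platform.
    File.open("#{name}.pdf", "wb") do |f|
      f.write(response.body)
    end
  end
end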
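
The script uses the scraped book title directly as a file name in get_book, so a title containing "/" or ":" could make File.open fail or write to an unintended path. Below is a minimal sketch of one way to guard against that, using only the Ruby standard library. The helper name safe_filename and its fallback value are hypothetical and not part of the original gist.

```ruby
# Hypothetical helper (not in the original gist): turn a scraped book
# title into a string that is safer to use as a file name.
def safe_filename(title, fallback: "untitled")
  cleaned = title.to_s
                 .gsub(%r{[\\/:*?"<>|]}, "_") # characters rejected by common filesystems
                 .gsub(/\s+/, " ")             # collapse whitespace, including newlines
                 .gsub(/[[:cntrl:]]/, "")      # drop any remaining control characters
                 .strip
  cleaned = cleaned.sub(/\A\.+/, "")           # avoid hidden files and "." / ".."
  cleaned.empty? ? fallback : cleaned
end
```

With this in place, get_book could open "#{safe_filename(name)}.pdf" instead of "#{name}.pdf". Two different titles can still map to the same file name, so this sketch does not prevent overwrites.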