Created
July 25, 2016 00:53
-
-
Save kaileeagray/05584d71780c0830e071b007c64a60e5 to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| require 'nokogiri' | |
| require 'open-uri' | |
| class Scraper | |
| ## I picked out a few of the pro/con lists and scraped them for their content. | |
| @@all = ["http://www.theonion.com/graphic/pros-and-cons-of-screen-time-for-kids-38445", "http://www.theonion.com/graphic/pros-and-cons-of-going-to-grad-school-38467", "http://www.theonion.com/graphic/pros-and-cons-of-the-greek-system-38515", "http://www.theonion.com/graphic/pros-and-cons-raising-minimum-wage-50476", "http://www.theonion.com/graphic/pros-and-cons-freelance-employment-51603", "http://www.theonion.com/graphic/pros-and-cons-political-correctness-52003", "http://www.theonion.com/graphic/pros-and-cons-helicopter-parenting-52130", "http://www.theonion.com/graphic/pros-and-cons-attending-college-52701", "http://www.theonion.com/graphic/pros-and-cons-profit-colleges-53065", "http://www.theonion.com/graphic/pros-and-cons-taking-gap-year-52852", "http://www.theonion.com/graphic/pros-and-cons-standardized-testing-50388", "http://www.theonion.com/graphic/the-pros-and-cons-of-open-plan-offices-38377", "http://www.theonion.com/graphic/the-pros-and-cons-of-paying-college-athletes-38321"] | |
| def visit_all | |
| @@all.each do |url| | |
| @url = url | |
| self.get_page(url) | |
| self.make_list | |
| end | |
| end | |
| def get_page(url) | |
| # The .get_page instance method will be responsible | |
| # for using Nokogiri and open-uri to grab the entire HTML document from the web page. | |
| @doc = Nokogiri::HTML(open(url)) | |
| end | |
| def make_list | |
| list = List.new | |
| list.user_id = User.find_by(username: "the_onion").id | |
| list.title = @doc.css(".content-header").text.strip | |
| list.description = @doc.css(".content-text b").first.text | |
| list.source = @url | |
| list.save | |
| @doc.css(".content-text ul").first.css("li").collect do |pro| | |
| pro = Pro.create(list_id: list.id, user_id: User.find_by(username: "the_onion").id, description: pro.text, weight: Faker::Number.between(0, 10)) | |
| pro.save! | |
| end | |
| @doc.css(".content-text ul")[1].css("li").collect do |con| | |
| con = Con.create(list_id: list.id, user_id: User.find_by(username: "the_onion").id, description: con.text, weight: Faker::Number.between(0, 10)) | |
| con.save! | |
| end | |
| list.save! | |
| end | |
| end | |
| Scraper.new.visit_all |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment