
@pecha7x
Created July 21, 2016 18:35

Revisions

  1. pecha7x created this gist Jul 21, 2016.
    51 changes: 51 additions & 0 deletions nokogiri_scraping_questions.rb
    @@ -0,0 +1,51 @@
    #1) What is your favorite Ruby class/library or method and why?
    I like Ruby as a whole. Some highlights: blocks/lambdas, "everything is an object", metaprogramming, etc.
    ---------------

    #2) Given HTML: '<div class="images"><img src="/pic.jpg"></div>'. Using Nokogiri, how would you select the src attribute from the image? Show me two different ways to do that correctly with the HTML given.
    require 'nokogiri'

    document = Nokogiri::HTML('<div class="images"><img src="/pic.jpg"></div>')
    # 1) CSS selector: take the first matching img and read its src attribute
    document.at_css('.images img').attr('src')
    # 2) XPath: select the src attribute node directly and read its value
    document.at_xpath('//div[@class="images"]/img/@src').value
    ---------------

    # 3) If the HTML found was a collection of li tags within a div with class="attr", how would you use Nokogiri to collect that information into one array?
    document.xpath('//div[@class="attr"]//li').map { |li| li.text }
    ---------------

    # 4) Given the following HTML: <div class="listing"> <div class="row"> <span class="left">Title:</span>
    # <span class="right">The Well-Grounded Rubyist</span> </div> <div class="row"> <span class="left">Author:</span>
    # <span class="right">David A. Black</span> </div> <div class="row"> <span class="left">Price:</span>
    # <span class="right">$34.99</span> </div> <div class="row"> <span class="left">Description:</span>
    # <span class="right">A great book for Rubyists</span> </div> <div class="row"> <span class="left">Seller:</span>
    # <span class="right">Ruby Scholar</span> </div> </div>
    # Please collect all of the data presented into a key-value store. Please include code and the output.

    document = Nokogiri::HTML('<div class="listing"> <div class="row"> <span class="left">Title:</span>
    <span class="right">The Well-Grounded Rubyist</span> </div> <div class="row">
    <span class="left">Author:</span> <span class="right">David A. Black</span> </div>
    <div class="row"> <span class="left">Price:</span> <span class="right">$34.99</span>
    </div> <div class="row"> <span class="left">Description:</span> <span class="right">A great book for Rubyists</span>
    </div> <div class="row"> <span class="left">Seller:</span> <span class="right">Ruby Scholar</span> </div> </div>')

    # Each row holds one key/value pair: span.left is the key, span.right is the value.
    book = document.css('div.listing div.row').each_with_object({}) do |row, data|
      data[row.at_css('span.left').text] = row.at_css('span.right').text
    end

    p book
    => {"Title:"=>"The Well-Grounded Rubyist", "Author:"=>"David A. Black", "Price:"=>"$34.99", "Description:"=>"A great book for Rubyists", "Seller:"=>"Ruby Scholar"}
    ------------------

    #5) What Ruby feature do you hate?
    It is sometimes slow...