@yalab
Last active June 28, 2019 23:10

Revisions

  1. yalab revised this gist Jun 28, 2019. 1 changed file with 7 additions and 3 deletions.
    10 changes: 7 additions & 3 deletions scraping_valunerability.rb
    @@ -7,17 +7,21 @@
       google_html = f.read
     end
     google = Nokogiri::HTML(google_html)
    +REG_DOUBLE_SLASH = %r(\A//)
    +REG_SLASH = %r(\A/)
     google.xpath('//a').each do |anchor|
       body = nil
    -  uri = if anchor[:href][0].match(%r(\A//))
    +  uri = if anchor[:href][0].match(REG_DOUBLE_SLASH)
         "https://google.com#{anchor[:href].gsub('//', '/')}"
    -  elsif anchor[:href][0].match(%r(\A/))
    +  elsif anchor[:href][0].match(REG_SLASH)
         "https://google.com#{anchor[:href]}"
    +  else
    +    anchor[:href]
       end
       next unless uri
       open(uri) do |f|
         body = f.read
       end
       html = Nokogiri::HTML(body)
       # Extract whatever information is needed here
    -end
    +end
  2. yalab revised this gist Jun 28, 2019. 1 changed file with 3 additions and 2 deletions.
    5 changes: 3 additions & 2 deletions scraping_valunerability.rb
    @@ -14,9 +14,10 @@
       elsif anchor[:href][0].match(%r(\A/))
         "https://google.com#{anchor[:href]}"
       end
    -  open(anchor[:href]) do |f|
    +  next unless uri
    +  open(uri) do |f|
         body = f.read
       end
       html = Nokogiri::HTML(body)
       # Extract whatever information is needed here
    -end
    +end
  3. yalab created this gist Jun 28, 2019.
    22 changes: 22 additions & 0 deletions scraping_valunerability.rb
    @@ -0,0 +1,22 @@
    +require 'open-uri'
    +require 'nokogiri'
    +
    +ROOT_URI = 'https://www.google.com/search?source=hp&ei=K5YWXf3qBozK8wXsgqOYDQ&q=%E8%84%86%E5%BC%B1%E6%80%A7&oq=%E8%84%86%E5%BC%B1%E6%80%A7&gs_l=psy-ab.3..0j0i131j0l6.4101.7361..7805...2.0..0.74.1039.17......0....1..gws-wiz.....0..0i4j0i131i4j0i4i70i257.JdDB9CrU7j0'
    +google_html = nil
    +open(ROOT_URI) do |f|
    +  google_html = f.read
    +end
    +google = Nokogiri::HTML(google_html)
    +google.xpath('//a').each do |anchor|
    +  body = nil
    +  uri = if anchor[:href][0].match(%r(\A//))
    +    "https://google.com#{anchor[:href].gsub('//', '/')}"
    +  elsif anchor[:href][0].match(%r(\A/))
    +    "https://google.com#{anchor[:href]}"
    +  end
    +  open(anchor[:href]) do |f|
    +    body = f.read
    +  end
    +  html = Nokogiri::HTML(body)
    +  # Extract whatever information is needed here
    +end
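
A few details in the script are worth spelling out. `anchor[:href][0]` is only the first character of the href, so a pattern like `\A//` can never match it, and `anchor[:href]` is `nil` for anchors without an href. The href is also passed to `Kernel#open`, which, on Ruby versions where it still treats a leading `|` as a command, will run whatever the scraped page supplies. The following is a minimal sketch of one way to handle this, not the author's code: `resolve_href`, `fetch_body`, and `BASE_URI` are hypothetical names introduced here, and the sketch assumes protocol-relative links (`//host/path`) should resolve to their own host rather than be rewritten onto google.com as the gist does.

```ruby
require 'uri'
require 'open-uri'

# Assumption: the same host the gist hard-codes when building absolute URLs.
BASE_URI = 'https://google.com/'

# Hypothetical helper (not in the gist): resolve an href against a base URL.
# Returns an absolute http(s) URL string, or nil when the href is missing,
# unparsable, or uses another scheme (javascript:, mailto:, "| command", ...).
def resolve_href(href, base = BASE_URI)
  return nil if href.nil? || href.strip.empty?

  uri = URI.join(base, href.strip)
  uri.is_a?(URI::HTTP) ? uri.to_s : nil # URI::HTTPS is a subclass of URI::HTTP
rescue URI::Error
  nil
end

# Hypothetical helper: fetch a page through the parsed URI object, so the
# string never reaches Kernel#open.
def fetch_body(url)
  URI.parse(url).open(&:read)
end

# Possible use inside the gist's loop (Nokogiri assumed, as in the original):
#   google.xpath('//a').each do |anchor|
#     uri = resolve_href(anchor[:href]) or next
#     html = Nokogiri::HTML(fetch_body(uri))
#   end
```

Resolving with `URI.join` replaces the two hand-written regex branches, and the `URI::HTTP` check is what filters out non-web schemes and pipe-style strings before anything is opened.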