@stormbeta
Created December 30, 2013 21:00

worm-scraper.rb

#!/usr/bin/env ruby

# Scrapes a crude HTML representation of the Worm (Parahumans) web serial:
# chapter paragraphs go to stdout, progress messages go to stderr.

require 'nokogiri'
require 'open-uri'

# URL of the first chapter
next_chapter_uri = 'http://parahumans.wordpress.com/category/stories-arcs-1-10/arc-1-gestation/1-01/'

loop do
  $stderr.puts "Opening #{next_chapter_uri}"
  # URI.open (from open-uri) fetches the page; the block form closes the stream
  current_page = URI.open(next_chapter_uri) { |io| Nokogiri::HTML(io) }
  content = current_page.css('div.entry-content')

  # Search for the sequence link; if several match, the last one wins
  next_link = content.css('a').select { |a| a.text.strip == 'Next Chapter' && a['href'] }.last

  # Print the chapter's paragraphs as raw HTML
  puts content.css('p').to_html

  if next_link.nil?
    $stderr.puts 'No further chapters found, exiting...'
    break
  end

  # Resolve the href against the current page in case it is relative
  next_chapter_uri = URI.join(next_chapter_uri, next_link['href']).to_s
end
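The scraper's control flow (fetch a page, emit its paragraphs, follow the "Next Chapter" link until there is none) can be sketched as a small stdlib-only function that runs without network access. This is a hypothetical sketch, not part of the original gist: `crawl_chapters` and its `fetch` callable are assumed names, the example.com pages are made-up stand-ins for real chapters, and the visited-set check is an added guard against a page that links back to one already seen, which the script itself does not have.

```ruby
require 'uri'

# Hypothetical helper (not in the original gist): walks a chain of
# "next chapter" links starting at start_uri and collects each page's HTML.
# `fetch` is any callable taking a URI string and returning
# [chapter_html, next_href]; next_href is nil on the last chapter.
def crawl_chapters(start_uri, fetch)
  chapters = []
  visited = {}
  uri = start_uri
  # Stop when there is no next link, or when a page was already visited
  # (an added guard against link cycles)
  while uri && !visited[uri]
    visited[uri] = true
    html, next_href = fetch.call(uri)
    chapters << html
    # Resolve relative hrefs against the page they appeared on
    uri = next_href && URI.join(uri, next_href).to_s
  end
  chapters
end

# Usage with an in-memory stand-in for the web (made-up URLs)
pages = {
  'https://example.com/1/' => ['<p>One</p>', '/2/'],
  'https://example.com/2/' => ['<p>Two</p>', 'https://example.com/3/'],
  'https://example.com/3/' => ['<p>Three</p>', nil]
}
# prints <p>One</p>, <p>Two</p> and <p>Three</p> on separate lines
puts crawl_chapters('https://example.com/1/', ->(u) { pages.fetch(u) })
```

Keeping the fetch step behind a callable is what makes the loop testable offline; a real fetcher would return the paragraph HTML and the "Next Chapter" link's `href` extracted with Nokogiri.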