Created
December 30, 2013 21:00
Revisions
stormbeta created this gist
Dec 30, 2013
@@ -0,0 +1,33 @@
#!/usr/bin/ruby
# Scrapes crude HTML representation of Worm (Parahumans) web serial chapters

require 'rubygems'
require 'nokogiri'
require 'open-uri'

# URL of first chapter
nextChapterUri = URI::encode("http://parahumans.wordpress.com/category/stories-arcs-1-10/arc-1-gestation/1-01/")

while true do
  $stderr.puts "Opening #{nextChapterUri}"
  currentPage = Nokogiri::HTML(open(nextChapterUri))
  content = currentPage.css('div.entry-content')
  isEnd = true

  # Search for sequence link
  content.css('a').each{ |n|
    if(n.text=='Next Chapter') then
      isEnd = false
      nextChapterUri = URI::encode(n.attr('href'))
    end
  }

  chapterText = content.css('p').to_s
  puts chapterText

  if(isEnd) then
    $stderr.puts "No further chapters found, exiting..."
    exit(0)
  end

end
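
The script calls `URI::encode` and passes a URL to `open`. Both were removed in Ruby 3.0, so it will not run unchanged on current Ruby versions. The following is a minimal sketch of replacements for those two calls, not part of the original gist. The helper names `escape_url` and `fetch_html` are hypothetical, and the escaping rule (percent-encode spaces, control bytes, and non-ASCII bytes only) is an assumption that is narrower than what `URI::encode` did.

```ruby
require 'open-uri'

# Hypothetical stand-in for URI::encode (an assumption, not the old behavior):
# percent-encode every byte outside printable ASCII (0x21..0x7E) and leave
# everything else, including existing %XX escapes, untouched.
def escape_url(url)
  escaped = url.b.gsub(/[^\x21-\x7E]/n) { |byte| format('%%%02X', byte.ord) }
  escaped.force_encoding(Encoding::UTF_8)
end

# Fetch a page body as a string. URI.open is the open-uri entry point that
# replaces calling Kernel#open with a URL.
def fetch_html(url)
  URI.open(escape_url(url), &:read)
end
```

In the loop, `Nokogiri::HTML(fetch_html(nextChapterUri))` would take the place of `Nokogiri::HTML(open(nextChapterUri))`, and the two `URI::encode` calls could be dropped because `fetch_html` escapes the URL itself.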