@singpolyma
Created November 20, 2015 19:52
Revisions

  1. singpolyma created this gist Nov 20, 2015.
    24 changes: 24 additions & 0 deletions html2text.rb
    @@ -0,0 +1,24 @@
    require 'nokogiri'

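    # Convert a list of Nokogiri nodes to plain text, keeping link and image URLs.
    def content_text(nodes)
      nodes.map do |el|
        # Text nodes, h-card/vcard mentions, and rel="tag" links: keep the visible text only.
        if el.text? || el.attributes['class'].to_s.match(/\b(?:h\-card|vcard|h\-x\-username)\b/) || el.attributes['rel'].to_s.match(/\btag\b/)
          el.text
        elsif el.name == 'a'
          href = el.attributes['href'].to_s
          if el.text.strip == ''
            # Link with no visible text: drop it.
            ''
          elsif el.text.match(/[\/\.]/) && href.gsub(/[^\w\-_\/]/, '').include?(el.text.gsub(/[^\w\-_\/]/, ''))
            # Link text is contained in the URL (ignoring punctuation): emit the URL alone.
            href
          else
            # Otherwise keep the link text and append the URL.
            el.text + " #{href}"
          end
        elsif el.name == 'img'
          # Images are replaced by their source URL.
          el.attributes['src'].to_s
        else
          # Any other element: recurse into its children.
          content_text(el.children)
        end
      end.join
    end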

    # `item` is not defined in this file; it comes from the calling context.
    content_text(Nokogiri::HTML.fragment(item[:content].to_s).children)
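
The link branch of `content_text` is the least obvious part of the script, so here is a minimal sketch of that decision on its own. The helper name `link_to_text` and the sample URLs are hypothetical and not part of the gist. The helper takes plain strings instead of Nokogiri nodes so that it runs with the standard library alone.

```ruby
# Hypothetical helper (not in the gist): mirrors the <a> branch of
# content_text, but on plain strings instead of Nokogiri nodes.
def link_to_text(text, href)
  # Same normalisation as the gist: keep only word characters, '-', '_' and '/'.
  normalise = ->(s) { s.gsub(/[^\w\-_\/]/, '') }

  if text.strip == ''
    # No visible link text: emit nothing.
    ''
  elsif text.match(/[\/\.]/) && normalise.call(href).include?(normalise.call(text))
    # The link text already looks like the URL: emit the URL once.
    href
  else
    # Otherwise keep the text and append the URL after a space.
    text + " #{href}"
  end
end

puts link_to_text('example.com/page', 'https://example.com/page')
puts link_to_text('my site', 'https://example.com/')
puts link_to_text('   ', 'https://example.com/').inspect
```

With these inputs, the first call should give the bare URL, the second the text followed by the URL, and the third an empty string. This avoids output such as `example.com/page https://example.com/page` when a link's text merely repeats its target.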