Last active
August 4, 2023 09:36
-
-
Save pmarreck/8476538 to your computer and use it in GitHub Desktop.
Revisions
-
pmarreck revised this gist
Jan 17, 2014 . 1 changed file with 1 addition and 1 deletion.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -76,7 +76,7 @@ def clean_header_values_hash(h) headers = <<-HEADERS Delivered-To: jtest01@test.com Return-Path: <[email protected]> Received: from smtp36.gate.dfw1a (gate36.gate.dfw.mlsrvr.com [172.20.100.36]) by mail138a.mail.dfw.mlsrvr.com (SMTP Server) with ESMTP id 107434FCCD -
pmarreck created this gist
Jan 17, 2014 .There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -0,0 +1,137 @@ require 'ap' require 'mail' # String monkeypatch # This is one of many possible "encoding problem" solutions. It's actually an intractable problem # but you'd have to read "Gödel, Escher, Bach" to understand why... class String def clean_utf8 # self.force_encoding("UTF-8").encode("UTF-16BE", :invalid=>:replace, :replace=>"?").encode("UTF-8") unpack('C*').pack('U*') if !valid_encoding? end end module Parsing module Email class Header EMAIL_HEADER_PARSER_REGEX = / ([A-Za-z-]+):\s # Find a header key, which ends in a colon and a space. Capture the hyphenated word portion. ( # Now start capturing the value. [^\r\n]+ # First, match everything that is not a line ending char. (?: # Then start a non-capturing repeating match which first consists of... \r?\n # A line ending combo... (?![A-Za-z-]+:\s) # But first, do a negative lookahead to make sure the next line does not start with another header-key-looking string [^\r\n]+ # Then match all text that is not a line ending char. )* # Repeat this non-capturing group match 0 or more times. ) # End capturing the value. /mx # Allow matches to cross line endings (m) and allow whitespace and comments in this regex (x). EMAIL_MULTILINE_HEADER_VALUE_REGEX = /\r?\n\s*/m attr_reader :headers def initialize(opts = {}) @headers = (Hash === opts ? opts[:headers] : opts) end def call return {} unless @headers clean_header_values_hash(headers_hash) end alias to_h call private def headers_hash h = {} @headers.scan(EMAIL_HEADER_PARSER_REGEX).map do |k,v| if k && v if h[k] # if this key already has a value, wrap it in an array and append the new value, otherwise just set the key value h[k] = [h[k]] unless Array === h[k] h[k] << v else h[k] = v end end end h end # Clean up runs of a newline followed by whitespace in header values by replacing with a space def clean_header_values_hash(h) h.each do |k,v| if Array === v v.map! do |val| val =~ EMAIL_MULTILINE_HEADER_VALUE_REGEX ? val.gsub!(EMAIL_MULTILINE_HEADER_VALUE_REGEX,' ') : val end else v.gsub!(EMAIL_MULTILINE_HEADER_VALUE_REGEX, ' ') if v =~ EMAIL_MULTILINE_HEADER_VALUE_REGEX end end h end end end end headers = <<-HEADERS Delivered-To: [email protected] Return-Path: <[email protected]> Received: from smtp36.gate.dfw1a (gate36.gate.dfw.mlsrvr.com [172.20.100.36]) by mail138a.mail.dfw.mlsrvr.com (SMTP Server) with ESMTP id 107434FCCD for <[email protected]>; Wed, 29 Dec 2010 05:46:02 -0500 (EST) X-Spam-Threshold: 95 X-Spam-Score: 0 X-Spam-Flag: NO X-Virus-Scanned: OK X-MessageSniffer-Scan-Result: 0 X-MessageSniffer-Rules: 0-0-0-32767-c X-CMAE-Scan-Result: 0 X-CNFS-Analysis: v=1.0 c=1 a=410GYZBmXEQA:10 a=8nJEP1OIZ-IA:10 a=jrTSQGAaAAAA:8 a=zr8WG4425YMt8zt5NHsA:9 a=9yu0GlTtA_AyZ2ttE_-TzX-Oml0A:4 a=wPNLvfGTeEIA:10 a=vUvjSO8KtUCdM2_P:21 a=bi41oqnuo1lMp9yW:21 X-Orig-To: [email protected] X-Originating-Ip: [173.203.22.3] Received: from [173.203.22.3] ([173.203.22.3:54314] helo=287315-web1.comixology.com) by smtp36.gate.dfw1a.rsapps.net (envelope-from <[email protected]>) (ecelerity 2.2.3.46 r(37554)) with ESMTP id 4A/C8-28011-8611B1D4; Wed, 29 Dec 2010 05:46:01 -0500 Received: from comixology.com (localhost [127.0.0.1]) by 287315-web1.comixology.com (Postfix) with ESMTP id E41991188DCD; Wed, 29 Dec 2010 05:45:59 -0500 (EST) Date: Wed, 29 Dec 2010 05:45:59 -0500 To: [email protected] From: comiXology <[email protected]> Reply-to: [email protected] Subject: Feedback from the Marvel Mobile Comics App Message-ID: <[email protected]> X-Priority: 3 X-Mailer: PHPMailer 5.0.2 (phpmailer.codeworxtech.com) MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset="ISO-8859-1" HEADERS t = Time.now 500.times do begin email = Parsing::Email::Header.new(headers) email.call rescue ArgumentError email = Parsing::Email::Header.new(headers.clean_utf8) email.call end end puts "Time with my regex: #{Time.now - t} seconds" t = Time.now 500.times do email = Mail.new(headers) email.received end puts "Time with Mail gem: #{Time.now - t} seconds" # Time with my regex: 0.063965 seconds # Time with Mail gem: 16.327161 seconds # [Finished in 16.7s]