Skip to content

Instantly share code, notes, and snippets.

@liquidgenius
Forked from andersontep/usaddress_adapter.py
Created August 20, 2020 17:15
Show Gist options
  • Save liquidgenius/ec91b72a8dde1e9835b07ab7812a3d1d to your computer and use it in GitHub Desktop.
Save liquidgenius/ec91b72a8dde1e9835b07ab7812a3d1d to your computer and use it in GitHub Desktop.

Revisions

  1. @andersontep andersontep revised this gist Nov 11, 2015. 1 changed file with 5 additions and 3 deletions.
    8 changes: 5 additions & 3 deletions usaddress_adapter.py
    Original file line number Diff line number Diff line change
    @@ -42,13 +42,15 @@ def _combine(*label_prefixes):
    return ' '.join(components) or None

    return Address(
    # Note: If you prefer street_1 and street_2, the "Occupancy" labels are generally what falls under street_2
    # Note: If you prefer street_1 and street_2, the "Occupancy" labels
    # are generally what falls under street_2
    street=_combine('AddressNumber', 'StreetName', 'Occupancy'),
    city=_combine('PlaceName'),
    state=_combine('StateName'),
    zip_code=_combine('ZipCode'),
    country=_combine('CountryName'),
    )

    parse_address('123 Fake Street , ste 123,, San Diego, CA 12345-1234, USA')
    # >> Address(street=u'123 Fake Street ste 123', city=u'San Diego', state=u'CA', zip_code=u'12345-1234', country=u'USA')
    parse_address('123 Fake Street , ste 123,, San Diego, CA 12345-1234, USA')
    # >> Address(street=u'123 Fake Street ste 123', city=u'San Diego',
    # >> state=u'CA', zip_code=u'12345-1234', country=u'USA')
  2. @andersontep andersontep created this gist Nov 11, 2015.
    54 changes: 54 additions & 0 deletions usaddress_adapter.py
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,54 @@
    """
    * requires usaddress https://github.com/datamade/usaddress
    * check out the website! https://parserator.datamade.us/usaddress
    The usaddress package is pretty great for normalizing inconsistent address data,
    especially when if you have a lot to process and can't rely on using a geocoding api.
    The results are really granular, probably moreso than you'll need, and definitely
    more than most CRM systems if integrating these addresses is your goal.
    This is just a simple wrapper around the usaddress.tag() function that I feel will
    fit most use cases. The usaddress.parse() function works as well, but tag has some
    nice things I like such as combining certain address components,
    stripping out extra commas, etc.
    For example:
    usaddress.tag('123 Fake Street, ste 123, san diego, CA 12345, USA')
    >> (OrderedDict([('AddressNumber', u'123'),
    >> ('StreetName', u'Fake'),
    >> ('StreetNamePostType', u'Street'),
    >> ('OccupancyType', u'ste'),
    >> ('OccupancyIdentifier', u'123'),
    >> ('PlaceName', u'san diego'),
    >> ('StateName', u'CA'),
    >> ('ZipCode', u'12345'),
    >> ('CountryName', u'USA')]),
    >> 'Street Address')
    """
    import collections
    import usaddress # This is based on 0.5.4

    Address = collections.namedtuple("Address", "street, city, state, zip_code, country")

    def parse_address(address_string):
    tags, _ = usaddress.tag(address_string)

    def _combine(*label_prefixes):
    components = []
    for label, component in tags.iteritems(): # tags is an OrderedDict so this is fine
    if any(map(label.startswith, label_prefixes)):
    components.append(component)
    return ' '.join(components) or None

    return Address(
    # Note: If you prefer street_1 and street_2, the "Occupancy" labels are generally what falls under street_2
    street=_combine('AddressNumber', 'StreetName', 'Occupancy'),
    city=_combine('PlaceName'),
    state=_combine('StateName'),
    zip_code=_combine('ZipCode'),
    country=_combine('CountryName'),
    )

    parse_address('123 Fake Street , ste 123,, San Diego, CA 12345-1234, USA')
    # >> Address(street=u'123 Fake Street ste 123', city=u'San Diego', state=u'CA', zip_code=u'12345-1234', country=u'USA')