@edison12a
Forked from bradmontgomery/kill_attrs.py
Created April 3, 2018 13:15
A way to remove all HTML attributes with BeautifulSoup
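from bs4 import BeautifulSoup  # beautifulsoup4; the BeautifulSoup 3 package only runs on Python 2


def _remove_attrs(soup):
    """Strip every attribute from every tag in the soup, in place."""
    # Only visit tags that actually carry attributes.
    for tag in soup.find_all(lambda tag: len(tag.attrs) > 0):
        # Clear the attribute dict in one step instead of deleting
        # entries while iterating over them, which skips attributes.
        tag.attrs.clear()
    return soup


def example():
    doc = '<html><head><title>test</title></head><body id="foo"><p class="wahtever">junk</p><div style="background: yellow;">blah</div></body></html>'
    print('Before:\n%s' % doc)
    soup = BeautifulSoup(doc, 'html.parser')
    clean_soup = _remove_attrs(soup)
    print('After:\n%s' % clean_soup)


if __name__ == '__main__':
    example()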
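
Where BeautifulSoup is not installed, the same idea can be roughly sketched with the standard library's `html.parser` instead. This is a different technique from the gist's, not a drop-in replacement: it re-emits the markup tag by tag rather than editing a parse tree, and as written it drops comments, doctype declarations, and processing instructions. The names `AttrStripper` and `strip_attrs` are hypothetical, chosen only for this illustration.

```python
from html.parser import HTMLParser


class AttrStripper(HTMLParser):
    """Hypothetical helper: re-emits HTML with every tag attribute dropped."""

    def __init__(self):
        # convert_charrefs=False so entities such as &amp; are passed
        # to the handlers below and written back out unchanged.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Ignore attrs entirely; emit only the bare tag name.
        self.parts.append('<%s>' % tag)

    def handle_startendtag(self, tag, attrs):
        self.parts.append('<%s/>' % tag)

    def handle_endtag(self, tag):
        self.parts.append('</%s>' % tag)

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append('&%s;' % name)

    def handle_charref(self, name):
        self.parts.append('&#%s;' % name)


def strip_attrs(html):
    """Return html with all tag attributes removed (assumed helper name)."""
    parser = AttrStripper()
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts)
```

Calling `strip_attrs(doc)` on the sample document from `example()` should return the same markup with `id`, `class`, and `style` gone. For anything beyond simple, well-formed input, a real parse tree such as BeautifulSoup's is likely the safer choice.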