Alexandre alexandreio
alexandreio / clean_str.py
Created April 22, 2021 17:24
Clean STR Master!
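import re
from unicodedata import normalize


def normalize_string(string):
    # NFKD splits accented letters into base letter + combining mark (e.g. "ã" -> "a" + "\u0303")
    text = normalize("NFKD", string)
    # Encoding to ASCII with "ignore" drops the combining marks and any other non-ASCII character
    processed_text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse each run of ASCII control characters (\x00-\x1F: tab, newline, NUL, ...) into one space
    processed_text = re.sub(r"[\x00-\x1F]+", " ", processed_text)
    return processed_text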
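A quick way to see what this pipeline keeps and what it throws away is to run it on a few accented strings. The sketch below re-declares the function so it runs on its own; the sample strings are illustrative and are not part of the original gist.

```python
import re
from unicodedata import normalize


def normalize_string(string: str) -> str:
    # Same steps as above: NFKD-decompose, drop non-ASCII, replace control-char runs with a space
    text = normalize("NFKD", string)
    text = text.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[\x00-\x1F]+", " ", text)


samples = [
    "São Paulo",         # -> "Sao Paulo"
    "Ação\tcoração\n",   # -> "Acao coracao "  (tab and trailing newline become spaces)
    "\ufb01m",           # "fi" ligature + "m" -> "fim"  (NFKD expands the ligature)
    "naïve café",        # -> "naive cafe"
    "Straße",            # -> "Strae"  ("ß" has no decomposition, so it is dropped)
    "10 €",              # -> "10 "    ("€" is not ASCII and is dropped)
]
for s in samples:
    print(repr(s), "->", repr(normalize_string(s)))
```

The last two samples show that the transformation is lossy: characters with no ASCII base letter disappear rather than being transliterated. Note also that the control-character class stops at `\x1F`, so DEL (`\x7F`) passes through unchanged.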