# How to normalize Unicode strings in Python / Go / Rust

Normalization forms: NFC, NFD, NFKC, NFKD.

- NFC and NFD apply canonical composition and decomposition. For example, `ä` can be stored as one code point or as `a` followed by a combining diaeresis.
- NFKC and NFKD also apply compatibility decompositions. That is why `㍿` becomes `株式会社` in the output below.

Input:

```
it’säå(1−2)ドブロク㍿
```

Result (byte counts are UTF-8 lengths):

```
NFC : it’säå(1−2)ドブロク㍿ (45 bytes)
NFD : it’säå(1−2)ドブロク㍿ (50 bytes)
NFKC: it’säå(1−2)ドブロク株式会社 (41 bytes)
NFKD: it’säå(1−2)ドブロク株式会社 (49 bytes)
```

## Python

```python
import unicodedata

def conv_and_print(form):
    src = "it’säå(1−2)ドブロク㍿"
    norm = unicodedata.normalize(form, src)
    # len() on str counts code points, so encode to UTF-8 to count bytes.
    print(f"{form}: {norm} ({len(norm.encode('utf-8'))} bytes)")

conv_and_print("NFC")
conv_and_print("NFD")
conv_and_print("NFKC")
conv_and_print("NFKD")
```

## Go

You can use the [golang.org/x/text/unicode/norm](https://pkg.go.dev/golang.org/x/text/unicode/norm) package.

```go
package main

import (
	"fmt"

	"golang.org/x/text/unicode/norm"
)

func main() {
	src := "it’säå(1−2)ドブロク㍿"

	// A slice (not a map) keeps the output order fixed;
	// Go randomizes map iteration order.
	forms := []struct {
		name string
		form norm.Form
	}{
		{"NFC", norm.NFC},
		{"NFD", norm.NFD},
		{"NFKC", norm.NFKC},
		{"NFKD", norm.NFKD},
	}

	for _, f := range forms {
		out := f.form.String(src)
		// len on a Go string returns the number of bytes.
		fmt.Printf("%s: %s (%d bytes)\n", f.name, out, len(out))
	}
}
```

## Rust

You can use the [unicode-normalization](https://crates.io/crates/unicode-normalization) crate.

```rust
use unicode_normalization::UnicodeNormalization;

fn main() {
    let s = "it’säå(1−2)ドブロク㍿";
    // str::len returns the length in bytes (UTF-8), not in characters.
    let print = |form: &str, norm: &str| println!("{}: {} ({} bytes)", form, norm, norm.len());
    print("NFC", &s.nfc().collect::<String>());
    print("NFD", &s.nfd().collect::<String>());
    print("NFKC", &s.nfkc().collect::<String>());
    print("NFKD", &s.nfkd().collect::<String>());
}
```
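The reason to pick a form is that visually identical strings can differ at the code-point level. The Python snippet below is a small illustration using only the standard `unicodedata` module. It is not part of the original examples, and `unicodedata.is_normalized` requires Python 3.8 or later. The snippet shows that a composed and a decomposed `ä` compare unequal until both sides are normalized to the same form. It also shows that only the compatibility form folds `㍿`.

```python
import unicodedata

# "ä" as one code point (U+00E4) vs. "a" + COMBINING DIAERESIS (U+0308).
composed = "\u00e4"
decomposed = "a\u0308"

# Raw comparison: the strings render the same but are not equal.
print(composed == decomposed)  # False

# Normalizing both sides to the same form makes them comparable.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True

# Canonical forms leave U+337F SQUARE CORPORATION alone;
# compatibility forms replace it with its plain equivalent.
print(unicodedata.normalize("NFC", "\u337f"))   # ㍿
print(unicodedata.normalize("NFKC", "\u337f"))  # 株式会社

# Python 3.8+: check whether a string is already in a given form.
print(unicodedata.is_normalized("NFC", decomposed))  # False
```

A common choice is to normalize both sides to the same form (often NFC) before comparing, hashing, or storing strings. The compatibility forms are reserved for cases where folding such as `㍿` → `株式会社` is actually wanted.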