Last active
July 25, 2024 06:24
Revisions
-
ciscorn revised this gist
Sep 27, 2021 . No changes.
-
ciscorn revised this gist
Sep 27, 2021 . 1 changed file with 1 addition and 1 deletion.

    @@ -1,4 +1,4 @@
    # Normalizing Unicode strings in Python / Go / Rust

    NFC, NFD, NFKC, NFKD

-
ciscorn revised this gist
Sep 27, 2021 . 1 changed file with 1 addition and 1 deletion.

    @@ -1,4 +1,4 @@
    # How to normalize Unicode strings in Python / Go / Rust

    NFC, NFD, NFKC, NFKD

-
ciscorn revised this gist
Sep 27, 2021 . No changes.
-
ciscorn revised this gist
Sep 27, 2021 . No changes.
-
ciscorn revised this gist
Sep 27, 2021 . 1 changed file with 2 additions and 2 deletions.

    @@ -37,7 +37,7 @@ conv_and_print("NFKD")
    ## Go

    You can use the [golang.org/x/text/unicode/norm](https://pkg.go.dev/golang.org/x/text/unicode/norm) package.

    ```go
    package main

    @@ -61,7 +61,7 @@ func main() {
    ## Rust

    You can use the [unicode-normalization](https://crates.io/crates/unicode-normalization) crate.

    ```rust
    fn main() {

-
ciscorn revised this gist
Sep 27, 2021 . 1 changed file with 1 addition and 1 deletion.

    @@ -37,7 +37,7 @@ conv_and_print("NFKD")
    ## Go

    You can use [golang.org/x/text/unicode/norm](https://pkg.go.dev/golang.org/x/text/unicode/norm) package.

    ```go
    package main

-
ciscorn revised this gist
Sep 27, 2021 . 1 changed file with 4 additions and 0 deletions.

    @@ -37,6 +37,8 @@ conv_and_print("NFKD")
    ## Go

    You can use golang.org/x/text/unicode/norm package.

    ```go
    package main

    @@ -59,6 +61,8 @@ func main() {
    ## Rust

    You can use [unicode-normalization](https://crates.io/crates/unicode-normalization) crate.

    ```rust
    fn main() {
        use unicode_normalization::UnicodeNormalization;

-
ciscorn revised this gist
Sep 27, 2021 . 1 changed file with 1 addition and 1 deletion.

    @@ -2,7 +2,7 @@

    NFC, NFD, NFKC, NFKD

    input:
    ```
    it’säå(1−2)ドブロク㍿
    ```

-
ciscorn revised this gist
Sep 27, 2021 . 1 changed file with 7 additions and 3 deletions.

    @@ -2,12 +2,16 @@

    NFC, NFD, NFKC, NFKD

    source:
    ```
    it’säå(1−2)ドブロク㍿
    ```

    result:
    ```
    NFC : it’säå(1−2)ドブロク㍿ (45 bytes)
    NFD : it’säå(1−2)ドブロク㍿ (50 bytes)
    NFKC: it’säå(1−2)ドブロク株式会社 (41 bytes)
    NFKD: it’säå(1−2)ドブロク株式会社 (49 bytes)
    ```

-
ciscorn revised this gist
Sep 27, 2021 . 1 changed file with 10 additions and 0 deletions.

    @@ -2,6 +2,16 @@

    NFC, NFD, NFKC, NFKD

    ```
    source:
    it’säå(1−2)ドブロク㍿

    result:
    NFC: it’säå(1−2)ドブロク㍿ (45 bytes)
    NFD: it’säå(1−2)ドブロク㍿ (50 bytes)
    NFKC: it’säå(1−2)ドブロク株式会社 (41 bytes)
    NFKD: it’säå(1−2)ドブロク株式会社 (49 bytes)
    ```

    ## Python

    ```python

-
ciscorn revised this gist
Sep 27, 2021 . 1 changed file with 6 additions and 0 deletions.

    @@ -2,6 +2,8 @@

    NFC, NFD, NFKC, NFKD

    ## Python

    ```python
    import unicodedata

    @@ -19,6 +21,8 @@ conv_and_print("NFKD")
    ```


    ## Go

    ```go
    package main

    @@ -39,6 +43,8 @@ func main() {
    ```


    ## Rust

    ```rust
    fn main() {
        use unicode_normalization::UnicodeNormalization;

-
ciscorn created this gist
Sep 27, 2021 .

    @@ -0,0 +1,53 @@
    # Unicode normalization in Python / Go / Rust

    NFC, NFD, NFKC, NFKD

    ```python
    import unicodedata


    def conv_and_print(form):
        src = "it’säå(1−2)ドブロク㍿"
        norm = unicodedata.normalize(form, src)
        print(f"{form}: {norm} ({len(norm.encode('utf-8'))} bytes)")


    conv_and_print("NFC")
    conv_and_print("NFD")
    conv_and_print("NFKC")
    conv_and_print("NFKD")
    ```


    ```go
    package main

    import (
    	"fmt"

    	"golang.org/x/text/unicode/norm"
    )

    func main() {
    	src := "it’säå(1−2)ドブロク㍿"
    	forms := map[string]norm.Form{"NFC": norm.NFC, "NFD": norm.NFD, "NFKC": norm.NFKC, "NFKD": norm.NFKD}
    	for name, form := range forms {
    		norm := form.String(src)
    		fmt.Printf("%s: %v (%v bytes)\n", name, norm, len(norm))
    	}
    }
    ```


    ```rust
    fn main() {
        use unicode_normalization::UnicodeNormalization;

        let s = "it’säå(1−2)ドブロク㍿";
        let print = |form, norm: &str| println!("{}: {} ({} bytes)", form, norm, norm.len());
        print("NFC", &s.nfc().collect::<String>());
        print("NFD", &s.nfd().collect::<String>());
        print("NFKC", &s.nfkc().collect::<String>());
        print("NFKD", &s.nfkd().collect::<String>());
    }
    ```
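
The snippets above print a byte count for each form, but the sample string mixes many characters, so it is hard to see where the differences come from. The following is a minimal sketch, not part of the original gist, that measures a shorter, hypothetical sample. The helper name `utf8_lengths` and the three-character sample are assumptions made for illustration. The sample is written with `\u` escapes so that copy and paste cannot silently re-normalize it.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def utf8_lengths(text):
    """Return {form: UTF-8 byte length of text normalized to that form}."""
    return {
        form: len(unicodedata.normalize(form, text).encode("utf-8"))
        for form in FORMS
    }


# Hypothetical sample (not the gist's string):
#   U+00E4  a with diaeresis   (canonically decomposes to a + U+0308)
#   U+30C9  katakana DO        (canonically decomposes to U+30C8 + U+3099)
#   U+337F  square corporation (compatibility-decomposes to four kanji)
sample = "\u00e4\u30c9\u337f"

for form, size in utf8_lengths(sample).items():
    print(f"{form}: {size} bytes")
# NFC: 8 bytes
# NFD: 12 bytes
# NFKC: 17 bytes
# NFKD: 21 bytes
```

In this sample the canonical forms (NFC, NFD) leave U+337F untouched, while the compatibility forms (NFKC, NFKD) expand it into four kanji. The decomposed forms (NFD, NFKD) are longer because the accent and the voiced sound mark become separate combining code points.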