# A theory of Unicode

### Given

- `CP`: a set of codepoints
- `GLP`: a subset of `CP+` (non-empty sequences of codepoints)

With the property:

(1) if `x` in `GLP` then `x+y` not in `GLP` (for all `x`, `y` in `CP+`), i.e. no element of `GLP` is a proper prefix of another

### Define:

`STR`: smallest subset of `CP*` such that

- empty in `STR`
- if `s` in `STR` and `x` in `GLP` then `x+s` in `STR`

###### Theorem:

If `s` in `STR` then there exist unique `n`, and `x1, ..., xn` in `GLP` such that `s = x1 + ... + xn`.

We call `n` the length of `s`.

Uniqueness follows from (1): if two elements of `GLP` were both prefixes of `s`, the shorter would be a prefix of the longer, so they must be equal. Hence `x1` is determined by `s`, and the rest follows by induction on the remainder.

### In-place Update:

If `s1 + x1 + ... + xn + s2` in `STR` and `|y1 + ... + ym| = |x1 + ... + xn|` then `s1 + y1 + ... + ym + s2` in `STR`.

(for `s1, s2` in `STR` and `x1`, ..., `xn` and `y1`, ..., `ym` in `GLP`).
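The theorem and the in-place update can be illustrated with a small Python sketch. It is not part of the theory above and makes several assumptions: codepoint sequences are modelled as Python `str` values, `GLP` is a made-up toy set (not real Unicode grapheme data), the function names `is_prefix_free`, `decompose` and `update_in_place` are hypothetical, and `|...|` is read as the number of codepoints, which the text above does not define.

```python
from typing import FrozenSet, List, Optional, Sequence

# Toy GLP, purely illustrative: a finite set of non-empty codepoint sequences.
GLP: FrozenSet[str] = frozenset({"a", "b", "ca", "cb"})


def is_prefix_free(glp: FrozenSet[str]) -> bool:
    """Property (1): no element of glp is a proper prefix of another."""
    return not any(x != y and y.startswith(x) for x in glp for y in glp)


def decompose(s: str, glp: FrozenSet[str]) -> Optional[List[str]]:
    """Return [x1, ..., xn] with s == x1 + ... + xn, or None if s is not in STR."""
    parts: List[str] = []
    i = 0
    while i < len(s):
        # By property (1), at most one element of glp can match at position i.
        matches = [x for x in glp if s.startswith(x, i)]
        if not matches:
            return None
        (x,) = matches  # raises ValueError if glp violates property (1)
        parts.append(x)
        i += len(x)  # elements of glp are non-empty, so the loop terminates
    return parts


def update_in_place(s: str, start: int, n: int, ys: Sequence[str],
                    glp: FrozenSet[str]) -> str:
    """Replace the n elements starting at index `start` by ys.

    Assumption: |...| counts codepoints, so ys must occupy exactly as many
    codepoints as the elements it replaces.
    """
    parts = decompose(s, glp)
    if parts is None or not all(y in glp for y in ys):
        raise ValueError("inputs must be built from elements of glp")
    xs = parts[start:start + n]
    if sum(map(len, ys)) != sum(map(len, xs)):
        raise ValueError("replacement must occupy the same number of codepoints")
    return "".join(parts[:start] + list(ys) + parts[start + n:])
```

With this toy set, `decompose("acab", GLP)` returns `['a', 'ca', 'b']`, so the length of `"acab"` is 3, while `decompose("ac", GLP)` returns `None` because the trailing `c` starts no element of `GLP`. Replacing the single element `ca` by the two elements `a`, `b` via `update_in_place("acab", 1, 1, ["a", "b"], GLP)` gives `"aabb"`: the codepoint count stays at 4, the result is still in `STR`, and its length changes from 3 to 4.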