Documentation
Overview
Alternative to-{upper,lower} approach
--------------------------------------------------
Overall, the design is straightforward:
1. We consider only characters in the range 0..0x1ffff, which fits in 17 bits.
2. We split the char code into two parts: the lower 8 bits (col) and the higher 9 bits (row).
3. We then look up the table as lookup[row][col], so we dereference twice.
4. The lookup may store either a difference of codes (2 bytes) or a pre-encoded UTF-8 char (4 bytes).
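The row/col split from step 2 can be sketched as follows. This is a minimal illustration, not the package's code: the names `split`, `maxCode`, `colBits` and `colMask` are hypothetical.

```go
package main

import "fmt"

const (
	maxCode = 0x1ffff        // largest code point considered (17 bits)
	colBits = 8              // the lower 8 bits select the column
	colMask = 1<<colBits - 1 // 0xff
)

// split breaks a code point into its row (upper 9 bits) and col (lower 8 bits).
// ok is false when r lies outside 0..0x1ffff, i.e. the table does not cover it.
func split(r rune) (row, col uint32, ok bool) {
	if r < 0 || r > maxCode {
		return 0, 0, false
	}
	u := uint32(r)
	return u >> colBits, u & colMask, true
}

func main() {
	row, col, ok := split('\u0416') // CYRILLIC CAPITAL LETTER ZHE
	fmt.Printf("row=%#x col=%#x ok=%v\n", row, col, ok)
	// prints: row=0x4 col=0x16 ok=true
}
```

With 9 row bits there are at most 512 rows, and each row addresses at most 256 columns.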
The only trick with lookup is that we compress the second-level table. Each entry of lookup[row] contains three values:
- the minimum col value
- the maximum col value
- offset in values table
Thus, the real lookup looks like this:
    if row > maxRow {
        return no-change
    }
    entry := lookup[row]
    if col >= entry.lo && col <= entry.hi {
        return values[col - entry.lo + entry.offset]
    }
    return no-change

For the detailed implementation, please see the method `LookupDiff.translate` below.
Comparison with the current approach
--------------------------------------------------
The current approach stores only the difference of char codes. As a result, we have to perform: 1) UTF-8 -> rune; 2) update rune; 3) rune -> UTF-8.
This new approach allows us to omit the last step, as we can precompute UTF-8 results.
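The difference between the two paths can be sketched as below. This is an illustration under assumptions, not the package's implementation: `viaDiff`, `viaPrecomputed`, `encoded`, `toyDiff` and `toyEncoded` are hypothetical names, and the toy mapping covers only 'A'..'Z' and U+0410..U+042F. In a real table the encoded bytes would be produced when the table is generated; here `toyEncoded` computes them on the fly purely as a stand-in.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// viaDiff follows the three steps: decode, adjust the rune, re-encode.
func viaDiff(dst, src []byte, diff func(rune) rune) []byte {
	for len(src) > 0 {
		r, n := utf8.DecodeRune(src)
		if d := diff(r); d != 0 {
			dst = utf8.AppendRune(dst, r+d) // step 3: rune -> UTF-8
		} else {
			dst = append(dst, src[:n]...) // unchanged bytes are copied
		}
		src = src[n:]
	}
	return dst
}

// encoded holds a pre-encoded UTF-8 result; n == 0 means "no change".
type encoded struct {
	buf [4]byte
	n   uint8
}

// viaPrecomputed still decodes, but copies stored bytes instead of encoding.
func viaPrecomputed(dst, src []byte, lookup func(rune) encoded) []byte {
	for len(src) > 0 {
		r, n := utf8.DecodeRune(src)
		if e := lookup(r); e.n > 0 {
			dst = append(dst, e.buf[:e.n]...)
		} else {
			dst = append(dst, src[:n]...)
		}
		src = src[n:]
	}
	return dst
}

// toyDiff is a stand-in for a table of code differences.
func toyDiff(r rune) rune {
	if (r >= 'A' && r <= 'Z') || (r >= '\u0410' && r <= '\u042f') {
		return 32
	}
	return 0
}

// toyEncoded is a stand-in for a table of pre-encoded UTF-8 values.
func toyEncoded(r rune) encoded {
	var e encoded
	if d := toyDiff(r); d != 0 {
		e.n = uint8(utf8.EncodeRune(e.buf[:], r+d))
	}
	return e
}

func main() {
	src := []byte("Hello, \u041c\u0418\u0420")
	fmt.Println(string(viaDiff(nil, src, toyDiff)))
	fmt.Println(string(viaPrecomputed(nil, src, toyEncoded)))
}
```

Both functions produce the same output; the second trades the encoding step for larger table entries (4 bytes instead of 2).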
Lookup tables size comparison:
* to-lower: current = 9665, new = 12892
* to-upper: current = 10356, new = 13260

The new tables are roughly 30% bigger (about 33% for to-lower and 28% for to-upper).