_generate

command

Version: v0.0.0-...-86e9f11 | Published: Jan 7, 2024 | License: Apache-2.0 | Imports: 15 | Imported by: 0


Repository

github.com/SnellerInc/sneller

Documentation ¶

Overview ¶

Alternative to-{upper,lower} approach

Overall, the design is straightforward:

1. We consider only characters in the range 0..0x1ffff, which fits in 17 bits.
2. We split the char code into two parts: the lower 8 bits (col) and the higher 9 bits (row).
3. We then look up the result as lookup[row][col], so we dereference twice.
4. The lookup might store either a difference of codes (2 bytes) or a pre-encoded UTF-8 char (4 bytes).
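The split in step 2 can be sketched as follows. This is a minimal illustration, not the package's code: the names `split`, `colBits`, `colMask`, and `maxCode` are hypothetical.

```go
package main

import "fmt"

const (
	colBits = 8              // the lower 8 bits select the column
	colMask = 1<<colBits - 1 // 0xff
	maxCode = 0x1ffff        // largest code point considered (17 bits)
)

// split returns the (row, col) pair for a code point. ok is false when
// the code point lies outside the 17-bit range, in which case the
// caller would leave the character unchanged.
func split(r rune) (row, col int, ok bool) {
	if r < 0 || r > maxCode {
		return 0, 0, false
	}
	return int(r >> colBits), int(r & colMask), true
}

func main() {
	row, col, ok := split(0x0100) // U+0100 LATIN CAPITAL LETTER A WITH MACRON
	fmt.Println(row, col, ok)     // prints "1 0 true"
}
```

With 9 row bits there are at most 512 first-level entries, and each row addresses at most 256 columns.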

The only trick with lookup is that we compress the second-level table. Each entry of lookup[row] contains three values:

- the minimum col value
- the maximum col value
- offset in values table

Thus, the real lookup looks like this:

if row > maxRow {
    return noChange // row lies beyond the table
}

entry := lookup[row]
if col >= entry.lo && col <= entry.hi {
    return values[col - entry.lo + entry.offset]
}

return noChange // col falls outside the stored range

For the detailed implementation, please see the method `LookupDiff.translate` below.
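As a runnable illustration of the compressed lookup, here is a small sketch with toy tables that cover only the ASCII capital letters. It is not the real `LookupDiff.translate`: the standalone `translate` function, the field types of `entry`, and the table contents are assumptions made for the example, and the values are stored as code-point differences.

```go
package main

import "fmt"

// entry describes one compressed row: only columns lo..hi are stored,
// starting at values[offset]. Field names follow the pseudocode; the
// concrete types are assumptions.
type entry struct {
	lo, hi uint8
	offset uint16
}

// Toy tables: a single row (U+0000..U+00FF) that stores columns 'A'..'Z'.
// Each value is the difference to add to the code point for to-lower.
var (
	lookup = []entry{{lo: 'A', hi: 'Z', offset: 0}}
	values = make([]int16, 26)
)

func init() {
	for i := range values {
		values[i] = 'a' - 'A' // +32 for every ASCII capital letter
	}
}

// translate returns the lowercase form of r, or r itself when the
// tables hold no entry for it.
func translate(r rune) rune {
	if r < 0 {
		return r
	}
	row, col := int(r>>8), uint8(r&0xff)
	if row >= len(lookup) { // corresponds to the "row > maxRow" check
		return r
	}
	e := lookup[row]
	if col >= e.lo && col <= e.hi {
		return r + rune(values[int(col-e.lo)+int(e.offset)])
	}
	return r // col falls outside the stored range
}

func main() {
	// 'Q' is mapped, 'q' is outside lo..hi, U+0416 is beyond the last row.
	fmt.Println(string(translate('Q')), string(translate('q')), string(translate(0x0416)))
}
```

Storing only the lo..hi span keeps sparse rows small: a row with no case mappings needs no space in the values table at all.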

Comparison with the current approach

The current approach stores only the difference of char codes. As a result, we have to perform: 1) UTF-8 -> rune; 2) update rune; 3) rune -> UTF-8.

This new approach allows us to omit the last step, as we can precompute UTF-8 results.

Lookup tables size comparison:

* to-lower: current = 9665, new = 12892
* to-upper: current = 10356, new = 13260

The new tables are ~30% bigger (about 33% for to-lower and 28% for to-upper).

Directories ¶

Path	Synopsis
genbytecode