Documentation
Overview
Package norm contains types and functions for normalizing Unicode strings.
Constants
const GraphemeJoiner = "\u034F"
GraphemeJoiner is inserted after maxNonStarters non-starter runes.
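As a rough illustration of the insertion rule, the sketch below places a joiner after every `limit` consecutive combining marks. It is a simplified stand-in, not this package's algorithm: `breakLongRuns` and `limit` are hypothetical names, the value of maxNonStarters is not shown in this documentation, and the `unicode.Mn` category is used only as an approximation of "non-starter".

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// graphemeJoiner mirrors the documented GraphemeJoiner constant (U+034F).
const graphemeJoiner = "\u034F"

// breakLongRuns writes graphemeJoiner before a combining mark whenever
// limit marks have already been seen in a row. limit is a hypothetical
// parameter standing in for maxNonStarters, and unicode.Mn is only an
// approximation of the set of non-starter runes.
func breakLongRuns(s string, limit int) string {
	var b strings.Builder
	run := 0 // length of the current run of combining marks
	for _, r := range s {
		if unicode.Is(unicode.Mn, r) {
			if run == limit {
				b.WriteString(graphemeJoiner)
				run = 0
			}
			run++
		} else {
			run = 0
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	// Three acute accents with a limit of two: the joiner lands before the third.
	out := breakLongRuns("a\u0301\u0301\u0301", 2)
	fmt.Printf("%+q\n", out) // prints "a\u0301\u0301\u034f\u0301"
}
```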
const MaxSegmentSize = maxByteBufferSize
MaxSegmentSize is the maximum size of a byte buffer needed to consider any sequence of starter and non-starter runes for the purpose of normalization.
Variables
This section is empty.
Functions
This section is empty.
Types
type Form
type Form int
A Form denotes a canonical representation of Unicode code points. The Unicode-defined normalization and equivalence forms are:
NFD   Unicode Normalization Form D
NFKD  Unicode Normalization Form KD
For a Form f, this documentation uses the notation f(x) to mean the bytes or string x converted to the given form. A position n in x is called a boundary if conversion to the form can proceed independently on both sides:
f(x) == append(f(x[0:n]), f(x[n:])...)
References: https://unicode.org/reports/tr15/ and https://unicode.org/notes/tn5/.
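The boundary condition can be checked mechanically for any function from bytes to bytes. The sketch below does so for a toy form that only reorders combining marks by combining class; it neither decomposes nor composes, so it is not NFD or NFKD. The names `toyForm`, `toyCCC`, and `isBoundary` are hypothetical and use only the standard library.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// toyCCC is a hypothetical two-entry stand-in for the canonical combining
// class table. Runes that are not listed are treated as starters (class 0).
var toyCCC = map[rune]uint8{
	'\u0301': 230, // COMBINING ACUTE ACCENT
	'\u0323': 220, // COMBINING DOT BELOW
}

// toyForm sorts each run of non-starters by ascending combining class.
// It performs no decomposition, so it is not a real normalization form.
func toyForm(x []byte) []byte {
	runes := []rune(string(x))
	for i := 0; i < len(runes); {
		if toyCCC[runes[i]] == 0 {
			i++
			continue
		}
		j := i
		for j < len(runes) && toyCCC[runes[j]] != 0 {
			j++
		}
		run := runes[i:j] // shares storage with runes, so sorting is in place
		sort.SliceStable(run, func(a, b int) bool {
			return toyCCC[run[a]] < toyCCC[run[b]]
		})
		i = j
	}
	return []byte(string(runes))
}

// isBoundary reports whether f(x) == append(f(x[0:n]), f(x[n:])...).
func isBoundary(f func([]byte) []byte, x []byte, n int) bool {
	left, right := f(x[:n]), f(x[n:])
	return bytes.Equal(f(x), append(left, right...))
}

func main() {
	x := []byte("a\u0301\u0323b") // 'a', acute (2 bytes), dot below (2 bytes), 'b'
	fmt.Println(isBoundary(toyForm, x, 3)) // false: the split separates two marks that must be reordered
	fmt.Println(isBoundary(toyForm, x, 5)) // true: 'b' cannot interact with what precedes it
}
```

Position 3 fails the test because the two marks are reordered only when they are processed together, which is the kind of dependency the boundary definition rules out.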
func (Form) Append
Append returns f(append(out, b...)). The buffer out must be nil, empty, or equal to f(out).
func (Form) FirstBoundary
FirstBoundary returns the position i of the first boundary in b or -1 if b contains no boundary.
func (Form) Properties
func (f Form) Properties(s []byte) Properties
Properties returns properties for the first rune in s.
type Properties
type Properties struct {
// contains filtered or unexported fields
}
Properties provides access to normalization properties of a rune.
func (Properties) BoundaryAfter
func (p Properties) BoundaryAfter() bool
BoundaryAfter returns true if runes that follow this one cannot combine with, or otherwise interact with, this rune or any preceding rune.
func (Properties) BoundaryBefore
func (p Properties) BoundaryBefore() bool
BoundaryBefore returns true if this rune starts a new segment and cannot combine with any rune on the left.
func (Properties) Decomposition
func (p Properties) Decomposition() []byte
Decomposition returns the decomposition for the underlying rune or nil if there is none.
func (Properties) LeadCCC
func (p Properties) LeadCCC() uint8
LeadCCC returns the CCC of the first rune in the decomposition. If there is no decomposition, LeadCCC equals CCC.
func (Properties) Size
func (p Properties) Size() int
Size returns the length, in bytes, of the UTF-8 encoding of the rune.
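For comparison, the same quantity can be computed for the first rune of a byte slice with the standard library. This is an illustrative sketch using `unicode/utf8`, not this package's implementation; `firstRuneSize` is a hypothetical helper, and no claim is made about how Size treats invalid input.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRuneSize returns the number of bytes that encode the first rune in s,
// as reported by utf8.DecodeRune. It returns 0 for an empty slice.
func firstRuneSize(s []byte) int {
	_, size := utf8.DecodeRune(s)
	return size
}

func main() {
	fmt.Println(firstRuneSize([]byte("abc")))     // 1: ASCII
	fmt.Println(firstRuneSize([]byte("\u00e9x"))) // 2: U+00E9
	fmt.Println(firstRuneSize([]byte("\u4e16")))  // 3: U+4E16
}
```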
func (Properties) TrailCCC
func (p Properties) TrailCCC() uint8
TrailCCC returns the CCC of the last rune in the decomposition. If there is no decomposition, TrailCCC equals CCC.
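The relationship between LeadCCC, TrailCCC, and the decomposition can be sketched with a miniature lookup table. Everything here is an assumption made for illustration: `miniDecomp`, `miniCCC`, and `leadTrailCCC` are hypothetical and cover only two runes, while the real tables are internal to the package.

```go
package main

import "fmt"

// miniDecomp is a hypothetical one-entry decomposition table.
var miniDecomp = map[rune][]rune{
	'\u00e9': {'e', '\u0301'}, // U+00E9 decomposes to 'e' + COMBINING ACUTE ACCENT
}

// miniCCC is a hypothetical combining class table; unlisted runes have class 0.
var miniCCC = map[rune]uint8{
	'\u0301': 230, // COMBINING ACUTE ACCENT
}

// leadTrailCCC returns the class of the first and last rune of r's
// decomposition, or r's own class twice when r has no decomposition.
func leadTrailCCC(r rune) (lead, trail uint8) {
	d, ok := miniDecomp[r]
	if !ok {
		c := miniCCC[r]
		return c, c
	}
	return miniCCC[d[0]], miniCCC[d[len(d)-1]]
}

func main() {
	fmt.Println(leadTrailCCC('\u00e9')) // 0 230: starts with 'e', ends with the accent
	fmt.Println(leadTrailCCC('\u0301')) // 230 230: no decomposition, both equal the CCC
	fmt.Println(leadTrailCCC('a'))      // 0 0: a starter with no decomposition
}
```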