norm

package
v0.20.0-rc2
Warning

This package is not in the latest version of its module.

Published: Jun 20, 2024
License: Apache-2.0, BSD-3-Clause
Imports: 3
Imported by: 0

Documentation

Overview

Package norm contains types and functions for normalizing Unicode strings.

Constants

const GraphemeJoiner = "\u034F"

GraphemeJoiner is inserted after maxNonStarters non-starter runes.
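
The insertion rule can be illustrated with a minimal standard-library sketch. It does not use this package: insertJoiners, isToyNonStarter, and the limit parameter are hypothetical names, the non-starter test is a toy range check rather than a real combining-class lookup, and the unexported maxNonStarters value is not assumed.

```go
package main

import (
	"fmt"
	"strings"
)

// graphemeJoiner mirrors the documented constant (U+034F).
const graphemeJoiner = "\u034F"

// isToyNonStarter is a hypothetical stand-in for a combining-class lookup.
// It treats the combining marks U+0300..U+034E as non-starters.
func isToyNonStarter(r rune) bool {
	return r >= 0x0300 && r <= 0x034E
}

// insertJoiners copies s and writes graphemeJoiner whenever a run of
// non-starters has reached limit runes and another non-starter follows.
func insertJoiners(s string, limit int) string {
	var b strings.Builder
	run := 0 // length of the current run of non-starters
	for _, r := range s {
		if isToyNonStarter(r) {
			if run == limit {
				b.WriteString(graphemeJoiner)
				run = 0
			}
			run++
		} else {
			run = 0
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	// Three acute accents after 'a', with a toy limit of 2.
	fmt.Printf("%+q\n", insertJoiners("a\u0301\u0301\u0301", 2))
}
```

With a limit of 2, the joiner lands between the second and third accent, so no run of non-starters in the output is longer than the limit.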

const MaxSegmentSize = maxByteBufferSize

MaxSegmentSize is the maximum size of a byte buffer needed to consider any sequence of starter and non-starter runes for the purpose of normalization.

Variables

This section is empty.

Functions

This section is empty.

Types

type Form

type Form int

A Form denotes a canonical representation of Unicode code points. Of the Unicode-defined normalization and equivalence forms, this package defines:

NFD   Unicode Normalization Form D
NFKD  Unicode Normalization Form KD

For a Form f, this documentation uses the notation f(x) to mean the bytes or string x converted to the given form. A position n in x is called a boundary if conversion to the form can proceed independently on both sides:

f(x) == append(f(x[0:n]), f(x[n:])...)
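
The boundary condition can be checked directly with a toy form. The sketch below is an assumption-laden illustration, not this package's implementation: toyNFD, toyCCC, toyDecomp, and isBoundary are hypothetical names, and the tables contain only a handful of hand-picked entries.

```go
package main

import (
	"fmt"
	"sort"
)

// toyCCC is a hypothetical excerpt of canonical combining classes.
// Runes that are not listed are treated as starters (class 0).
var toyCCC = map[rune]uint8{
	0x0301: 230, // COMBINING ACUTE ACCENT
	0x0323: 220, // COMBINING DOT BELOW
}

// toyDecomp is a hypothetical excerpt of canonical decompositions.
var toyDecomp = map[rune]string{
	0x00E9: "e\u0301", // LATIN SMALL LETTER E WITH ACUTE
}

// toyNFD decomposes runes via toyDecomp, then stably sorts every run of
// non-starters by combining class.
func toyNFD(s string) string {
	var rs []rune
	for _, r := range s {
		if d, ok := toyDecomp[r]; ok {
			rs = append(rs, []rune(d)...)
		} else {
			rs = append(rs, r)
		}
	}
	for i := 0; i < len(rs); {
		if toyCCC[rs[i]] == 0 {
			i++
			continue
		}
		j := i
		for j < len(rs) && toyCCC[rs[j]] != 0 {
			j++
		}
		run := rs[i:j]
		sort.SliceStable(run, func(a, b int) bool {
			return toyCCC[run[a]] < toyCCC[run[b]]
		})
		i = j
	}
	return string(rs)
}

// isBoundary reports whether f(x) == f(x[:n]) + f(x[n:]) for the toy form.
// n must fall on a rune boundary of x.
func isBoundary(x string, n int) bool {
	return toyNFD(x) == toyNFD(x[:n])+toyNFD(x[n:])
}

func main() {
	x := "a\u0301\u0323b" // a, acute (230), dot below (220), b
	fmt.Println(isBoundary(x, 3)) // false: the two marks must be reordered together
	fmt.Println(isBoundary(x, 5)) // true: 'b' starts a new segment
}
```

Offset 3 falls between two combining marks that the toy form has to reorder, so normalizing each side separately gives a different result. Offset 5 precedes a starter, so both sides can be converted independently.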

References: https://unicode.org/reports/tr15/ and https://unicode.org/notes/tn5/.

const (
	NFD Form = iota
	NFKD
)

func (Form) Append

func (f Form) Append(out []byte, src ...byte) []byte

Append returns f(append(out, src...)). The buffer out must be nil, empty, or equal to f(out).

func (Form) FirstBoundary

func (f Form) FirstBoundary(b []byte) int

FirstBoundary returns the position i of the first boundary in b or -1 if b contains no boundary.

func (Form) Properties

func (f Form) Properties(s []byte) Properties

Properties returns properties for the first rune in s.

type Properties

type Properties struct {
	// contains filtered or unexported fields
}

Properties provides access to normalization properties of a rune.

func (Properties) BoundaryAfter

func (p Properties) BoundaryAfter() bool

BoundaryAfter returns true if subsequent runes cannot combine with or otherwise interact with this or previous runes.

func (Properties) BoundaryBefore

func (p Properties) BoundaryBefore() bool

BoundaryBefore returns true if this rune starts a new segment and cannot combine with any rune on the left.

func (Properties) Decomposition

func (p Properties) Decomposition() []byte

Decomposition returns the decomposition for the underlying rune or nil if there is none.

func (Properties) LeadCCC

func (p Properties) LeadCCC() uint8

LeadCCC returns the canonical combining class (CCC) of the first rune in the decomposition. If there is no decomposition, LeadCCC equals the CCC of the rune itself.

func (Properties) Size

func (p Properties) Size() int

Size returns the length of the UTF-8 encoding of the rune.

func (Properties) TrailCCC

func (p Properties) TrailCCC() uint8

TrailCCC returns the canonical combining class (CCC) of the last rune in the decomposition. If there is no decomposition, TrailCCC equals the CCC of the rune itself.
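
The relationship between a rune's decomposition and its leading and trailing combining classes can be sketched with small hand-written tables. Everything named here (toyCCC, toyDecomp, toyLeadTrailCCC) is hypothetical and covers only a few code points; it does not reproduce this package's data or internals.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// toyCCC is a hypothetical excerpt of canonical combining classes.
// Runes that are not listed have class 0.
var toyCCC = map[rune]uint8{
	0x0301: 230, // COMBINING ACUTE ACCENT
	0x0308: 230, // COMBINING DIAERESIS
	0x0323: 220, // COMBINING DOT BELOW
}

// toyDecomp is a hypothetical excerpt of canonical decompositions.
var toyDecomp = map[rune]string{
	0x00E9: "e\u0301",      // LATIN SMALL LETTER E WITH ACUTE
	0x0344: "\u0308\u0301", // COMBINING GREEK DIALYTIKA TONOS
}

// toyLeadTrailCCC returns the class of the first and of the last rune of
// r's decomposition, or r's own class twice if r has no decomposition.
func toyLeadTrailCCC(r rune) (lead, trail uint8) {
	d, ok := toyDecomp[r]
	if !ok {
		c := toyCCC[r]
		return c, c
	}
	first, _ := utf8.DecodeRuneInString(d)
	last, _ := utf8.DecodeLastRuneInString(d)
	return toyCCC[first], toyCCC[last]
}

func main() {
	for _, r := range []rune{'a', 0x0323, 0x00E9, 0x0344} {
		lead, trail := toyLeadTrailCCC(r)
		fmt.Printf("%U lead=%d trail=%d\n", r, lead, trail)
	}
}
```

U+00E9 shows why the two values can differ: its decomposition begins with a starter (class 0) and ends with a combining mark (class 230).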
