Documentation ¶
Overview ¶
Package dm provides a way to query Unicode decomposition mappings and perform a custom compatibility decomposition using compatibility mapping tags.
This is slower than the optimised NFD and NFKD versions in text/unicode/norm, so this package is only appropriate in situations where a custom decomposition is required.
See Unicode Normalization Forms and Character Decomposition Mapping.
Index ¶
- Variables
- type Decomposer
- func (d Decomposer) Except(types ...Type) Decomposer
- func (d Decomposer) Extend(types ...Type) Decomposer
- func (d Decomposer) Map(r rune) (Type, []rune)
- func (d Decomposer) String(s string) (string, error)
- func (d Decomposer) Transformer() transform.Transformer
- func (d Decomposer) TransformerWithFilter(filter func(x rune) bool) transform.Transformer
- type Type
Variables ¶
var CD = Decomposer(1 << Canonical)
CD is a Decomposer that performs a canonical decomposition.
var KD = Decomposer(0xFFFFFFFF)
KD is a Decomposer that performs a compatibility decomposition.
Types ¶
type Decomposer ¶
type Decomposer uint64
Decomposer performs a full recursive decomposition of an input then applies the canonical reordering algorithm.
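The two steps named above (full recursive decomposition, then canonical reordering) can be sketched with the standard library alone. This is a minimal illustration, not this package's implementation: the `mappings` and `ccc` tables are tiny hand-picked excerpts of Unicode data, and the helper names `decomposeRune`, `reorder` and `decompose` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// mappings is a tiny excerpt of canonical decomposition mappings
// (assumption: hand-copied for illustration, not the package's data).
var mappings = map[rune][]rune{
	0x01D6: {0x00FC, 0x0304}, // ǖ => ü + combining macron
	0x00FC: {0x0075, 0x0308}, // ü => u + combining diaeresis
}

// ccc is a tiny excerpt of canonical combining classes. Runes that are
// not listed are treated as starters (class 0).
var ccc = map[rune]int{
	0x0304: 230, // combining macron
	0x0307: 230, // combining dot above
	0x0308: 230, // combining diaeresis
	0x0323: 220, // combining dot below
}

// decomposeRune applies the mapping table recursively until no rune in
// the result has a further mapping, appending the result to out.
func decomposeRune(r rune, out []rune) []rune {
	m, ok := mappings[r]
	if !ok {
		return append(out, r)
	}
	for _, x := range m {
		out = decomposeRune(x, out)
	}
	return out
}

// reorder stably sorts each run of non-starters by combining class,
// which is the core of the canonical reordering algorithm.
func reorder(rs []rune) {
	for i := 0; i < len(rs); {
		if ccc[rs[i]] == 0 {
			i++
			continue
		}
		j := i
		for j < len(rs) && ccc[rs[j]] != 0 {
			j++
		}
		run := rs[i:j]
		sort.SliceStable(run, func(a, b int) bool {
			return ccc[run[a]] < ccc[run[b]]
		})
		i = j
	}
}

// decompose fully decomposes s, then reorders combining marks.
func decompose(s string) string {
	var out []rune
	for _, r := range s {
		out = decomposeRune(r, out)
	}
	reorder(out)
	return string(out)
}

func main() {
	fmt.Printf("%+q\n", decompose("\u01D6"))        // "u\u0308\u0304"
	fmt.Printf("%+q\n", decompose("q\u0307\u0323")) // "q\u0323\u0307"
}
```

The first call shows why the decomposition must be recursive: ǖ maps to ü plus a macron, and ü itself maps further. The second shows reordering: the dot below (class 220) moves ahead of the dot above (class 230).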
func Except ¶
func Except(types ...Type) Decomposer
Except returns a compatibility decomposer that omits the compatibility mapping types given here.
func Extend ¶
func Extend(types ...Type) Decomposer
Extend returns a canonical decomposer, extended with the compatibility mapping types given here, to create a new compatibility decomposer.
func New ¶
func New(types ...Type) Decomposer
New returns a new Decomposer that performs a decomposition only for the decomposition types given here.
func (Decomposer) Except ¶
func (d Decomposer) Except(types ...Type) Decomposer
Except returns a new Decomposer that performs a decomposition on the same types as its parent, except those given here.
For example:
dm.KD.Except(dm.Super, dm.Sub)
func (Decomposer) Extend ¶
func (d Decomposer) Extend(types ...Type) Decomposer
Extend returns a new Decomposer that performs a decomposition on the same types as its parent, in addition to those given here.
For example:
dm.CD.Extend(dm.Super, dm.Sub)
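Since CD is declared as Decomposer(1 << Canonical) and KD as Decomposer(0xFFFFFFFF), one plausible reading is that a Decomposer is a bit set with one bit per Type. The sketch below re-creates that idea with local stand-in types. The method bodies and the `supports` helper are assumptions made for illustration, not the package's source.

```go
package main

import "fmt"

// Type and Decomposer are local stand-ins that mirror the declarations
// shown in this documentation; only a few Type values are reproduced.
type Type int

const (
	None      Type = 0
	Canonical Type = 1
	Sub       Type = 14
	Super     Type = 15
)

type Decomposer uint64

var (
	CD = Decomposer(1 << Canonical) // only the Canonical bit is set
	KD = Decomposer(0xFFFFFFFF)     // every type bit is set
)

// Extend returns d with the bits for the given types set (assumed body).
func (d Decomposer) Extend(types ...Type) Decomposer {
	for _, t := range types {
		d |= 1 << uint(t)
	}
	return d
}

// Except returns d with the bits for the given types cleared (assumed body).
func (d Decomposer) Except(types ...Type) Decomposer {
	for _, t := range types {
		d &^= 1 << uint(t)
	}
	return d
}

// supports is a hypothetical helper reporting whether t's bit is set.
func (d Decomposer) supports(t Type) bool {
	return d&(1<<uint(t)) != 0
}

func main() {
	noScripts := KD.Except(Super, Sub)
	fmt.Println(noScripts.supports(Canonical), noScripts.supports(Super)) // true false

	withScripts := CD.Extend(Super, Sub)
	fmt.Println(withScripts.supports(Canonical), withScripts.supports(Sub)) // true true
}
```

Because both methods take a value receiver and return a new value, the parent Decomposer is never modified, which matches the "returns a new Decomposer" wording above.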
func (Decomposer) Map ¶
func (d Decomposer) Map(r rune) (Type, []rune)
Map returns the decomposition type and decomposition mapping for an input rune, provided that the decomposition type is one that the Decomposer supports. If the rune has a mapping whose type the Decomposer does not support, that unsupported type is returned and the returned mappings are nil. If there are no decomposition mappings, the returned type is None and the returned mappings are nil.
Note that, while the Unicode data files also have a default mapping of a character to itself, these are not counted here (None is returned instead).
Note also that this is a single mapping, not a full decomposition. For that, call the Decomposer.String method, or [Decomposer.Rune] for a single rune.
For historical Unicode reasons, the longest compatibility mapping is 18 characters long. Compatibility mappings are guaranteed to be no longer than 18 characters, although most consist of just a few characters.
func (Decomposer) String ¶
func (d Decomposer) String(s string) (string, error)
String returns the full decomposition of s, but only applies the decomposition mappings that match the types registered with the Decomposer with New, Decomposer.Extend, or Decomposer.Except.
func (Decomposer) Transformer ¶
func (d Decomposer) Transformer() transform.Transformer
Transformer returns an object implementing the transform.Transformer interface that applies the decomposition specified by the Decomposer to its input and outputs the decomposed result.
The returned transformer is stateless, so may be used concurrently.
func (Decomposer) TransformerWithFilter ¶
func (d Decomposer) TransformerWithFilter(filter func(x rune) bool) transform.Transformer
TransformerWithFilter is like Decomposer.Transformer, except that for each input rune x where filter(x) returns false, decomposition is skipped and the rune is output unchanged.
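The filter semantics can be illustrated without the transform machinery. The sketch below is a stand-in under stated assumptions: `mappings` is a two-entry excerpt of Unicode compatibility mappings, and `decomposeFiltered` is a hypothetical helper that works on whole strings rather than implementing transform.Transformer.

```go
package main

import (
	"fmt"
	"strings"
)

// mappings is a tiny excerpt of compatibility mappings (assumption:
// hand-copied for illustration, not the package's data).
var mappings = map[rune]string{
	'²': "2",       // <super> 0032
	'½': "1\u20442", // <fraction> 0031 2044 0032
}

// decomposeFiltered replaces each rune that has a mapping, unless
// filter reports false for it, in which case the rune is copied as-is.
func decomposeFiltered(s string, filter func(x rune) bool) string {
	var b strings.Builder
	for _, r := range s {
		if !filter(r) {
			b.WriteRune(r) // filtered out: skip decomposition
			continue
		}
		if m, ok := mappings[r]; ok {
			b.WriteString(m)
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	// Decompose everything except the vulgar fraction one half.
	notHalf := func(x rune) bool { return x != '½' }
	fmt.Println(decomposeFiltered("x² + ½", notHalf)) // x2 + ½
}
```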
type Type ¶
type Type int
Type is the compatibility formatting tag that controls decomposition mapping. The exact integer value is arbitrary and has no meaning.
const (
	None      Type = 0
	Canonical Type = 1  // Canonical
	Compat    Type = 2  // Otherwise unspecified compatibility character
	Encircled Type = 3  // Encircled form
	Final     Type = 4  // Final presentation form (Arabic)
	Font      Type = 5  // Font variant (for example, a blackletter form)
	Fraction  Type = 6  // Vulgar fraction form
	Initial   Type = 7  // Initial presentation form (Arabic)
	Isolated  Type = 8  // Isolated presentation form (Arabic)
	Medial    Type = 9  // Medial presentation form (Arabic)
	Narrow    Type = 10 // Narrow (or hankaku) compatibility character
	NoBreak   Type = 11 // No-break version of a space or hyphen
	Small     Type = 12 // Small variant form (CNS compatibility)
	Square    Type = 13 // CJK squared font variant
	Sub       Type = 14 // Subscript form
	Super     Type = 15 // Superscript form
	Vertical  Type = 16 // Vertical layout presentation form
	Wide      Type = 17 // Wide (or zenkaku) compatibility character
)
func Map ¶
Map returns the decomposition mapping type and mappings for a single input rune. If there are no decomposition mappings, the returned type is None and the returned mappings are undefined.
Note that, while the Unicode data files also have a default mapping of a character to itself, these are not counted (None is returned instead).
Note also that this is a single mapping, not a full decomposition. For that, use dm.CD.String for a full canonical decomposition, dm.KD.String for a full compatibility decomposition, or New to define a custom decomposition and call the Decomposer.String method on it.
For historical Unicode reasons, the longest compatibility mapping is 18 characters long. Compatibility mappings are guaranteed to be no longer than 18 characters, although most consist of just a few characters.
Example ¶
package main

import (
	"fmt"

	"github.com/tawesoft/golib/v2/text/dm"
)

func main() {
	input := '²'
	dt, dm := dm.Map(input)
	fmt.Printf("%c => decomposition (%s): %s\n", input, dt, string(dm))
	if dt.IsCompat() {
		fmt.Println("This is a compatibility decomposition, not a canonical one")
	} else if dt.IsCanonical() {
		fmt.Println("This is a canonical decomposition")
	} else {
		fmt.Println("There isn't a decomposition for this input")
	}
}

Output:
² => decomposition (Super): 2
This is a compatibility decomposition, not a canonical one
func (Type) IsCanonical ¶
IsCanonical returns true if Type is a canonical mapping.