dm

package
v2.16.0
Published: Jul 24, 2024 License: MIT Imports: 10 Imported by: 0

Documentation

Overview

Package dm provides a way to query Unicode decomposition mappings and perform a custom compatibility decomposition using compatibility mapping tags.

This is slower than the optimised NFD and NFKD implementations in golang.org/x/text/unicode/norm, so this package is only appropriate where a custom decomposition is required.

See Unicode Normalization Forms and Character Decomposition Mapping.

Constants

This section is empty.

Variables

CD is a Decomposer that performs a canonical decomposition.

var KD = Decomposer(0xFFFFFFFF)

KD is a Decomposer that performs a compatibility decomposition.

Functions

This section is empty.

Types

type Decomposer

type Decomposer uint64

Decomposer performs a full recursive decomposition of an input then applies the canonical reordering algorithm.
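The canonical reordering step mentioned above can be illustrated with a minimal, standard-library-only sketch. It is not this package's implementation: the `ccc` table and the `reorder` function are hypothetical names, and the table hard-codes the combining classes of only two marks. Within each run of combining marks, the marks are stably sorted by ascending canonical combining class.

```go
package main

import (
	"fmt"
	"sort"
)

// ccc is a tiny, hand-written table of canonical combining classes for the
// two marks used in this sketch (an assumption for illustration). Runes not
// listed are treated as class 0 (starters).
var ccc = map[rune]int{
	0x0307: 230, // COMBINING DOT ABOVE
	0x0323: 220, // COMBINING DOT BELOW
}

// reorder returns a copy of in where every run of non-starters has been
// stably sorted by ascending combining class.
func reorder(in []rune) []rune {
	out := append([]rune(nil), in...)
	for i := 0; i < len(out); {
		if ccc[out[i]] == 0 {
			i++ // starters stay where they are
			continue
		}
		// Find the end of this run of combining marks.
		j := i
		for j < len(out) && ccc[out[j]] != 0 {
			j++
		}
		run := out[i:j]
		sort.SliceStable(run, func(a, b int) bool {
			return ccc[run[a]] < ccc[run[b]]
		})
		i = j
	}
	return out
}

func main() {
	in := []rune{'q', 0x0307, 0x0323}
	fmt.Printf("%U\n", reorder(in)) // [U+0071 U+0323 U+0307]
}
```

The sort must be stable: marks with equal combining classes keep their relative order, because swapping them could change how the text renders.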

func Except

func Except(types ...Type) Decomposer

Except returns a compatibility decomposer that applies every compatibility mapping type except those given here.

func Extend

func Extend(types ...Type) Decomposer

Extend returns a canonical decomposer, extended with the compatibility mapping types given here, to create a new compatibility decomposer.

func New

func New(types ...Type) Decomposer

New returns a new Decomposer that performs a decomposition, but only for the decomposition types given here.

func (Decomposer) Except

func (d Decomposer) Except(types ...Type) Decomposer

Except returns a new Decomposer that performs a decomposition on the same types as its parent, except those given here.

For example:

dm.KD.Except(dm.Super, dm.Sub)

func (Decomposer) Extend

func (d Decomposer) Extend(types ...Type) Decomposer

Extend returns a new Decomposer that performs a decomposition on the same types as its parent, in addition to those given here.

For example:

dm.CD.Extend(dm.Super, dm.Sub)
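One way to picture how Extend and Except compose is as operations on a set of mapping types. The sketch below models such a set as a bitmask. This is an assumption suggested by the `uint64` underlying type and the `0xFFFFFFFF` value of KD, not a confirmed description of the package's internals. The names `kind`, `typeSet`, `bits`, `extend`, `except`, and `has` are all hypothetical.

```go
package main

import "fmt"

// kind stands in for a decomposition type; the values mirror the Type
// constants listed in this document, but the type itself is hypothetical.
type kind uint

const (
	canonical kind = 1
	sub       kind = 14
	super     kind = 15
)

// typeSet is a hypothetical bitmask with one bit per kind.
type typeSet uint64

// all has every low bit set, so every kind is a member.
const all = typeSet(0xFFFFFFFF)

// bits builds a set containing exactly the given kinds.
func bits(kinds ...kind) typeSet {
	var s typeSet
	for _, k := range kinds {
		s |= 1 << k
	}
	return s
}

// extend returns s with the given kinds added.
func (s typeSet) extend(kinds ...kind) typeSet { return s | bits(kinds...) }

// except returns s with the given kinds removed.
func (s typeSet) except(kinds ...kind) typeSet { return s &^ bits(kinds...) }

// has reports whether k is a member of s.
func (s typeSet) has(k kind) bool { return s&(1<<k) != 0 }

func main() {
	a := bits(canonical).extend(super, sub)
	b := all.except(super, sub)
	fmt.Println(a.has(super), a.has(canonical)) // true true
	fmt.Println(b.has(super), b.has(canonical)) // false true
}
```

Under this model, both operations return a new value and leave the receiver untouched, which matches the documented behaviour of returning a new Decomposer.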

func (Decomposer) Map

func (d Decomposer) Map(r rune) (Type, []rune)

Map returns the decomposition type and decomposition mapping for an input rune, provided that the decomposition type is one that the Decomposer supports. If the decomposition type is not supported, the unsupported type is returned and the returned mappings are nil. If there are no decomposition mappings, the returned type is None and the returned mappings are nil.

Note that, while the Unicode data files also have a default mapping of a character to itself, these are not counted here (None is returned instead).

Note also that this is a single mapping, not a full decomposition. For that, call the Decomposer.String method, or [Decomposer.Rune] for a single rune.

For historical Unicode reasons, the longest compatibility mapping is 18 characters long. Compatibility mappings are guaranteed to be no longer than 18 characters, although most consist of just a few characters.
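The difference between a single mapping and a full decomposition can be shown with a small standard-library-only sketch. It does not call this package: `mappings` is a hand-copied two-entry excerpt of the canonical mappings in UnicodeData.txt, and `full` is a hypothetical helper that expands a rune recursively.

```go
package main

import "fmt"

// mappings is a two-entry excerpt of canonical decomposition mappings,
// hard-coded here for illustration only.
var mappings = map[rune][]rune{
	0x01D5: {0x00DC, 0x0304}, // Ǖ => Ü + COMBINING MACRON
	0x00DC: {0x0055, 0x0308}, // Ü => U + COMBINING DIAERESIS
}

// full expands r recursively until no rune in the result has a mapping.
// Canonical reordering is omitted from this sketch.
func full(r rune) []rune {
	m, ok := mappings[r]
	if !ok {
		return []rune{r}
	}
	var out []rune
	for _, x := range m {
		out = append(out, full(x)...)
	}
	return out
}

func main() {
	fmt.Printf("single: %U\n", mappings[0x01D5]) // single: [U+00DC U+0304]
	fmt.Printf("full:   %U\n", full(0x01D5))     // full:   [U+0055 U+0308 U+0304]
}
```

The single mapping still contains a precomposed character (U+00DC), which is why a lookup alone is not a substitute for a full decomposition.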

func (Decomposer) String

func (d Decomposer) String(s string) (string, error)

String returns the full decomposition of s, but only applies the decomposition mappings that match the types registered with the Decomposer with New, Decomposer.Extend, or Decomposer.Except.

func (Decomposer) Transformer

func (d Decomposer) Transformer() transform.Transformer

Transformer returns an object implementing the transform.Transformer interface that applies the decomposition specified by the decomposer across its input. It outputs the decomposed result.

The returned transformer is stateless, so may be used concurrently.

func (Decomposer) TransformerWithFilter

func (d Decomposer) TransformerWithFilter(filter func(x rune) bool) transform.Transformer

TransformerWithFilter is like Decomposer.Transformer, except that for each input rune x where filter(x) returns false, the decomposition process is skipped and that rune is output unchanged.
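The filter semantics described above can be sketched without this package, using only the standard library. Here `superscripts` is a hand-written excerpt of two superscript compatibility mappings and `mapWithFilter` is a hypothetical helper. Neither is part of the package's API, and the sketch works on whole strings rather than through a transform.Transformer.

```go
package main

import (
	"fmt"
	"strings"
)

// superscripts is a two-entry excerpt of <super> compatibility mappings,
// hard-coded for illustration.
var superscripts = map[rune]rune{'²': '2', '³': '3'}

// mapWithFilter replaces each rune that has a mapping, unless filter reports
// false for that rune, in which case the rune is copied through unchanged.
func mapWithFilter(s string, filter func(rune) bool) string {
	var b strings.Builder
	for _, r := range s {
		if m, ok := superscripts[r]; ok && filter(r) {
			b.WriteRune(m)
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	// Decompose everything except the superscript three.
	skipCubed := func(r rune) bool { return r != '³' }
	fmt.Println(mapWithFilter("x² + y³", skipCubed)) // x2 + y³
}
```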

type Type

type Type int

Type is the compatibility formatting tag that controls decomposition mapping. The exact integer value is arbitrary and has no meaning.

const (
	None      Type = 0
	Canonical Type = 1  // Canonical
	Compat    Type = 2  // Otherwise unspecified compatibility character
	Encircled Type = 3  // Encircled form
	Final     Type = 4  // Final presentation form (Arabic)
	Font      Type = 5  // Font variant (for example, a blackletter form)
	Fraction  Type = 6  // Vulgar fraction form
	Initial   Type = 7  // Initial presentation form (Arabic)
	Isolated  Type = 8  // Isolated presentation form (Arabic)
	Medial    Type = 9  // Medial presentation form (Arabic)
	Narrow    Type = 10 // Narrow (or hankaku) compatibility character
	NoBreak   Type = 11 // No-break version of a space or hyphen
	Small     Type = 12 // Small variant form (CNS compatibility)
	Square    Type = 13 // CJK squared font variant
	Sub       Type = 14 // Subscript form
	Super     Type = 15 // Superscript form
	Vertical  Type = 16 // Vertical layout presentation form
	Wide      Type = 17 // Wide (or zenkaku) compatibility character
)

func Map

func Map(r rune) (Type, []rune)

Map returns the decomposition mapping type and mappings for a single input rune. If there are no decomposition mappings, the returned type is None and the returned mappings are undefined.

Note that, while the Unicode data files also have a default mapping of a character to itself, these are not counted (None is returned instead).

Note also that this is a single mapping, not a full decomposition. For that, use dm.CD.String for a full canonical decomposition, dm.KD.String for a full compatibility decomposition, or New to define a custom decomposition and call the Decomposer.String method on it.

For historical Unicode reasons, the longest compatibility mapping is 18 characters long. Compatibility mappings are guaranteed to be no longer than 18 characters, although most consist of just a few characters.

Example
package main

import (
	"fmt"

	"github.com/tawesoft/golib/v2/text/dm"
)

func main() {

	input := '²'
	// Use a name other than dm for the result so the package is not shadowed.
	dt, mapping := dm.Map(input)
	fmt.Printf("%c => decomposition (%s): %s\n", input, dt, string(mapping))

	if dt.IsCompat() {
		fmt.Println("This is a compatibility decomposition, not a canonical one")
	} else if dt.IsCanonical() {
		fmt.Println("This is a canonical decomposition")
	} else {
		fmt.Println("There isn't a decomposition for this input")
	}

}
Output:

² => decomposition (Super): 2
This is a compatibility decomposition, not a canonical one

func (Type) IsCanonical

func (t Type) IsCanonical() bool

IsCanonical returns true if Type is a canonical mapping.

func (Type) IsCompat

func (t Type) IsCompat() bool

IsCompat returns true if Type is any of the compatibility mapping types, i.e. it is neither Canonical nor None.

func (Type) String

func (t Type) String() string
