diacritics

package

Version: v0.6.0 (latest) | Published: Apr 13, 2020 | License: LGPL-3.0 | Imports: 5 | Imported by: 0

Repository

github.com/abhabongse/fuzzymatch-go

Documentation ¶

Overview ¶

Package diacritics is a subpackage of package candidate that attempts to remove diacritical marks from extended Latin letters, using one of two strategies.

- Strategy #1: Straight diacritics removal (NFKD -> strip Mn -> NFKC)
- Strategy #2: Apache Lucene ASCII folding

Constants ¶

This section is empty.

Variables ¶

var AsciiFoldTransformer = transform.Chain(
	norm.NFKC,
	&asciiFoldSpanningTransformer{},
)

AsciiFoldTransformer is a Unicode stream transformer that first normalizes its input to NFKC and then replaces each character with its ASCII-folded equivalent.

var AsciiFoldTranslateTable = map[rune]string{}/* 1240 elements not displayed */

AsciiFoldTranslateTable is the ASCII folding database, taken from Apache Lucene's ASCIIFoldingFilter: https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.java
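To illustrate the kind of lookup a map[rune]string folding table supports, here is a minimal sketch using only the standard library. The names foldTable and foldASCII are hypothetical and not part of this package, and the four entries are a small stand-in for the real 1240-element table. How the package's own transformer consumes the table internally is not shown on this page, so treat this as an assumption about the general technique rather than a description of its implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// foldTable is a tiny, hypothetical stand-in for AsciiFoldTranslateTable.
// A single rune may fold to more than one ASCII character.
var foldTable = map[rune]string{
	'é': "e",
	'ö': "o",
	'ß': "ss",
	'Æ': "AE",
}

// foldASCII replaces every rune that has an entry in table with its
// folded string and copies all other runes through unchanged.
func foldASCII(s string, table map[rune]string) string {
	var b strings.Builder
	for _, r := range s {
		if folded, ok := table[r]; ok {
			b.WriteString(folded)
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(foldASCII("Æther café, Größe", foldTable)) // AEther cafe, Grosse
}
```

Because the table is keyed by single runes, it only matches precomposed characters; a base letter followed by a separate combining mark would pass through untouched unless the input is normalized first.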

var StripDiacriticalMarksTransformer = transform.Chain(
	norm.NFKD,
	runes.Remove(runes.In(runedata.CombiningDiacriticalMarks)),
	norm.NFKC,
)

StripDiacriticalMarksTransformer is a Unicode stream transformer that tries to remove as many combining diacritical marks from the input string as possible. It handles different encodings of the same character, for example 'ö' as the single code point U+00F6 versus 'o' followed by the combining diaeresis U+0308, which is two code points.

Removal is preceded by Unicode decomposition (NFKD), and the result is then recomposed (NFKC) to produce the final output.
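The middle step of that pipeline, dropping the combining marks, can be sketched with the standard library alone. This sketch makes two assumptions: it uses the general category unicode.Mn as a stand-in for the package's runedata.CombiningDiacriticalMarks set, and it skips the NFKD and NFKC steps, which need golang.org/x/text/unicode/norm. The helper name stripMarks is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripMarks drops every rune in the Unicode category Mn (nonspacing
// mark) and keeps all other runes. It performs no normalization.
func stripMarks(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.Is(unicode.Mn, r) {
			return -1 // a negative value tells strings.Map to drop the rune
		}
		return r
	}, s)
}

func main() {
	decomposed := "o\u0308" // 'o' followed by combining diaeresis
	precomposed := "\u00f6" // 'ö' as a single code point

	fmt.Printf("%q\n", stripMarks(decomposed))  // "o"
	fmt.Printf("%q\n", stripMarks(precomposed)) // "ö"
}
```

The second call leaves 'ö' intact because a precomposed character contains no separate mark to remove, which is why the transformer decomposes its input before stripping.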

Functions ¶

This section is empty.

Types ¶

This section is empty.
