icudata

package
v0.20.0
Published: Jun 27, 2024 License: Apache-2.0 Imports: 1 Imported by: 0

README

ICU data files

These are files copied from the ICU project that contain various types of data, like character properties.

How to update

Not all data files are immediately available in the ICU source distribution; some need to be built first. This applies to the character and word break tables.

Copy from source data

The icu4c/source/data/in directory in the source distribution contains the following ICU data files we use:

pnames.icu
ubidi.icu
ucase.icu
unames.icu
ulayout.icu
uprops.icu
nfc.nrm
nfkc.nrm
nfkc_cf.nrm

The character and word break tables need to be compiled before they can be copied.

In icu4c/source run:

./configure --with-data-packaging=files
make

This compiles the character and word break data into binary files that we can use. Once the build finishes, the files we use are available in icu4c/source/data/out/build/icudt<XX>l/brkitr:

char.brk
word.brk

Documentation

Constants

This section is empty.

Variables

var Nfc []byte

Nfc is the normalization table for NFC: canonical decomposition followed by canonical composition. It is used to check composition-related properties of characters.

var Nfkc []byte

Nfkc is the normalization table for NFKC: compatibility decomposition followed by canonical composition. It is used to check composition-related properties of characters.

var PNames []byte

PNames is the list of property names. It is used, for example, to resolve Unicode property name aliases in regular expressions.

var UBidi []byte

UBidi is the list of bidi properties. These are used by Bidi class aliases in regular expressions.

var UCase []byte

UCase is the list of case properties. These are used internally for case folding in case-insensitive matching.

var UEmoji []byte

UEmoji is the list of Emoji properties.

var ULayout []byte

ULayout is used for property checks against the InPC, InSC and VO properties.

var UNames []byte

UNames is used for named character references in regular expressions.

var UProps []byte

UProps is used for all the character properties. These are used to retrieve properties of characters for character classes, like letters, whitespace, digits etc.

Functions

This section is empty.

Types

This section is empty.
