Documentation ¶
Overview ¶
Package uax11 provides utilities for Unicode® Standard Annex #11 “East Asian Width”.
UAX 11 Introduction ¶
This annex presents the specifications of a normative property for Unicode characters that is useful when interoperating with East Asian Legacy character sets. […] When dealing with East Asian text, there is the concept of an inherent width of a character. This width takes on either of two values: narrow or wide.
[…]
For a traditional East Asian fixed pitch font, this width translates to a display width of either one half or a whole unit width. A common name for this unit width is “Em”. While an Em is customarily the height of the letter “M”, it is the same as the unit width in East Asian fonts, because in these fonts the standard character cell is square.
[…]
Except for a few characters, which are explicitly called out as fullwidth or halfwidth in the Unicode Standard, characters are not duplicated based on distinction in width. Some characters, such as the ideographs, are always wide; others are always narrow; and some can be narrow or wide, depending on the context. The Unicode character property East_Asian_Width provides a default classification of characters, which an implementation can use to decide at runtime whether to treat a character as narrow or wide.
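As a rough illustration of the last point, the sketch below classifies a handful of runes and resolves the "narrow or wide, depending on the context" case from a context flag. The ranges are a small hand-picked subset of the East_Asian_Width data, and the names `class`, `classify`, and `resolve` are hypothetical; this is not how package uax11 is implemented.

```go
package main

import "fmt"

// class is a simplified stand-in for the East_Asian_Width property values.
type class int

const (
	narrow    class = iota // always narrow, e.g. ASCII letters
	wide                   // always wide, e.g. CJK ideographs
	ambiguous              // narrow or wide, depending on context
)

// classify covers only a few illustrative ranges; everything else is
// treated as narrow here, which the real property data does not do.
func classify(r rune) class {
	switch {
	case r >= 0x4E00 && r <= 0x9FFF: // CJK Unified Ideographs
		return wide
	case r >= 0xFF01 && r <= 0xFF60: // Fullwidth forms
		return wide
	case r >= 0x03B1 && r <= 0x03C1: // Greek small letters alpha..rho
		return ambiguous
	default:
		return narrow
	}
}

// resolve turns a class into a width in en units (1 = narrow, 2 = wide).
// Ambiguous characters are wide only in an East Asian context.
func resolve(c class, eastAsian bool) int {
	if c == wide || (c == ambiguous && eastAsian) {
		return 2
	}
	return 1
}

func main() {
	for _, r := range "A世α" {
		c := classify(r)
		fmt.Printf("%c: latin=%d eastasian=%d\n", r, resolve(c, false), resolve(c, true))
	}
}
```

Only the ambiguous rune `α` changes width between the two contexts; `A` stays at 1 en and `世` at 2 en in both.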
Caveats ¶
Determining the legacy fixed-width display length is not an exact science. Much depends on the properties of output devices, on fonts used, on a device's interpretation of display rules, etc. Clients should treat results of UAX#11 as heuristics. Using proportional fonts is almost always a better solution.
___________________________________________________________________________
License ¶
This project is provided under the terms of the UNLICENSE or the 3-Clause BSD license denoted by the following SPDX identifier:
SPDX-License-Identifier: 'Unlicense' OR 'BSD-3-Clause'
You may use the project under the terms of either license.
Licenses are reproduced in the license file in the root folder of this module.
Copyright © 2021 Norbert Pillmayer <norbert@pillmayer.com>
Variables ¶
var EastAsianContext = makeEastAsianContext()
EastAsianContext is a context for East Asian languages.
var LatinContext = makeLatinContext()
LatinContext is a context for western languages.
Functions ¶
func StringWidth ¶
StringWidth calculates the width of a grapheme.String in terms of `en`s, where 1en stands for 1/2em, i.e. half a full width character.
If an empty context is given, LatinContext is assumed.
    s := grapheme.StringFromString("A (世). 😀")
    w := uax11.StringWidth(s, uax11.LatinContext)
    fmt.Printf("string has fixed-width display length of %d en", w)
    ⇒ 10
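The result of 10 can be checked by hand: `世` and `😀` count 2 en each, and the six remaining characters count 1 en each. A minimal sketch of that arithmetic follows, assuming every rune in the string is its own grapheme and using a toy two-range lookup. The helper names `toyWidth` and `toyStringWidth` are hypothetical and not part of uax11.

```go
package main

import "fmt"

// toyWidth returns 2 for the two wide ranges needed by the example and
// 1 otherwise. It is an illustration only, not the UAX #11 data.
func toyWidth(r rune) int {
	switch {
	case r >= 0x4E00 && r <= 0x9FFF: // CJK Unified Ideographs
		return 2
	case r >= 0x1F600 && r <= 0x1F64F: // Emoticons
		return 2
	default:
		return 1
	}
}

// toyStringWidth sums the widths of all runes in s, in en units.
func toyStringWidth(s string) int {
	total := 0
	for _, r := range s {
		total += toyWidth(r)
	}
	return total
}

func main() {
	fmt.Println(toyStringWidth("A (世). 😀")) // prints 10
}
```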
func Width ¶
Width returns the width of a grapheme, given as a byte slice, in terms of `en`s, where 1en stands for 1/2em, i.e. half a full width character. If grphm is invalid or just a zero width rune, a width of 0 is returned.
If an empty context is given, LatinContext is assumed.
Returns either 0, 1 (narrow character) or 2 (wide character).
Types ¶
type Context ¶
    type Context struct {
        ForceEastAsian bool            // force East Asian context
        Script         language.Script // ISO 15924 script identifier
        Locale         string          // ISO 639/3166 locale string
        // contains filtered or unexported fields
    }
Context represents information about the typesetting environment.
From UAX#11: The term context as used here includes extra information such as explicit markup, knowledge of the source code page, font information, or language and script identification.
Clients may fill a context partially and hand it over to uax11. The functions in this package will try to derive a meaningful context from a partially filled one. This package relies on https://pkg.go.dev/golang.org/x/text/language/ for this to work.
    context := &Context{Locale: "zh"} // unspecified Chinese
    _ = Width([]byte("世"), context)
    fmt.Printf("%v", context.Script)
    ⇒ “Hans” (simplified Chinese script)
Alternatively, clients may use one of the pre-defined contexts or use `ContextFromEnvironment` to get a client-machine dependent one.
func ContextFromEnvironment ¶
func ContextFromEnvironment() *Context
ContextFromEnvironment creates a Context from the operating system environment, i.e. either from environment variables on *nix systems or from a kernel call on Windows systems. (We rely on http://github.com/cloudfoundry/jibber_jabber for this.)