unicode

package
v0.7.0 Latest
Warning

This package is not in the latest version of its module.

Published: Jan 13, 2024 License: MIT Imports: 6 Imported by: 0

README

unicode

Files in this directory with names ending in .txt are vendored copies of external data and are hereby explicitly excluded from copyright claims on my part.

Files in this directory with names matching generated_*.go are generated code based upon the external data. I make no copyright claims to the values embedded in this code, as those are a programmatic transformation from the public data.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Emojiable added in v0.6.1

func Emojiable(r rune) bool

Emojiable indicates whether or not a given rune might be an emoji and so can be followed by a presentation selector, per UTS#51 on Unicode Emoji. Various characters can be followed by 0xFE0E or 0xFE0F to select text or emoji variants and override normal rendering, but this is only well defined for the characters in a Consortium-maintained table. In at least one terminal emulator, _some_ pictograms followed by the emoji presentation selector will emit garbage sequences instead of just the base pictogram.

HOWEVER: various emojis from the Emoticons block can be given the text selector and it "works" in iTerm on macOS, but the Consortium list is not complete, so simple presence in the generated list is not sufficient. Going on gut feeling (so probably wrong), this function uses the same list of "stuff in these blocks is probably not marked as the right width" that is used elsewhere.
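To illustrate how a caller might use such a predicate, here is a minimal sketch that appends a presentation selector only when the predicate allows it. The names `withPresentation` and `emoticonsOnly` are hypothetical and not part of this package; `emoticonsOnly` is a deliberately crude stand-in for Emojiable that accepts only the Emoticons block.

```go
package main

import "fmt"

const (
	textSelector  rune = 0xFE0E // VARIATION SELECTOR-15: request text presentation
	emojiSelector rune = 0xFE0F // VARIATION SELECTOR-16: request emoji presentation
)

// withPresentation returns r followed by sel when allowed(r) is true,
// and r on its own otherwise. allowed stands in for a predicate such
// as Emojiable; it is a parameter so this sketch stays self-contained.
func withPresentation(r, sel rune, allowed func(rune) bool) string {
	if allowed(r) {
		return string([]rune{r, sel})
	}
	return string(r)
}

// emoticonsOnly is a hypothetical predicate for this sketch only: it is
// true for the Emoticons block (U+1F600..U+1F64F) and nothing else.
// The real Emojiable is driven by different data.
func emoticonsOnly(r rune) bool {
	return r >= 0x1F600 && r <= 0x1F64F
}

func main() {
	for _, r := range []rune{0x1F600, 'A'} {
		s := withPresentation(r, emojiSelector, emoticonsOnly)
		fmt.Printf("U+%04X -> %d rune(s)\n", r, len([]rune(s)))
	}
	// The text selector is applied the same way.
	fmt.Printf("%q\n", withPresentation(0x1F600, textSelector, emoticonsOnly))
}
```

Guarding the selector this way avoids emitting a selector after a rune for which the sequence is not well defined, which is the failure mode described above.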

func FindBlockById added in v0.6.1

func FindBlockById(id BlockID) (min, max rune, name string)

FindBlockById is the same as (Blocks).FindByName, but uses the BlockID constants as the search key and returns just the one block name. It works from the global block table, now that the table is static.
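A lookup keyed by iota-generated constants can be a plain slice index. The sketch below shows that idea with a three-entry stand-in table; the lowercase names (`blockID`, `blockInfo`, `table`, `findBlockByID`) are hypothetical, and returning zero values for an out-of-range ID is an assumption, not documented behaviour of FindBlockById.

```go
package main

import "fmt"

type blockID uint

const (
	blockBasicLatin blockID = iota
	blockLatin1Supplement
	blockLatinExtendedA
)

type blockInfo struct {
	lo, hi rune
	id     blockID
	name   string
}

// table is ordered so that table[id].id == id, which is what makes
// direct indexing valid.
var table = []blockInfo{
	{0x0000, 0x007F, blockBasicLatin, "Basic Latin"},
	{0x0080, 0x00FF, blockLatin1Supplement, "Latin-1 Supplement"},
	{0x0100, 0x017F, blockLatinExtendedA, "Latin Extended-A"},
}

// findBlockByID returns the extent and name for id, or zero values
// when id is outside the table (an assumption made for this sketch).
func findBlockByID(id blockID) (lo, hi rune, name string) {
	if id >= blockID(len(table)) {
		return 0, 0, ""
	}
	b := table[id]
	return b.lo, b.hi, b.name
}

func main() {
	lo, hi, name := findBlockByID(blockLatin1Supplement)
	fmt.Printf("%s: U+%04X..U+%04X\n", name, lo, hi)
}
```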

Types

type BlockID added in v0.6.1

type BlockID uint

BlockID is the type of the constants identifying each block; the values are autogenerated.

const (
	BlockBasicLatin BlockID = iota
	BlockLatin1Supplement
	BlockLatinExtendedA
	BlockLatinExtendedB
	BlockIPAExtensions
	BlockSpacingModifierLetters
	BlockCombiningDiacriticalMarks
	BlockGreekandCoptic
	BlockCyrillic
	BlockCyrillicSupplement
	BlockArmenian
	BlockHebrew
	BlockArabic
	BlockSyriac
	BlockArabicSupplement
	BlockThaana
	BlockNKo
	BlockSamaritan
	BlockMandaic
	BlockSyriacSupplement
	BlockArabicExtendedB
	BlockArabicExtendedA
	BlockDevanagari
	BlockBengali
	BlockGurmukhi
	BlockGujarati
	BlockOriya
	BlockTamil
	BlockTelugu
	BlockKannada
	BlockMalayalam
	BlockSinhala
	BlockThai
	BlockLao
	BlockTibetan
	BlockMyanmar
	BlockGeorgian
	BlockHangulJamo
	BlockEthiopic
	BlockEthiopicSupplement
	BlockCherokee
	BlockUnifiedCanadianAboriginalSyllabics
	BlockOgham
	BlockRunic
	BlockTagalog
	BlockHanunoo
	BlockBuhid
	BlockTagbanwa
	BlockKhmer
	BlockMongolian
	BlockUnifiedCanadianAboriginalSyllabicsExtended
	BlockLimbu
	BlockTaiLe
	BlockNewTaiLue
	BlockKhmerSymbols
	BlockBuginese
	BlockTaiTham
	BlockCombiningDiacriticalMarksExtended
	BlockBalinese
	BlockSundanese
	BlockBatak
	BlockLepcha
	BlockOlChiki
	BlockCyrillicExtendedC
	BlockGeorgianExtended
	BlockSundaneseSupplement
	BlockVedicExtensions
	BlockPhoneticExtensions
	BlockPhoneticExtensionsSupplement
	BlockCombiningDiacriticalMarksSupplement
	BlockLatinExtendedAdditional
	BlockGreekExtended
	BlockGeneralPunctuation
	BlockSuperscriptsandSubscripts
	BlockCurrencySymbols
	BlockCombiningDiacriticalMarksforSymbols
	BlockLetterlikeSymbols
	BlockNumberForms
	BlockArrows
	BlockMathematicalOperators
	BlockMiscellaneousTechnical
	BlockControlPictures
	BlockOpticalCharacterRecognition
	BlockEnclosedAlphanumerics
	BlockBoxDrawing
	BlockBlockElements
	BlockGeometricShapes
	BlockMiscellaneousSymbols
	BlockDingbats
	BlockMiscellaneousMathematicalSymbolsA
	BlockSupplementalArrowsA
	BlockBraillePatterns
	BlockSupplementalArrowsB
	BlockMiscellaneousMathematicalSymbolsB
	BlockSupplementalMathematicalOperators
	BlockMiscellaneousSymbolsandArrows
	BlockGlagolitic
	BlockLatinExtendedC
	BlockCoptic
	BlockGeorgianSupplement
	BlockTifinagh
	BlockEthiopicExtended
	BlockCyrillicExtendedA
	BlockSupplementalPunctuation
	BlockCJKRadicalsSupplement
	BlockKangxiRadicals
	BlockIdeographicDescriptionCharacters
	BlockCJKSymbolsandPunctuation
	BlockHiragana
	BlockKatakana
	BlockBopomofo
	BlockHangulCompatibilityJamo
	BlockKanbun
	BlockBopomofoExtended
	BlockCJKStrokes
	BlockKatakanaPhoneticExtensions
	BlockEnclosedCJKLettersandMonths
	BlockCJKCompatibility
	BlockCJKUnifiedIdeographsExtensionA
	BlockYijingHexagramSymbols
	BlockCJKUnifiedIdeographs
	BlockYiSyllables
	BlockYiRadicals
	BlockLisu
	BlockVai
	BlockCyrillicExtendedB
	BlockBamum
	BlockModifierToneLetters
	BlockLatinExtendedD
	BlockSylotiNagri
	BlockCommonIndicNumberForms
	BlockPhagspa
	BlockSaurashtra
	BlockDevanagariExtended
	BlockKayahLi
	BlockRejang
	BlockHangulJamoExtendedA
	BlockJavanese
	BlockMyanmarExtendedB
	BlockCham
	BlockMyanmarExtendedA
	BlockTaiViet
	BlockMeeteiMayekExtensions
	BlockEthiopicExtendedA
	BlockLatinExtendedE
	BlockCherokeeSupplement
	BlockMeeteiMayek
	BlockHangulSyllables
	BlockHangulJamoExtendedB
	BlockHighSurrogates
	BlockHighPrivateUseSurrogates
	BlockLowSurrogates
	BlockPrivateUseArea
	BlockCJKCompatibilityIdeographs
	BlockAlphabeticPresentationForms
	BlockArabicPresentationFormsA
	BlockVariationSelectors
	BlockVerticalForms
	BlockCombiningHalfMarks
	BlockCJKCompatibilityForms
	BlockSmallFormVariants
	BlockArabicPresentationFormsB
	BlockHalfwidthandFullwidthForms
	BlockSpecials
	BlockLinearBSyllabary
	BlockLinearBIdeograms
	BlockAegeanNumbers
	BlockAncientGreekNumbers
	BlockAncientSymbols
	BlockPhaistosDisc
	BlockLycian
	BlockCarian
	BlockCopticEpactNumbers
	BlockOldItalic
	BlockGothic
	BlockOldPermic
	BlockUgaritic
	BlockOldPersian
	BlockDeseret
	BlockShavian
	BlockOsmanya
	BlockOsage
	BlockElbasan
	BlockCaucasianAlbanian
	BlockVithkuqi
	BlockLinearA
	BlockLatinExtendedF
	BlockCypriotSyllabary
	BlockImperialAramaic
	BlockPalmyrene
	BlockNabataean
	BlockHatran
	BlockPhoenician
	BlockLydian
	BlockMeroiticHieroglyphs
	BlockMeroiticCursive
	BlockKharoshthi
	BlockOldSouthArabian
	BlockOldNorthArabian
	BlockManichaean
	BlockAvestan
	BlockInscriptionalParthian
	BlockInscriptionalPahlavi
	BlockPsalterPahlavi
	BlockOldTurkic
	BlockOldHungarian
	BlockHanifiRohingya
	BlockRumiNumeralSymbols
	BlockYezidi
	BlockArabicExtendedC
	BlockOldSogdian
	BlockSogdian
	BlockOldUyghur
	BlockChorasmian
	BlockElymaic
	BlockBrahmi
	BlockKaithi
	BlockSoraSompeng
	BlockChakma
	BlockMahajani
	BlockSharada
	BlockSinhalaArchaicNumbers
	BlockKhojki
	BlockMultani
	BlockKhudawadi
	BlockGrantha
	BlockNewa
	BlockTirhuta
	BlockSiddham
	BlockModi
	BlockMongolianSupplement
	BlockTakri
	BlockAhom
	BlockDogra
	BlockWarangCiti
	BlockDivesAkuru
	BlockNandinagari
	BlockZanabazarSquare
	BlockSoyombo
	BlockUnifiedCanadianAboriginalSyllabicsExtendedA
	BlockPauCinHau
	BlockDevanagariExtendedA
	BlockBhaiksuki
	BlockMarchen
	BlockMasaramGondi
	BlockGunjalaGondi
	BlockMakasar
	BlockKawi
	BlockLisuSupplement
	BlockTamilSupplement
	BlockCuneiform
	BlockCuneiformNumbersandPunctuation
	BlockEarlyDynasticCuneiform
	BlockCyproMinoan
	BlockEgyptianHieroglyphs
	BlockEgyptianHieroglyphFormatControls
	BlockAnatolianHieroglyphs
	BlockBamumSupplement
	BlockMro
	BlockTangsa
	BlockBassaVah
	BlockPahawhHmong
	BlockMedefaidrin
	BlockMiao
	BlockIdeographicSymbolsandPunctuation
	BlockTangut
	BlockTangutComponents
	BlockKhitanSmallScript
	BlockTangutSupplement
	BlockKanaExtendedB
	BlockKanaSupplement
	BlockKanaExtendedA
	BlockSmallKanaExtension
	BlockNushu
	BlockDuployan
	BlockShorthandFormatControls
	BlockZnamennyMusicalNotation
	BlockByzantineMusicalSymbols
	BlockMusicalSymbols
	BlockAncientGreekMusicalNotation
	BlockKaktovikNumerals
	BlockMayanNumerals
	BlockTaiXuanJingSymbols
	BlockCountingRodNumerals
	BlockMathematicalAlphanumericSymbols
	BlockSuttonSignWriting
	BlockLatinExtendedG
	BlockGlagoliticSupplement
	BlockCyrillicExtendedD
	BlockNyiakengPuachueHmong
	BlockToto
	BlockWancho
	BlockNagMundari
	BlockEthiopicExtendedB
	BlockMendeKikakui
	BlockAdlam
	BlockIndicSiyaqNumbers
	BlockOttomanSiyaqNumbers
	BlockArabicMathematicalAlphabeticSymbols
	BlockMahjongTiles
	BlockDominoTiles
	BlockPlayingCards
	BlockEnclosedAlphanumericSupplement
	BlockEnclosedIdeographicSupplement
	BlockMiscellaneousSymbolsandPictographs
	BlockEmoticons
	BlockOrnamentalDingbats
	BlockTransportandMapSymbols
	BlockAlchemicalSymbols
	BlockGeometricShapesExtended
	BlockSupplementalArrowsC
	BlockSupplementalSymbolsandPictographs
	BlockChessSymbols
	BlockSymbolsandPictographsExtendedA
	BlockSymbolsforLegacyComputing
	BlockCJKUnifiedIdeographsExtensionB
	BlockCJKUnifiedIdeographsExtensionC
	BlockCJKUnifiedIdeographsExtensionD
	BlockCJKUnifiedIdeographsExtensionE
	BlockCJKUnifiedIdeographsExtensionF
	BlockCJKUnifiedIdeographsExtensionI
	BlockCJKCompatibilityIdeographsSupplement
	BlockCJKUnifiedIdeographsExtensionG
	BlockCJKUnifiedIdeographsExtensionH
	BlockTags
	BlockVariationSelectorsSupplement
	BlockSupplementaryPrivateUseAreaA
	BlockSupplementaryPrivateUseAreaB
)

type BlockInfo added in v0.6.1

type BlockInfo struct {
	Min, Max rune
	ID       BlockID
	Name     string
}

BlockInfo holds the core information to describe a range of Unicode characters which make up a Unicode Block.

type Blocks added in v0.6.1

type Blocks struct {
	// contains filtered or unexported fields
}

Blocks is our opaque container for holding data to be used for looking up block-based information.

func LoadBlocks added in v0.6.1

func LoadBlocks() Blocks

LoadBlocks returns a Blocks holder for BlockInfo lookup. This is much simpler now that we generate static Go code for the blocks.

func (Blocks) FindByName added in v0.6.1

func (b Blocks) FindByName(name string) (min, max rune, candidateNames []string)

FindByName returns the extent of the given block, with start and end runes; the block name needs to be "sufficiently unique". Returns 0,0,nil if not found. The candidateNames []string will be empty unless we hit "insufficiently unique".
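The three outcomes (found, not found, ambiguous) can be sketched as follows. The matching rule here, an exact case-insensitive match winning outright and otherwise a case-insensitive substring match, is an assumption chosen for illustration; the documentation does not say how FindByName decides that a name is "sufficiently unique". All identifiers in the block are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

type blockInfo struct {
	lo, hi rune
	name   string
}

var blocks = []blockInfo{
	{0x0000, 0x007F, "Basic Latin"},
	{0x0080, 0x00FF, "Latin-1 Supplement"},
	{0x0100, 0x017F, "Latin Extended-A"},
	{0x0370, 0x03FF, "Greek and Coptic"},
}

// findByName returns the extent of the single block matching name.
// With no match it returns 0, 0, nil; with several partial matches it
// returns 0, 0 and the names of the candidates.
func findByName(name string) (lo, hi rune, candidates []string) {
	want := strings.ToLower(name)
	var matches []blockInfo
	for _, b := range blocks {
		have := strings.ToLower(b.name)
		if have == want {
			return b.lo, b.hi, nil // exact match is always unique enough
		}
		if strings.Contains(have, want) {
			matches = append(matches, b)
		}
	}
	switch len(matches) {
	case 0:
		return 0, 0, nil
	case 1:
		return matches[0].lo, matches[0].hi, nil
	}
	for _, b := range matches {
		candidates = append(candidates, b.name)
	}
	return 0, 0, candidates
}

func main() {
	for _, q := range []string{"greek", "latin", "klingon"} {
		lo, hi, c := findByName(q)
		fmt.Printf("%q: U+%04X..U+%04X candidates=%v\n", q, lo, hi, c)
	}
}
```

A caller can tell "ambiguous" from "not found" by checking whether the candidate slice is non-empty, which is the distinction the doc comment describes.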

func (Blocks) ListBlocks added in v0.6.1

func (b Blocks) ListBlocks() []BlockInfo

ListBlocks returns an ordered list of known blocks.

func (Blocks) Lookup added in v0.6.1

func (b Blocks) Lookup(r rune) (blockname string)

Lookup returns the name of the one block which contains a given rune, or the empty string if no such block is found.
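Because blocks are non-overlapping ranges, a rune-to-block lookup over a table sorted by start rune can use binary search. This is a sketch of that technique with stand-in data, not a description of how Lookup is implemented; `blockInfo`, `blocks` and `lookup` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

type blockInfo struct {
	lo, hi rune
	name   string
}

// blocks must be sorted by lo and must not overlap. The gap between
// U+00FF and U+0370 is deliberate, to show the not-found case.
var blocks = []blockInfo{
	{0x0000, 0x007F, "Basic Latin"},
	{0x0080, 0x00FF, "Latin-1 Supplement"},
	{0x0370, 0x03FF, "Greek and Coptic"},
}

// lookup returns the name of the block containing r, or "" if none does.
func lookup(r rune) string {
	// Find the first block that ends at or after r.
	i := sort.Search(len(blocks), func(i int) bool { return blocks[i].hi >= r })
	// That block contains r only if it also starts at or before r.
	if i < len(blocks) && blocks[i].lo <= r {
		return blocks[i].name
	}
	return ""
}

func main() {
	for _, r := range []rune{'A', 0x00E9, 0x03B1, 0x0100} {
		fmt.Printf("U+%04X: %q\n", r, lookup(r))
	}
}
```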

type CharInfo

type CharInfo struct {
	Number    rune
	Name      string
	NameWidth int // occasional override
	// contains filtered or unexported fields
}

CharInfo is the basic set of information about one Unicode character. We record the codepoint (as a Go rune) and the formal Name.

func PairCharInfo added in v0.6.1

func PairCharInfo(r1, r2 rune) (CharInfo, bool)

PairCharInfo returns a faked-up CharInfo which is for rune 0 but with an informative name.

type CharInfoList added in v0.6.1

type CharInfoList []CharInfo

CharInfoList is a convenience wrapper for []CharInfo supporting sorting by Unicode code-point.

func (CharInfoList) Len added in v0.6.1

func (cil CharInfoList) Len() int

func (CharInfoList) Less added in v0.6.1

func (cil CharInfoList) Less(i, j int) bool

func (CharInfoList) Sort added in v0.6.1

func (cil CharInfoList) Sort()

Sort sorts a CharInfoList, where sorting is defined as being by Unicode codepoint.

func (CharInfoList) Swap added in v0.6.1

func (cil CharInfoList) Swap(i, j int)
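Len, Less and Swap together satisfy the standard library's sort.Interface, so Sort can delegate to sort.Sort. A minimal sketch of that pattern, using a stand-in `charInfo` type with only the two exported fields that matter here (the lowercase type names are hypothetical):

```go
package main

import (
	"fmt"
	"sort"
)

type charInfo struct {
	Number rune
	Name   string
}

type charInfoList []charInfo

// The three methods below implement sort.Interface, ordering by codepoint.
func (l charInfoList) Len() int           { return len(l) }
func (l charInfoList) Less(i, j int) bool { return l[i].Number < l[j].Number }
func (l charInfoList) Swap(i, j int)      { l[i], l[j] = l[j], l[i] }

// Sort sorts the list in place by codepoint.
func (l charInfoList) Sort() { sort.Sort(l) }

func main() {
	l := charInfoList{
		{0x03B1, "GREEK SMALL LETTER ALPHA"},
		{0x0041, "LATIN CAPITAL LETTER A"},
		{0x2603, "SNOWMAN"},
	}
	l.Sort()
	for _, c := range l {
		fmt.Printf("U+%04X %s\n", c.Number, c.Name)
	}
}
```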

type Unicode

type Unicode struct {
	ByRune  map[rune]CharInfo
	ByName  map[string]CharInfo
	Search  *ferret.InvertedSuffix
	MaxRune rune
	// contains filtered or unexported fields
}

Unicode is the set of all data about all characters which we've retrieved from formal Unicode specifications.
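The two maps give constant-time access by codepoint or by name. The sketch below builds a stand-in with the same shape from a small list, leaving out the Search field and the unexported fields. `unicodeData`, `charInfo` and `build` are hypothetical names, and keying ByName by the formal uppercase name is an assumption.

```go
package main

import "fmt"

type charInfo struct {
	Number rune
	Name   string
}

// unicodeData mirrors the exported map fields of the Unicode struct.
type unicodeData struct {
	ByRune  map[rune]charInfo
	ByName  map[string]charInfo
	MaxRune rune
}

// build indexes chars both ways and records the highest codepoint seen.
func build(chars []charInfo) unicodeData {
	u := unicodeData{
		ByRune: make(map[rune]charInfo, len(chars)),
		ByName: make(map[string]charInfo, len(chars)),
	}
	for _, c := range chars {
		u.ByRune[c.Number] = c
		u.ByName[c.Name] = c
		if c.Number > u.MaxRune {
			u.MaxRune = c.Number
		}
	}
	return u
}

func main() {
	u := build([]charInfo{
		{0x0041, "LATIN CAPITAL LETTER A"},
		{0x2603, "SNOWMAN"},
	})
	if c, ok := u.ByRune[0x2603]; ok {
		fmt.Printf("U+%04X is %s\n", c.Number, c.Name)
	}
	if c, ok := u.ByName["LATIN CAPITAL LETTER A"]; ok {
		fmt.Printf("%s is U+%04X\n", c.Name, c.Number)
	}
	fmt.Printf("MaxRune: U+%04X\n", u.MaxRune)
}
```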

func Load

func Load() Unicode

Load gives us all the Unicode-spec derived data which we have.

func LoadSearch added in v0.0.3

func LoadSearch() Unicode

LoadSearch gives us all the Unicode data, with search too; the search loading is slow, so we skip it by default.
