sse

package
v0.0.0-...-75dd7da Latest
Warning

This package is not in the latest version of its module.

Published: Nov 14, 2024 License: Apache-2.0 Imports: 11 Imported by: 0

README

Sango Syllabic Encoding (SSE)

Sango phonology follows a rigid C?V format (an optional consonant cluster followed by a vowel), so each syllable can be efficiently encoded as a uint16 token. Tokens simplify coding and manipulation, making it easy to:

  • Compact the notation with low entropy, making it suitable as a vector embedding in machine learning algorithms
  • Convert into and out of UTF-8, validate syllables, and avoid invalid Sango phonemes
  • Iterate by symbol, letter (English/French), syllable (Sango), and word (both) without worrying about byte boundaries
  • Distinguish language and punctuation/whitespace just by inspecting the high-order bits
  • Use interlinear code switching
  • Query different properties and mask or filter on unimportant ones
  • Record inline metadata by setting Case to Hidden
  • Isolate use of a hyphen, which in Sango is neither syntactically standardized nor semantically important

Encoding format

The 16-bit encoding divides up quasi-orthogonally into components:

Binary bit pattern    Description
00UUUUUUUUUUUUUU      Unicode rune
01LNNNNNAAAAAAAA      ASCII letter (English or French)
1SSXXCCCCCVVVVPP      Syllable (Sango)

where the bit substrings are fixed-length binary numerals when masked and shifted:

Bit   Description
U     Unicode rune (U+0000 - U+3FFF)
L     Language (0=English, 1=French)
N     min(31,n) where n = # letters remaining
A     8-bit character code (U+0000 - U+00FF: ASCII plus Latin-1)
S     min(3,m) where m = # syllables remaining
X     Case
C     Consonant cluster
V     Vowel
P     Pitch
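
The layout in the two tables above can be decoded with plain shifts and masks. The following is a minimal sketch that works on raw uint16 values and does not use this package's types; the names Kind, classify, letterFields, and syllableFields are hypothetical and introduced only for illustration.

```go
package main

import "fmt"

// Kind is a hypothetical label for the three token classes.
type Kind int

const (
	KindUnicode  Kind = iota // 00UUUUUUUUUUUUUU
	KindLetter               // 01LNNNNNAAAAAAAA
	KindSyllable             // 1SSXXCCCCCVVVVPP
)

// classify looks only at the two high-order bits of a token.
func classify(t uint16) Kind {
	switch {
	case t&0x8000 != 0:
		return KindSyllable
	case t&0x4000 != 0:
		return KindLetter
	default:
		return KindUnicode
	}
}

// letterFields splits a letter token into L (language), N (letters
// remaining, saturating at 31), and A (8-bit character code).
func letterFields(t uint16) (french bool, lettersLeft int, ch byte) {
	return (t>>13)&1 == 1, int((t >> 8) & 0x1F), byte(t & 0xFF)
}

// syllableFields splits a syllable token into the raw S, X, C, V, and P
// indices. It does not map C or V to letters.
func syllableFields(t uint16) (s, x, c, v, p int) {
	return int((t >> 13) & 3), int((t >> 11) & 3),
		int((t >> 6) & 0x1F), int((t >> 2) & 0xF), int(t & 3)
}

func main() {
	for _, t := range []uint16{0x0041, 0x62E9, 0xB94B} {
		fmt.Printf("%#04x kind=%d\n", t, classify(t))
	}
	fmt.Println(letterFields(0x62E9))
	fmt.Println(syllableFields(0xB94B))
}
```

Because the class is decided by the top two bits alone, a caller can separate Sango syllables from letters and other runes without decoding the remaining fields.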

The Sango syllable encoding is defined as follows:

Case

MSB\LSB   0                 1
0         lowercase         Titlecase
1         hyphen-prefixed   UPPERCASE

Pitch

MSB\LSB   0               1
0         Unknown (ọ)     Low tone (o)
1         Mid tone (ö)    High tone (ô)

Consonant cluster
MSB\LSB 00 01 10 11
000 b d f
001 g gb h k
010 kp l m mb
011 mp mv n nd
100 ng ngb ny nz
101 p r s t
110 v w y z
Vowel
MSB\LSB 00 01 10 11
00 a ə
01 ɛ e i
10 ø ɔ o
11 u ——
  • The following stand-in vowels are not found in normal Sango text. They are used internally to indicate that the vowel height is unknown, and are replaced by the appropriate open or close vowel once it is known:
    • ə: stands in for e or ɛ
    • ø: stands in for o or ɔ
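
Since each property occupies its own bit field, masking out unimportant properties before a comparison is a single bitwise operation. The sketch below assumes only the 1SSXXCCCCCVVVVPP layout described above; the mask constants and the helpers packSyllable and equalIgnoring are hypothetical names, not part of this package, and the field values are raw indices rather than decoded letters.

```go
package main

import "fmt"

// Field masks for a syllable token laid out as 1SSXXCCCCCVVVVPP.
const (
	syllableFlag      uint16 = 1 << 15
	maskSyllablesLeft uint16 = 3 << 13
	maskCase          uint16 = 3 << 11
	maskConsonant     uint16 = 0x1F << 6
	maskVowel         uint16 = 0xF << 2
	maskPitch         uint16 = 3
)

// packSyllable assembles a token from raw field indices (hypothetical helper).
func packSyllable(s, x, c, v, p uint16) uint16 {
	return syllableFlag | (s&3)<<13 | (x&3)<<11 | (c&0x1F)<<6 | (v&0xF)<<2 | p&3
}

// equalIgnoring compares two tokens after clearing the bits selected by ignore.
func equalIgnoring(a, b, ignore uint16) bool {
	return a&^ignore == b&^ignore
}

func main() {
	a := packSyllable(0, 0, 5, 2, 1) // lowercase, low tone, word-final
	b := packSyllable(2, 3, 5, 2, 3) // UPPERCASE, high tone, two syllables follow

	// Same consonant cluster and vowel once position, case, and pitch are ignored.
	fmt.Println(equalIgnoring(a, b, maskSyllablesLeft|maskCase|maskPitch))
	// Ignoring case alone is not enough: S and P still differ.
	fmt.Println(equalIgnoring(a, b, maskCase))
}
```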

Documentation

Index

Constants

const (
	// CaseEnum = SSEtoken >> 11 & 3
	// SSEtoken = CaseEnum & 3 << 11
	SangoCaseLower  = 0
	SangoCaseTitle  = 1
	SangoCaseHyphen = 2
	SangoCaseUpper  = 3

	// PitchEnum = SSEtoken & 3
	// SSEtoken = PitchEnum & 3
	SangoPitchUnknown = 0
	SangoPitchLow     = 1
	SangoPitchMid     = 2
	SangoPitchHigh    = 3
)

Variables

This section is empty.

Functions

This section is empty.

Types

type AsciiSSE

type AsciiSSE struct {
	SSE // invariant: t >> 14 & 3 == 1
}

func MakeAsciiSSE

func MakeAsciiSSE(r rune, isFrench bool, numLettersLeft int) AsciiSSE
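
The behavior of MakeAsciiSSE is not documented on this page. As a rough illustration of how a letter token could be packed under the 01LNNNNNAAAAAAAA layout, the sketch below uses two hypothetical helpers, packLetter and encodeLetters. It assumes that "letters remaining" counts the letters after the current one, and it rejects runes above U+00FF because they do not fit in the 8-bit field.

```go
package main

import "fmt"

// packLetter builds a letter token: 01, then L, then N = min(31, n), then A.
func packLetter(ch byte, french bool, lettersLeft int) uint16 {
	if lettersLeft < 0 {
		lettersLeft = 0
	}
	if lettersLeft > 31 {
		lettersLeft = 31 // N saturates at 31
	}
	t := uint16(0x4000) | uint16(lettersLeft)<<8 | uint16(ch)
	if french {
		t |= 1 << 13
	}
	return t
}

// encodeLetters packs each rune of word, counting down the letters that
// follow it. It reports false if a rune does not fit in 8 bits.
func encodeLetters(word string, french bool) ([]uint16, bool) {
	runes := []rune(word)
	out := make([]uint16, 0, len(runes))
	for i, r := range runes {
		if r > 0xFF {
			return nil, false
		}
		out = append(out, packLetter(byte(r), french, len(runes)-1-i))
	}
	return out, true
}

func main() {
	// "\u00e9t\u00e9" is the French word "été", written with escapes.
	tokens, ok := encodeLetters("\u00e9t\u00e9", true)
	fmt.Println(ok)
	for _, t := range tokens {
		fmt.Printf("%#04x\n", t)
	}
}
```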

type SSE

type SSE struct {
	// contains filtered or unexported fields
}

func (SSE) Bytes

func (sse SSE) Bytes() []byte

func (SSE) FullString

func (sse SSE) FullString() string

func (SSE) Glyphs

func (sse SSE) Glyphs() [][]byte

func (SSE) LanguageCode

func (sse SSE) LanguageCode() string

func (SSE) NumTokensLeftInWord

func (sse SSE) NumTokensLeftInWord() int

func (SSE) Runes

func (sse SSE) Runes() []rune

func (SSE) SSEToken

func (sse SSE) SSEToken() SSEtoken

func (SSE) String

func (sse SSE) String() string

type SSEInterface

type SSEInterface interface {
	SSEToken() SSEtoken
	Bytes() []byte
	Glyphs() [][]byte
	Runes() []rune
	FullString() string
	String() string
	LanguageCode() string
	NumTokensLeftInWord() int
}

type SSEtoken

type SSEtoken uint16

func (SSEtoken) AsAsciiSSE

func (t SSEtoken) AsAsciiSSE() AsciiSSE

func (SSEtoken) AsSSE

func (t SSEtoken) AsSSE() SSEInterface

func (SSEtoken) AsSangoSSE

func (t SSEtoken) AsSangoSSE() SangoSSE

func (SSEtoken) AsUnicodeSSE

func (t SSEtoken) AsUnicodeSSE() UnicodeSSE

type SangoSSE

type SangoSSE struct {
	SSE // invariant: t  >> 15 & 1 == 1
}

func EncodeSangoWord

func EncodeSangoWord(word string) []SangoSSE

func MakeSangoSSE

func MakeSangoSSE(syllable string, numLettersLeft int) (sangoSSE SangoSSE, wordLeft string)

func (SangoSSE) Case

func (sse SangoSSE) Case() int

func (SangoSSE) Consonants

func (sse SangoSSE) Consonants() []rune

func (SangoSSE) Pitch

func (sse SangoSSE) Pitch() int

func (SangoSSE) Vowel

func (sse SangoSSE) Vowel() []rune

type SangoSSEInterface

type SangoSSEInterface interface {
	Case() int
	Pitch() int
	Consonants() []rune
	Vowel() []rune
}

type UnicodeSSE

type UnicodeSSE struct {
	SSE // invariant: t >> 14 & 3 == 0
}

func MakeUnicodeSSE

func MakeUnicodeSSE(r rune) UnicodeSSE
