alphabet

package
v0.0.0-...-8e01fea
Published: Apr 18, 2024 License: MIT Imports: 13 Imported by: 0

Documentation

Overview

Package alphabet provides support for describing the alphabets used in different languages. In this package, an alphabet is a valid set of Unicode runes that can be used in tasks such as determining n-grams. A letter is a Unicode rune that is generally not a number or a symbol (e.g. @, ! or $). A language is a set of identifiable writing letters (e.g. EN = English = Latin alphabet).
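To make the role of an alphabet in n-gram extraction concrete, the following is a minimal sketch that uses only the standard library. The bigrams function and its alphabet parameter are hypothetical and are not part of this package; the sketch only shows how restricting input to a known set of lowercase runes decides which n-grams are produced.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// bigrams is a hypothetical helper (not part of package alphabet).
// It returns the two-letter n-grams of text, counting only consecutive
// runes that appear in alphabet. Any other rune (digit, symbol,
// whitespace) ends the current run of letters.
func bigrams(text, alphabet string) []string {
	var out []string
	var prev rune
	havePrev := false
	for _, r := range text {
		r = unicode.ToLower(r)
		if !strings.ContainsRune(alphabet, r) {
			havePrev = false
			continue
		}
		if havePrev {
			out = append(out, string([]rune{prev, r}))
		}
		prev, havePrev = r, true
	}
	return out
}

func main() {
	fmt.Println(bigrams("Go 4 it!", "abcdefghijklmnopqrstuvwxyz")) // prints "[go it]"
}
```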

Index

Constants

This section is empty.

Variables

var ErrNoLanguages = errors.New("no languages")

ErrNoLanguages is returned when no languages could be loaded.

Functions

func DiscoverLetters

func DiscoverLetters(ctx context.Context, input io.Reader) ([]rune, error)

DiscoverLetters produces a slice containing the unique non-whitespace lowercased letters found in the io.Reader.
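The behaviour described above can be approximated with the standard library alone. The sketch below is not the package's implementation: discoverLetters and lettersIn are hypothetical names, and both the use of unicode.IsLetter as the filter and the sorted result order are assumptions, since the documentation specifies neither.

```go
package main

import (
	"bufio"
	"context"
	"fmt"
	"io"
	"sort"
	"strings"
	"unicode"
)

// discoverLetters is a hypothetical stand-in for DiscoverLetters.
// It collects the unique lowercased letters read from input.
func discoverLetters(ctx context.Context, input io.Reader) ([]rune, error) {
	seen := make(map[rune]struct{})
	br := bufio.NewReader(input)
	for {
		// Stop early if the caller cancelled the context.
		if err := ctx.Err(); err != nil {
			return nil, err
		}
		r, _, err := br.ReadRune()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		// Assumption: only runes classified as letters are kept.
		if !unicode.IsLetter(r) {
			continue
		}
		seen[unicode.ToLower(r)] = struct{}{}
	}
	letters := make([]rune, 0, len(seen))
	for r := range seen {
		letters = append(letters, r)
	}
	// Assumption: sorting is added here only to make the output stable.
	sort.Slice(letters, func(i, j int) bool { return letters[i] < letters[j] })
	return letters, nil
}

// lettersIn is a small convenience wrapper for string input.
func lettersIn(text string) (string, error) {
	letters, err := discoverLetters(context.Background(), strings.NewReader(text))
	return string(letters), err
}

func main() {
	letters, err := lettersIn("Hello, World! 123")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(letters) // prints "dehlorw"
}
```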

func DiscoverLettersFromFile

func DiscoverLettersFromFile(ctx context.Context, path string) ([]rune, error)

DiscoverLettersFromFile produces a slice containing the unique non-whitespace lowercased letters found in the file.

Types

type DiscoverProcessor

type DiscoverProcessor struct {
	// contains filtered or unexported fields
}

DiscoverProcessor is used to discover the unique non-whitespace lowercased letters found in the input sources.

func NewDiscoverProcessor

func NewDiscoverProcessor() *DiscoverProcessor

NewDiscoverProcessor creates a new processor that does not report progress.

func (*DiscoverProcessor) Letters

func (p *DiscoverProcessor) Letters() []rune

Letters returns the discovered runes. Sounds like a Tomb Raider story :-D.

func (*DiscoverProcessor) ProcessFiles

func (p *DiscoverProcessor) ProcessFiles(ctx context.Context, paths []string) error

ProcessFiles updates the discovered letters from the given input paths.

func (*DiscoverProcessor) Save

func (p *DiscoverProcessor) Save(path string) error

Save writes the languages file to the given file path.

func (*DiscoverProcessor) SetProgressReporter

func (p *DiscoverProcessor) SetProgressReporter(reporter processor.ProgressReporter)

SetProgressReporter sets the progress reporter to use.

type Language

type Language struct {
	// Name of the language (e.g. Afrikaans).
	Name string
	// ISO 639 set 1 language code (e.g. af) https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes.
	Code LanguageCode
	// Letters (in UTF-8 and in lowercase) found in the language.
	Letters string
}

Language describes the alphabet letters found in a language.

func Builtin

func Builtin(code LanguageCode) (Language, error)

Builtin returns the built-in language for the given ISO 639 set 1 language code.

func MustBuiltin

func MustBuiltin(code LanguageCode) Language

MustBuiltin returns the built-in language for the given ISO 639 set 1 language code, or panics.
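The relationship between Builtin and MustBuiltin follows the common Go "Must" convention. The sketch below illustrates that convention with local stand-ins: the lowercase builtin and mustBuiltin functions, the builtins table and its single entry are all hypothetical, and the error text is an assumption because the documentation does not show the package's real data or error values.

```go
package main

import "fmt"

// Local mirrors of the documented types, so the example is self-contained.
type LanguageCode string

type Language struct {
	Name    string
	Code    LanguageCode
	Letters string
}

// builtins is hypothetical sample data, not the package's built-in table.
var builtins = map[LanguageCode]Language{
	"en": {Name: "English", Code: "en", Letters: "abcdefghijklmnopqrstuvwxyz"},
}

// builtin returns an error for unknown codes, leaving the decision to the caller.
func builtin(code LanguageCode) (Language, error) {
	lang, ok := builtins[code]
	if !ok {
		return Language{}, fmt.Errorf("unknown language code %q", code)
	}
	return lang, nil
}

// mustBuiltin wraps builtin and panics on error. This suits package-level
// initialisation with codes that are known to be valid.
func mustBuiltin(code LanguageCode) Language {
	lang, err := builtin(code)
	if err != nil {
		panic(err)
	}
	return lang
}

func main() {
	fmt.Println(mustBuiltin("en").Name) // prints "English"

	// For codes that come from user input, prefer the error-returning form.
	if _, err := builtin("xx"); err != nil {
		fmt.Println("lookup failed:", err)
	}
}
```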

func (Language) ContainsRune

func (l Language) ContainsRune(r rune) bool

ContainsRune returns true if the language contains the rune. The letters of the language are expected to contain only the lowercase runes that make up the alphabet, so the specified rune is assumed to be lowercase as well.
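Because the letters are stored in lowercase, callers presumably need to lowercase a rune before testing it. The sketch below shows this with a local mirror of the Language type. The containsRune method is a guess at the behaviour based on strings.ContainsRune, and containsLetter is a hypothetical helper; neither is taken from the package source.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Local mirrors of the documented types, so the example is self-contained.
type LanguageCode string

type Language struct {
	Name    string
	Code    LanguageCode
	Letters string
}

// containsRune reports whether r is one of the language's letters.
// Assumption: the lookup is an exact, case-sensitive match against Letters.
func (l Language) containsRune(r rune) bool {
	return strings.ContainsRune(l.Letters, r)
}

// containsLetter is a hypothetical helper that lowercases r first,
// so that uppercase input is matched against the lowercase alphabet.
func containsLetter(l Language, r rune) bool {
	return l.containsRune(unicode.ToLower(r))
}

func main() {
	en := Language{Name: "English", Code: "en", Letters: "abcdefghijklmnopqrstuvwxyz"}
	fmt.Println(en.containsRune('a'))    // prints "true"
	fmt.Println(en.containsRune('A'))    // prints "false"
	fmt.Println(containsLetter(en, 'A')) // prints "true"
}
```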

type LanguageCode

type LanguageCode string

LanguageCode describes an ISO 639 set 1 language code.

type LanguageMap

type LanguageMap map[LanguageCode]Language

LanguageMap is used to map from a language code to info about the language.

func BuiltinLanguages

func BuiltinLanguages() LanguageMap

BuiltinLanguages returns the built-in languages.

func LoadLanguages

func LoadLanguages(r io.Reader) (LanguageMap, error)

LoadLanguages parses a set of languages from an io.Reader.

Expected CSV format in UTF-8: code,name,letters
Lines starting with # are ignored.
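A parser for the format described above could look roughly like the following sketch, built on encoding/csv. It is not the package's implementation: loadLanguages, loadLanguagesFromString, errNoLanguages and the sample row are hypothetical stand-ins. Returning the sentinel error when no rows are found is an assumption based on the description of ErrNoLanguages.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"io"
	"strings"
)

// Local mirrors of the documented types, so the example is self-contained.
type LanguageCode string

type Language struct {
	Name    string
	Code    LanguageCode
	Letters string
}

type LanguageMap map[LanguageCode]Language

// errNoLanguages mirrors the documented ErrNoLanguages sentinel.
var errNoLanguages = errors.New("no languages")

// loadLanguages parses code,name,letters records and skips # comment lines.
func loadLanguages(r io.Reader) (LanguageMap, error) {
	cr := csv.NewReader(r)
	cr.Comment = '#'       // lines starting with # are ignored
	cr.FieldsPerRecord = 3 // code, name, letters
	languages := make(LanguageMap)
	for {
		rec, err := cr.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		code := LanguageCode(rec[0])
		languages[code] = Language{Name: rec[1], Code: code, Letters: rec[2]}
	}
	// Assumption: an input without any records is reported as an error.
	if len(languages) == 0 {
		return nil, errNoLanguages
	}
	return languages, nil
}

// loadLanguagesFromString is a small convenience wrapper for string input.
func loadLanguagesFromString(s string) (LanguageMap, error) {
	return loadLanguages(strings.NewReader(s))
}

// sample is illustrative input, not the package's data file.
const sample = `# code,name,letters
en,English,abcdefghijklmnopqrstuvwxyz
`

func main() {
	languages, err := loadLanguagesFromString(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(languages["en"].Name) // prints "English"

	_, err = loadLanguagesFromString("# only a comment\n")
	fmt.Println(errors.Is(err, errNoLanguages)) // prints "true"
}
```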

func LoadLanguagesFromFile

func LoadLanguagesFromFile(path string) (LanguageMap, error)

LoadLanguagesFromFile parses a set of languages from a UTF-8 encoded text file. See LoadLanguages for more details.

func (LanguageMap) Get

func (lm LanguageMap) Get(code LanguageCode) (Language, error)

Get returns the language for the given code, or an error.
