Documentation
Overview
Package alphabet provides support for describing the alphabets used in different languages. In this package, an alphabet is a valid set of Unicode runes that can be used in tasks such as determining n-grams. A letter is a Unicode rune that is generally not a number or a symbol (e.g. @, ! or $). A language is an identifiable set of written letters (e.g. EN = English = Latin alphabet).
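As a rough illustration of the letter/non-letter distinction above, the following sketch uses Go's standard unicode package. It assumes that "letter" corresponds to unicode.IsLetter; the package may draw the line differently, and isAlphabetLetter is a hypothetical name, not part of this API. Unicode classifies @ and ! as punctuation and $ as a currency symbol, and unicode.IsLetter rejects all three, along with digits and whitespace.

```go
package main

import (
	"fmt"
	"unicode"
)

// isAlphabetLetter is a hypothetical helper (not part of package alphabet).
// It accepts runes in the Unicode letter categories and rejects digits,
// punctuation, symbols and whitespace.
func isAlphabetLetter(r rune) bool {
	return unicode.IsLetter(r)
}

func main() {
	// Print the classification of a mix of letters, a digit, punctuation,
	// a symbol and a space.
	for _, r := range "aé7@!$ ж" {
		fmt.Printf("%q letter=%v\n", r, isAlphabetLetter(r))
	}
}
```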
Index
Constants
This section is empty.
Variables
var ErrNoLanguages = errors.New("no languages")
ErrNoLanguages is returned when no languages could be loaded.
Functions
func DiscoverLetters
DiscoverLetters produces a slice containing the unique, lowercased, non-whitespace letters found in the given io.Reader.
Types
type DiscoverProcessor
type DiscoverProcessor struct {
// contains filtered or unexported fields
}
DiscoverProcessor is used to discover the unique non-whitespace lowercased letters found in the input sources.
func NewDiscoverProcessor
func NewDiscoverProcessor() *DiscoverProcessor
NewDiscoverProcessor creates a new processor that does not report progress.
func (*DiscoverProcessor) Letters
func (p *DiscoverProcessor) Letters() []rune
Letters returns the discovered runes. Sounds like a Tomb Raider story :-D.
func (*DiscoverProcessor) ProcessFiles
func (p *DiscoverProcessor) ProcessFiles(ctx context.Context, paths []string) error
ProcessFiles updates the discovered letters from the given input paths.
func (*DiscoverProcessor) Save
func (p *DiscoverProcessor) Save(path string) error
Save writes the languages file to the given file path.
func (*DiscoverProcessor) SetProgressReporter
func (p *DiscoverProcessor) SetProgressReporter(reporter processor.ProgressReporter)
SetProgressReporter sets the progress reporter to use.
type Language
type Language struct {
	// Name of the language (e.g. Afrikaans).
	Name string
	// ISO 639 set 1 language code (e.g. af) https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes.
	Code LanguageCode
	// Letters (in UTF-8 and in lowercase) found in the language.
	Letters string
}
Language describes the alphabet letters found in a language.
func Builtin
func Builtin(code LanguageCode) (Language, error)
Builtin returns the built-in language for the given ISO 639 set 1 language code.
func MustBuiltin
func MustBuiltin(code LanguageCode) Language
MustBuiltin returns the built-in language for the given ISO 639 set 1 language code, or panics if no such language is available.
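Builtin and MustBuiltin follow the common Go pairing of an error-returning function with a Must variant that panics. The sketch below illustrates that pattern under stated assumptions: LanguageCode is taken to be a string type, the builtins table is illustrative data rather than the package's embedded list, and builtin and mustBuiltin are hypothetical names.

```go
package main

import "fmt"

// LanguageCode and Language mirror the documented types for this sketch.
// Assumption: LanguageCode is a string type.
type LanguageCode string

type Language struct {
	Name    string
	Code    LanguageCode
	Letters string
}

// builtins is illustrative data only, not the package's built-in table.
var builtins = map[LanguageCode]Language{
	"en": {Name: "English", Code: "en", Letters: "abcdefghijklmnopqrstuvwxyz"},
}

// builtin looks up a language and reports an error for unknown codes.
func builtin(code LanguageCode) (Language, error) {
	lang, ok := builtins[code]
	if !ok {
		return Language{}, fmt.Errorf("no built-in language for code %q", code)
	}
	return lang, nil
}

// mustBuiltin follows the Go Must convention: it panics instead of
// returning the error from builtin.
func mustBuiltin(code LanguageCode) Language {
	lang, err := builtin(code)
	if err != nil {
		panic(err)
	}
	return lang
}

func main() {
	fmt.Println(mustBuiltin("en").Name) // prints "English"
	if _, err := builtin("xx"); err != nil {
		fmt.Println("error:", err)
	}
}
```

A Must variant suits package-level variables and tests, where a missing built-in language indicates a programming error rather than a runtime condition to handle.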
func (Language) ContainsRune
ContainsRune reports whether the language contains the rune. The letters of the language are expected to contain only the lowercase runes that make up the alphabet, so the given rune is assumed to be lowercase as well.
type LanguageMap
type LanguageMap map[LanguageCode]Language
LanguageMap maps a language code to information about the language.
func BuiltinLanguages
func BuiltinLanguages() LanguageMap
BuiltinLanguages returns the built-in languages.
func LoadLanguages
func LoadLanguages(r io.Reader) (LanguageMap, error)
LoadLanguages parses a set of languages from an io.Reader.
The input is expected to be UTF-8 encoded CSV with the columns code,name,letters. Lines starting with # are ignored.
func LoadLanguagesFromFile
func LoadLanguagesFromFile(path string) (LanguageMap, error)
LoadLanguagesFromFile parses a set of languages from a UTF-8 encoded text file. See LoadLanguages for more details.
func (LanguageMap) Get
func (lm LanguageMap) Get(code LanguageCode) (Language, error)
Get returns the language for the given code, or an error.