golem

package module
v0.0.0-...-b8519ae Latest
Published: Feb 26, 2020 License: MIT Imports: 6 Imported by: 0

README

GoLem

This project is a dictionary-based lemmatizer written in Go. It requires Git LFS for the large dictionary files.

What?

A lemmatizer is a tool that finds the base form of words.

Lang     Input       Output
English  aligning    align
Swedish  sprungit    springa
French   abattaient  abattre

It's based on the dictionaries found on michmech/lemmatization-lists, which are available under the Open Database License. This project would not be feasible without them.
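To make the idea concrete, here is a minimal sketch of a dictionary lookup, using only the three examples from the table above. It is not golem's implementation: the `dictionaries` variable and the `lookupLemma` function are hypothetical names, and returning the input unchanged for unknown words is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// dictionaries maps a language code to a table of inflected form -> base form.
// The entries are the three examples from the table; a real dictionary would
// hold many thousands of entries per language.
var dictionaries = map[string]map[string]string{
	"en": {"aligning": "align"},
	"sv": {"sprungit": "springa"},
	"fr": {"abattaient": "abattre"},
}

// lookupLemma returns the base form of word in the given language.
// Unknown words (and unknown languages) fall through and are returned
// unchanged, which is an assumption of this sketch.
func lookupLemma(lang, word string) string {
	if lemma, ok := dictionaries[lang][strings.ToLower(word)]; ok {
		return lemma
	}
	return word
}

func main() {
	fmt.Println(lookupLemma("en", "aligning"))
	fmt.Println(lookupLemma("sv", "sprungit"))
	fmt.Println(lookupLemma("fr", "abattaient"))
	fmt.Println(lookupLemma("en", "golem")) // not in the toy dictionary
}
```

The whole approach is a table lookup, so the quality of the output depends entirely on the coverage of the dictionary.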

Languages

At the moment golem supports English, Swedish, French, Spanish, Italian and German. Adding another language should be no more trouble than getting a dictionary for that language, and some dictionaries are already available on lexiconista. Please let me know if there is a language you would like to see here, or fork the project and create a pull request.
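As a rough illustration of what "getting the dictionary" involves, the sketch below parses a word list into a lookup table. It assumes each line holds a lemma and an inflected form separated by a tab; that layout is an assumption, so check it against the actual dictionary file. `parseDictionary` is a hypothetical helper, not part of golem or its generator.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseDictionary reads lines of the assumed form "lemma<TAB>inflected form"
// and returns a map from each inflected form to all lemmas listed for it.
func parseDictionary(data string) (map[string][]string, error) {
	dict := make(map[string][]string)
	scanner := bufio.NewScanner(strings.NewReader(data))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue // skip blank lines
		}
		parts := strings.Split(line, "\t")
		if len(parts) != 2 {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		lemma, form := parts[0], parts[1]
		dict[form] = append(dict[form], lemma)
	}
	return dict, scanner.Err()
}

func main() {
	// A tiny made-up sample in the assumed layout.
	sample := "align\taligning\nalign\taligned\nabduct\tabducting\n"
	dict, err := parseDictionary(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(dict["aligning"], dict["abducting"])
}
```

Keeping a slice of lemmas per form leaves room for words that have more than one base form.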

Basic usage
package main

import (
	"github.com/aaaton/golem"
	"github.com/aaaton/golem/dicts/en"
)

func main() {
	// The language packages are available under golem/dicts;
	// "en" is the English pack.
	lemmatizer, err := golem.New(en.New())
	if err != nil {
		panic(err)
	}
	word := lemmatizer.Lemma("Abducting")
	if word != "abduct" {
		panic("The output is not what is expected!")
	}
}

To regenerate the files, run make all. This requires go-bindata to be installed.

Contributors
  • axamon
  • charlesgiroux
  • glaslos

Documentation

Types

type Lemmatizer

type Lemmatizer struct {
	// contains filtered or unexported fields
}

Lemmatizer is the key to lemmatizing a word in a language

func New

func New(pack dicts.LanguagePack) (*Lemmatizer, error)

New produces a new Lemmatizer

func (*Lemmatizer) InDict

func (l *Lemmatizer) InDict(word string) bool

InDict checks if a certain word is in the dictionary

func (*Lemmatizer) Lemma

func (l *Lemmatizer) Lemma(word string) string

Lemma gets one of the base forms of a word

func (*Lemmatizer) LemmaLower

func (l *Lemmatizer) LemmaLower(word string) string

LemmaLower gets one of the base forms of a lower case word

func (*Lemmatizer) Lemmas

func (l *Lemmatizer) Lemmas(word string) (out []string)

Lemmas gets all the base forms of a word

Directories

Path  Synopsis
de    Code generated by golem/dicts/cmd/generate_pack.go DO NOT EDIT
en    Code generated by golem/dicts/cmd/generate_pack.go DO NOT EDIT
es    Code generated by golem/dicts/cmd/generate_pack.go DO NOT EDIT
fr    Code generated by golem/dicts/cmd/generate_pack.go DO NOT EDIT
it    Code generated by golem/dicts/cmd/generate_pack.go DO NOT EDIT
