wordnet

package
v0.1.9 Latest
Warning

This package is not in the latest version of its module.

Published: Sep 13, 2021 | License: MIT | Imports: 10 | Imported by: 9

Documentation

Overview

Package wordnet provides a WordNet parser and interface.

Basic usage

The main entry point is the WordNet type. It holds all the data of a WordNet dictionary and provides search methods.

To search for the noun meanings of 'cat':

	wn, _ := wordnet.Parse(...)
	catNouns := wn.Search("cat")["n"]
	// = slice of all synsets that contain the word "cat" and are nouns.

To calculate similarity between words:

	wn, _ := wordnet.Parse(...)
	cat := wn.Search("cat")["n"][0]
	dog := wn.Search("dog")["n"][0]
	similarity := wn.PathSimilarity(cat, dog, false)
	// = 0.2

To get usage examples for verbs:

	wn, _ := wordnet.Parse(...)
	eat := wn.Search("eat")["v"][1]
	examples := wn.Examples(eat)
	// = string slice of examples for the words in the 'eat' synset.

Parts of speech

Some data refers to parts of speech (POS). Wherever a part of speech is expected, it is given as a single letter:

a: adjective
n: noun
r: adverb
v: verb
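When printing search results it can help to turn these letters back into names. The helper below is a minimal sketch; posName is a hypothetical name, not part of the package. It also accepts "s", since the Synset.Pos field documented below may hold "s" for adjective satellites.

```go
package main

import "fmt"

// posName maps a single-letter part-of-speech code to a readable name.
// "s" (adjective satellite) is included because Synset.Pos may use it.
func posName(pos string) string {
	switch pos {
	case "a":
		return "adjective"
	case "s":
		return "adjective satellite"
	case "n":
		return "noun"
	case "r":
		return "adverb"
	case "v":
		return "verb"
	default:
		return "unknown"
	}
}

func main() {
	for _, p := range []string{"a", "n", "r", "v"} {
		fmt.Println(p, posName(p))
	}
}
```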

Citation

This API is based on: Princeton University "About WordNet." WordNet. Princeton University. 2010. http://wordnet.princeton.edu

Please cite them if you use this API.

Index

Constants

const (
	Antonym                   = "!"
	Hypernym                  = "@"
	InstanceHypernym          = "@i"
	Hyponym                   = "~"
	InstanceHyponym           = "~i"
	MemberHolonym             = "#m"
	SubstanceHolonym          = "#s"
	PartHolonym               = "#p"
	MemberMeronym             = "%m"
	SubstanceMeronym          = "%s"
	PartMeronym               = "%p"
	Attribute                 = "="
	DerivationallyRelatedForm = "+"
	DomainOfSynsetTopic       = ";c"
	MemberOfThisDomainTopic   = "-c"
	DomainOfSynsetRegion      = ";r"
	MemberOfThisDomainRegion  = "-r"
	DomainOfSynsetUsage       = ";u"
	MemberOfThisDomainUsage   = "-u"
	Entailment                = "*"
	Cause                     = ">"
	AlsoSee                   = "^"
	VerbGroup                 = "$"
	SimilarTo                 = "&"
	ParticipleOfVerb          = "<"
	Pertainym                 = "\\" // used by adjectives
	DerivedFromAdjective      = "\\" // used by adverbs
)

Pointer symbol meanings.
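These symbols are compared against Pointer.Symbol. As a minimal sketch, the code below collects the IDs of a synset's direct hypernyms. To run on its own it declares local, trimmed-down copies of the package's Pointer and Synset structs and of the Hypernym constant; hypernymIDs is a hypothetical helper, and the synset IDs are made up.

```go
package main

import "fmt"

// Hypernym has the same value as the package constant of that name.
const Hypernym = "@"

// Local copies of the package's structs, reduced to the fields used here.
type Pointer struct {
	Symbol string
	Synset string
	Source int
	Target int
}

type Synset struct {
	Pointer []*Pointer
}

// hypernymIDs returns the target synset IDs of every "@" pointer in ss.
func hypernymIDs(ss *Synset) []string {
	var ids []string
	for _, p := range ss.Pointer {
		if p.Symbol == Hypernym {
			ids = append(ids, p.Synset)
		}
	}
	return ids
}

func main() {
	ss := &Synset{Pointer: []*Pointer{
		{Symbol: "@", Synset: "n100", Source: -1, Target: -1},
		{Symbol: "~", Synset: "n200", Source: -1, Target: -1},
	}}
	fmt.Println(hypernymIDs(ss)) // [n100]
}
```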

Variables

This section is empty.

Functions

This section is empty.

Types

type Example

type Example struct {
	// Index of word in the containing synset.
	WordNumber int `json:"wordNumber"`

	// Number of the sentence template in the WordNet.Example map.
	TemplateNumber int `json:"templateNumber"`
}

An Example links a synset word to an example sentence. Applies to verbs only.

type Frame

type Frame struct {
	// Index of word in the containing synset, -1 for entire synset.
	WordNumber int `json:"wordNumber"`

	// Frame number on the WordNet site.
	FrameNumber int `json:"frameNumber"`
}

A Frame links a synset word to a generic phrase that illustrates how to use it. Applies to verbs only.

See the list of frames here: https://wordnet.princeton.edu/man/wninput.5WN.html#sect4
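Because WordNumber is -1 when a frame applies to the entire synset, code that displays frames has to resolve which words are meant. The sketch below assumes WordNumber is a 0-based index into Synset.Word (as "Index of word" and the -1 sentinel suggest); frameWords is a hypothetical helper, and Frame is redeclared locally so the example compiles standalone.

```go
package main

import "fmt"

// Frame mirrors the package's Frame struct.
type Frame struct {
	WordNumber  int
	FrameNumber int
}

// frameWords returns the synset words a frame applies to: all of them when
// WordNumber is -1, otherwise just the word at that (assumed 0-based) index.
func frameWords(words []string, f Frame) []string {
	if f.WordNumber == -1 {
		return words
	}
	if f.WordNumber < 0 || f.WordNumber >= len(words) {
		return nil // out-of-range index: treated as "no words" in this sketch
	}
	return words[f.WordNumber : f.WordNumber+1]
}

func main() {
	words := []string{"run", "go"}
	fmt.Println(frameWords(words, Frame{WordNumber: -1, FrameNumber: 2})) // [run go]
	fmt.Println(frameWords(words, Frame{WordNumber: 1, FrameNumber: 8}))  // [go]
}
```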

type Pointer

type Pointer struct {
	// Relation between the two words: the target is <symbol> to the source
	// (for example, with "@" the target is a hypernym of the source). See
	// package constants for meaning of symbols.
	Symbol string `json:"symbol"`

	// Target synset ID.
	Synset string `json:"synset"`

	// Index of word in source synset, -1 for entire synset.
	Source int `json:"source"`

	// Index of word in target synset, -1 for entire synset.
	Target int `json:"target"`
}

A Pointer denotes a semantic relation from one synset (or word) to another.

See the list of pointer symbols here: https://wordnet.princeton.edu/man/wninput.5WN.html#sect3
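The Source and Target fields distinguish relations between whole synsets from relations between specific words. A minimal sketch, assuming (as in the WordNet data files) that a pointer is either synset-level on both ends or word-level on both ends; isSemantic is a hypothetical helper and Pointer is a local copy of the struct above.

```go
package main

import "fmt"

// Pointer mirrors the package's Pointer struct.
type Pointer struct {
	Symbol string
	Synset string
	Source int
	Target int
}

// isSemantic reports whether p relates entire synsets (both indices -1)
// rather than individual words.
func isSemantic(p *Pointer) bool {
	return p.Source == -1 && p.Target == -1
}

func main() {
	hyper := &Pointer{Symbol: "@", Synset: "n100", Source: -1, Target: -1}
	anto := &Pointer{Symbol: "!", Synset: "a200", Source: 0, Target: 0}
	fmt.Println(isSemantic(hyper), isSemantic(anto)) // true false
}
```

Antonymy is shown as a word-level pointer because WordNet records antonyms between specific words rather than whole synsets.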

type Synset

type Synset struct {
	// Synset offset, also used as an identifier.
	Offset string `json:"offset"`

	// Part of speech, including 's' for adjective satellite.
	Pos string `json:"pos"`

	// Words in this synset.
	Word []string `json:"word"`

	// Pointers to other synsets.
	Pointer []*Pointer `json:"pointer"`

	// Sentence frames for verbs.
	Frame []*Frame `json:"frame"`

	// Lexical definition.
	Gloss string `json:"gloss"`

	// Usage examples for words in this synset. Verbs only.
	Example []*Example `json:"example"`
}

Synset is a set of synonymous words.

func (*Synset) Id

func (ss *Synset) Id() string

Id returns the synset's ID, for example "n123456": the concatenation of the POS and the offset.
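Since the POS is always a single letter, an ID can be built from and split back into its two parts. The helpers below are hypothetical and only illustrate the documented format; they are not package functions.

```go
package main

import "fmt"

// synsetID joins a POS letter and an offset, matching the documented Id format.
func synsetID(pos, offset string) string {
	return pos + offset
}

// splitSynsetID splits an ID back into its POS letter and offset.
func splitSynsetID(id string) (pos, offset string) {
	if id == "" {
		return "", ""
	}
	return id[:1], id[1:]
}

func main() {
	id := synsetID("n", "123456")
	pos, off := splitSynsetID(id)
	fmt.Println(id, pos, off) // n123456 n 123456
}
```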

func (*Synset) String

func (s *Synset) String() string

String returns a string representation of the synset, for debugging.

type WordNet

type WordNet struct {
	// Maps from synset ID to synset.
	Synset map[string]*Synset `json:"synset"`

	// Maps from pos.lemma to synset IDs that contain it.
	Lemma map[string][]string `json:"lemma"`

	// Like Lemma, but synsets are ordered from the most frequently used to the
	// least. Only a subset of the synsets are ranked, so LemmaRanked has fewer
	// synsets.
	LemmaRanked map[string][]string `json:"lemmaRanked"`

	// Maps from exceptional word to its forms.
	Exception map[string][]string `json:"exception"`

	// Maps from example ID to sentence template. Using string keys for JSON
	// compatibility.
	Example map[string]string `json:"example"`
}

WordNet is an entire WordNet database.

func Parse

func Parse(path string) (*WordNet, error)

Parse parses an entire WordNet directory. The path argument is the root of the directory; the parser traverses it and parses the required files, assuming the directory structure is as published.

func (*WordNet) Examples

func (wn *WordNet) Examples(ss *Synset) []string

Examples returns usage examples for the given synset. Always empty for non-verbs.
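The package's own implementation is not shown here, but the data model suggests how an example sentence can be produced: each Example picks a word by index and a template from the WordNet.Example map, whose keys are strings. The sketch below assumes templates mark the word's slot with "%s" and that WordNumber is 0-based; renderExamples is hypothetical, and the template text is made up.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Example mirrors the package's Example struct.
type Example struct {
	WordNumber     int
	TemplateNumber int
}

// renderExamples fills each example's template with its synset word.
// templates plays the role of WordNet.Example (string keys).
func renderExamples(words []string, examples []*Example, templates map[string]string) []string {
	var out []string
	for _, e := range examples {
		tmpl, ok := templates[strconv.Itoa(e.TemplateNumber)]
		if !ok || e.WordNumber < 0 || e.WordNumber >= len(words) {
			continue // skip unknown templates and out-of-range word indices
		}
		out = append(out, strings.ReplaceAll(tmpl, "%s", words[e.WordNumber]))
	}
	return out
}

func main() {
	words := []string{"eat"}
	templates := map[string]string{"7": "They %s a lot"}
	fmt.Println(renderExamples(words, []*Example{{WordNumber: 0, TemplateNumber: 7}}, templates)) // [They eat a lot]
}
```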

func (*WordNet) PathSimilarity

func (wn *WordNet) PathSimilarity(from, to *Synset, simulateRoot bool) float64

PathSimilarity returns a score denoting how similar two word senses are, based on the shortest path that connects the senses in the is-a (hypernym/hyponym) taxonomy. The score is in the range 0 to 1, where 1 means identity and 0 means completely disjoint.

If simulateRoot is true, a common fake root is placed above the top of each synset's hierarchy when no common ancestor is found.

Based on NLTK's path_similarity function.
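NLTK's path_similarity scores two senses as 1/(d+1), where d is the number of edges on the shortest is-a path between them through a common ancestor. The sketch below applies that formula to a small hand-made hypernym graph (not WordNet data, no simulated root); it is not the package's code. The graph puts cat and dog four edges apart, which is the distance that yields the 0.2 shown in the overview example.

```go
package main

import "fmt"

// graph maps each node to its direct hypernyms.
type graph map[string][]string

// ancestorDistances returns the shortest distance in edges from node to
// itself and to every ancestor, computed with breadth-first search.
func ancestorDistances(g graph, node string) map[string]int {
	dist := map[string]int{node: 0}
	queue := []string{node}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, parent := range g[cur] {
			if _, seen := dist[parent]; !seen {
				dist[parent] = dist[cur] + 1
				queue = append(queue, parent)
			}
		}
	}
	return dist
}

// pathSimilarity returns 1/(d+1) for the shortest path d between a and b
// through a common ancestor, or 0 if they share no ancestor.
func pathSimilarity(g graph, a, b string) float64 {
	da, db := ancestorDistances(g, a), ancestorDistances(g, b)
	best := -1
	for node, x := range da {
		if y, ok := db[node]; ok && (best < 0 || x+y < best) {
			best = x + y
		}
	}
	if best < 0 {
		return 0
	}
	return 1 / float64(best+1)
}

func main() {
	g := graph{
		"cat":    {"feline"},
		"feline": {"carnivore"},
		"dog":    {"canine"},
		"canine": {"carnivore"},
	}
	fmt.Println(pathSimilarity(g, "cat", "dog")) // 0.2
	fmt.Println(pathSimilarity(g, "cat", "cat")) // 1
}
```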

func (*WordNet) Search

func (wn *WordNet) Search(word string) map[string][]*Synset

Search searches for a word in the dictionary. Returns a map from part of speech (a, n, r, v) to all synsets that contain that word.
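Given the WordNet struct's Synset and Lemma maps, a lookup like Search can be sketched as follows. This is not the package's implementation: it assumes Lemma keys have the form pos + "." + word (for example "n.cat", reading the "pos.lemma" comment literally) and ignores anything the real method may do with the Exception map. The types are local, trimmed-down copies, and search and the sample data are hypothetical.

```go
package main

import "fmt"

// Synset and WordNet mirror only the fields this sketch needs.
type Synset struct {
	Pos  string
	Word []string
}

type WordNet struct {
	Synset map[string]*Synset
	Lemma  map[string][]string
}

// search groups the synsets containing word by part of speech, assuming
// Lemma keys look like "n.cat".
func search(wn *WordNet, word string) map[string][]*Synset {
	result := map[string][]*Synset{}
	for _, pos := range []string{"a", "n", "r", "v"} {
		for _, id := range wn.Lemma[pos+"."+word] {
			if ss, ok := wn.Synset[id]; ok {
				result[pos] = append(result[pos], ss)
			}
		}
	}
	return result
}

func main() {
	wn := &WordNet{
		Synset: map[string]*Synset{
			"n100": {Pos: "n", Word: []string{"cat", "true cat"}},
			"v200": {Pos: "v", Word: []string{"cat"}},
		},
		Lemma: map[string][]string{
			"n.cat": {"n100"},
			"v.cat": {"v200"},
		},
	}
	res := search(wn, "cat")
	fmt.Println(len(res["n"]), len(res["v"]), len(res["a"])) // 1 1 0
}
```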

func (*WordNet) SearchRanked

func (wn *WordNet) SearchRanked(word string) map[string][]*Synset

SearchRanked searches for a word in the dictionary. Returns a map from part of speech (a, n, r, v) to synsets that contain that word, ranked from the most frequently used to the least.

Only a subset of the synsets are ranked, so this may return fewer synsets than Search would.
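Because ranked results can be a strict subset, a caller that wants every sense but prefers frequent ones first might list the ranked synsets and then append the unranked remainder. A minimal sketch over synset IDs (for example the values of Id()); mergeRanked is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// mergeRanked returns the ranked IDs first, followed by the IDs from all
// that were not ranked, keeping their original order.
func mergeRanked(ranked, all []string) []string {
	seen := make(map[string]bool, len(ranked))
	out := make([]string, 0, len(all))
	for _, id := range ranked {
		seen[id] = true
		out = append(out, id)
	}
	for _, id := range all {
		if !seen[id] {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	fmt.Println(mergeRanked([]string{"n2", "n1"}, []string{"n1", "n2", "n3"})) // [n2 n1 n3]
}
```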

func (*WordNet) String

func (wn *WordNet) String() string

String returns a compact string representation of the WordNet data collection, for debugging.

func (*WordNet) WupSimilarity

func (wn *WordNet) WupSimilarity(from, to *Synset, simulateRoot bool) float64

WupSimilarity computes the Wu-Palmer similarity: a score denoting how similar two word senses are, based on the depths of the two senses in the taxonomy and that of their Least Common Subsumer (most specific common ancestor).

If simulateRoot is true, a common fake root is placed above the top of each synset's hierarchy when no common ancestor is found.

Based on NLTK's wup_similarity function.
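The textbook Wu-Palmer score is 2 * depth(lcs) / (depth(a) + depth(b)). The sketch below computes it on a hand-made single-inheritance tree (not WordNet data), counting the root as depth 1. It simplifies the real setting: WordNet synsets can have several hypernyms, and NLTK's wup_similarity may count depth differently in some cases, so this is an illustration of the formula rather than the package's code.

```go
package main

import "fmt"

// parent maps each node to its single hypernym; the root has no entry.
var parent = map[string]string{
	"animal":    "entity",
	"carnivore": "animal",
	"feline":    "carnivore",
	"canine":    "carnivore",
	"cat":       "feline",
	"dog":       "canine",
}

// depth counts the nodes from n up to the root, inclusive (root = 1).
func depth(n string) int {
	d := 1
	for p, ok := parent[n]; ok; p, ok = parent[p] {
		d++
	}
	return d
}

// lcs returns the least common subsumer of a and b: the deepest node that
// is an ancestor of (or equal to) both, or "" if there is none.
func lcs(a, b string) string {
	ancestors := map[string]bool{}
	for n, ok := a, true; ok; n, ok = parent[n] {
		ancestors[n] = true
	}
	for n, ok := b, true; ok; n, ok = parent[n] {
		if ancestors[n] {
			return n
		}
	}
	return ""
}

// wup returns 2*depth(lcs) / (depth(a) + depth(b)), or 0 without an LCS.
func wup(a, b string) float64 {
	c := lcs(a, b)
	if c == "" {
		return 0
	}
	return 2 * float64(depth(c)) / float64(depth(a)+depth(b))
}

func main() {
	fmt.Println(lcs("cat", "dog"), wup("cat", "dog")) // carnivore 0.6
}
```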
