Documentation ¶
Overview ¶
Package wordnet provides a WordNet parser and interface.
Basic usage ¶
The main entry point is the WordNet type. It holds all the data of a WordNet dictionary, and provides search methods.
To search for the noun meanings of 'cat':
	wn, _ := wordnet.Parse(...)
	catNouns := wn.Search("cat")["n"]
	// = slice of all synsets that contain the word "cat" and are nouns.
To calculate similarity between words:
	wn, _ := wordnet.Parse(...)
	cat := wn.Search("cat")["n"][0]
	dog := wn.Search("dog")["n"][0]
	similarity := wn.PathSimilarity(cat, dog, false)
	// = 0.2
To get usage examples for verbs:
	wn, _ := wordnet.Parse(...)
	eat := wn.Search("eat")["v"][1]
	examples := wn.Examples(eat)
	// = string slice of examples for the words in the 'eat' synset.
Parts of speech ¶
Some data refers to parts of speech (POS). Everywhere a part of speech is expected, it is a single letter as follows:
	a: adjective
	n: noun
	r: adverb
	v: verb
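The letter codes above can be checked before they are used as map keys. The following is a minimal sketch; `posNames` and `posName` are hypothetical helpers written for this illustration and are not part of the package.

```go
package main

import "fmt"

// posNames maps the single-letter part-of-speech codes to readable names.
// Illustrative only; the wordnet package does not export this map.
var posNames = map[string]string{
	"a": "adjective",
	"n": "noun",
	"r": "adverb",
	"v": "verb",
}

// posName returns the readable name for a POS letter, and false if the
// letter is not one of the four codes.
func posName(pos string) (string, bool) {
	name, ok := posNames[pos]
	return name, ok
}

func main() {
	for _, pos := range []string{"n", "v", "x"} {
		if name, ok := posName(pos); ok {
			fmt.Printf("%s: %s\n", pos, name)
		} else {
			fmt.Printf("%s: not a part-of-speech code\n", pos)
		}
	}
}
```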
Citation ¶
This API is based on: Princeton University "About WordNet." WordNet. Princeton University. 2010. http://wordnet.princeton.edu
Please cite them if you use this API.
Index ¶
- Constants
- type Example
- type Frame
- type Pointer
- type Synset
- type WordNet
- func (wn *WordNet) Examples(ss *Synset) []string
- func (wn *WordNet) PathSimilarity(from, to *Synset, simulateRoot bool) float64
- func (wn *WordNet) Search(word string) map[string][]*Synset
- func (wn *WordNet) SearchRanked(word string) map[string][]*Synset
- func (wn *WordNet) String() string
- func (wn *WordNet) WupSimilarity(from, to *Synset, simulateRoot bool) float64
Constants ¶
const (
	Antonym                   = "!"
	Hypernym                  = "@"
	InstanceHypernym          = "@i"
	Hyponym                   = "~"
	InstanceHyponym           = "~i"
	MemberHolonym             = "#m"
	SubstanceHolonym          = "#s"
	PartHolonym               = "#p"
	MemberMeronym             = "%m"
	SubstanceMeronym          = "%s"
	PartMeronym               = "%p"
	Attribute                 = "="
	DerivationallyRelatedForm = "+"
	DomainOfSynsetTopic       = ";c"
	MemberOfThisDomainTopic   = "-c"
	DomainOfSynsetRegion      = ";r"
	MemberOfThisDomainRegion  = "-r"
	DomainOfSynsetUsage       = ";u"
	MemberOfThisDomainUsage   = "-u"
	Entailment                = "*"
	Cause                     = ">"
	AlsoSee                   = "^"
	VerbGroup                 = "$"
	SimilarTo                 = "&"
	ParticipleOfVerb          = "<"
	Pertainym                 = "\\"
	DerivedFromAdjective      = "\\"
)
Pointer symbol meanings.
Types ¶
type Example ¶
type Example struct {
	// Index of word in the containing synset.
	WordNumber int `json:"wordNumber"`
	// Number of template in the WordNet.Example field.
	TemplateNumber int `json:"templateNumber"`
}
An Example links a synset word to an example sentence. Applies to verbs only.
type Frame ¶
type Frame struct {
	// Index of word in the containing synset, -1 for entire synset.
	WordNumber int `json:"wordNumber"`
	// Frame number on the WordNet site.
	FrameNumber int `json:"frameNumber"`
}
A Frame links a synset word to a generic phrase that illustrates how to use it. Applies to verbs only.
See the list of frames here: https://wordnet.princeton.edu/man/wninput.5WN.html#sect4
type Pointer ¶
type Pointer struct {
	// Relation between the 2 words. Target is <symbol> to source. See
	// package constants for meaning of symbols.
	Symbol string `json:"symbol"`
	// Target synset ID.
	Synset string `json:"synset"`
	// Index of word in source synset, -1 for entire synset.
	Source int `json:"source"`
	// Index of word in target synset, -1 for entire synset.
	Target int `json:"target"`
}
A Pointer denotes a semantic relation from one synset or word to another.
See list of pointer symbols here: https://wordnet.princeton.edu/man/wninput.5WN.html#sect3
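To illustrate how the Symbol field and the package constants fit together, here is a minimal sketch that collects the target synset IDs of all hypernym pointers. The `Pointer` type and `Hypernym` constant are local stand-ins that mirror the documented declarations so the example compiles without the package; `targetsWithSymbol` is a hypothetical helper, and the synset IDs are made up.

```go
package main

import "fmt"

// Local stand-ins mirroring the documented declarations.
const Hypernym = "@"

type Pointer struct {
	Symbol string
	Synset string
	Source int
	Target int
}

// targetsWithSymbol returns the target synset IDs of all pointers that carry
// the given symbol. Hypothetical helper, not part of the package.
func targetsWithSymbol(ptrs []*Pointer, symbol string) []string {
	var ids []string
	for _, p := range ptrs {
		if p.Symbol == symbol {
			ids = append(ids, p.Synset)
		}
	}
	return ids
}

func main() {
	// Made-up pointers and synset IDs, for illustration only.
	ptrs := []*Pointer{
		{Symbol: "@", Synset: "n100", Source: -1, Target: -1},
		{Symbol: "~", Synset: "n200", Source: -1, Target: -1},
		{Symbol: "@", Synset: "n300", Source: -1, Target: -1},
	}
	fmt.Println(targetsWithSymbol(ptrs, Hypernym)) // prints [n100 n300]
}
```

With the real package, the same loop would presumably run over a Synset's Pointer field, and the returned IDs could be looked up in the WordNet.Synset map.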
type Synset ¶
type Synset struct {
	// Synset offset, also used as an identifier.
	Offset string `json:"offset"`
	// Part of speech, including 's' for adjective satellite.
	Pos string `json:"pos"`
	// Words in this synset.
	Word []string `json:"word"`
	// Pointers to other synsets.
	Pointer []*Pointer `json:"pointer"`
	// Sentence frames for verbs.
	Frame []*Frame `json:"frame"`
	// Lexical definition.
	Gloss string `json:"gloss"`
	// Usage examples for words in this synset. Verbs only.
	Example []*Example `json:"example"`
}
Synset is a set of synonymous words.
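The struct tags suggest that synsets are meant to be serialized with encoding/json. The sketch below shows the resulting field names using a trimmed local copy of the type (only four of the documented fields); the `encode` helper and the sample values are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Synset is a trimmed local stand-in with four of the documented fields and
// the same JSON tags.
type Synset struct {
	Offset string   `json:"offset"`
	Pos    string   `json:"pos"`
	Word   []string `json:"word"`
	Gloss  string   `json:"gloss"`
}

// encode marshals a synset to its JSON text. Hypothetical helper.
func encode(ss *Synset) (string, error) {
	b, err := json.Marshal(ss)
	return string(b), err
}

func main() {
	// Sample values are invented, not taken from the WordNet data.
	ss := &Synset{
		Offset: "0001",
		Pos:    "n",
		Word:   []string{"cat", "true_cat"},
		Gloss:  "a feline mammal",
	}
	s, err := encode(ss)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(s)
}
```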
type WordNet ¶
type WordNet struct {
	// Maps from synset ID to synset.
	Synset map[string]*Synset `json:"synset"`
	// Maps from pos.lemma to synset IDs that contain it.
	Lemma map[string][]string `json:"lemma"`
	// Like Lemma, but synsets are ordered from the most frequently used to the
	// least. Only a subset of the synsets are ranked, so LemmaRanked has fewer
	// synsets.
	LemmaRanked map[string][]string `json:"lemmaRanked"`
	// Maps from exceptional word to its forms.
	Exception map[string][]string `json:"exception"`
	// Maps from example ID to sentence template. Using string keys for JSON
	// compatibility.
	Example map[string]string `json:"example"`
}
WordNet is an entire wordnet database.
func Parse ¶
Parse parses an entire WordNet directory. Path is the root of the directory. The parser will traverse it and parse the required files, assuming the directory structure is as published.
func (*WordNet) Examples ¶
Examples returns usage examples for the given synset. Always empty for non-verbs.
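One way such examples could be assembled from the documented fields is sketched below. It rests on several assumptions that this page does not confirm: that the WordNet.Example map is keyed by the decimal form of TemplateNumber, that each template contains a single `%s` placeholder for the verb, and that WordNumber is a zero-based index into Synset.Word. The types are trimmed local stand-ins and `fillExamples` is a hypothetical helper, not the package's implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Trimmed local stand-ins for the documented types.
type Example struct {
	WordNumber     int
	TemplateNumber int
}

type Synset struct {
	Word    []string
	Example []*Example
}

// fillExamples substitutes each referenced word into its sentence template.
// Assumes templates are keyed by decimal template number and contain one
// "%s" placeholder (both are assumptions, see the text above).
func fillExamples(templates map[string]string, ss *Synset) []string {
	var out []string
	for _, ex := range ss.Example {
		tmpl, ok := templates[strconv.Itoa(ex.TemplateNumber)]
		if !ok || ex.WordNumber < 0 || ex.WordNumber >= len(ss.Word) {
			continue // skip references that cannot be resolved
		}
		out = append(out, strings.Replace(tmpl, "%s", ss.Word[ex.WordNumber], 1))
	}
	return out
}

func main() {
	// Made-up template and synset, for illustration only.
	templates := map[string]string{"1": "The children %s to the playground"}
	ss := &Synset{
		Word:    []string{"run"},
		Example: []*Example{{WordNumber: 0, TemplateNumber: 1}},
	}
	fmt.Println(fillExamples(templates, ss))
}
```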
func (*WordNet) PathSimilarity ¶
PathSimilarity returns a score denoting how similar two word senses are, based on the shortest path that connects the senses in the is-a (hypernym/hyponym) taxonomy. The score is in the range 0 to 1, where 1 means identity and 0 means completely disjoint.
If simulateRoot is true, a common fake root is created above the top of each synset's hierarchy when no common ancestor is found.
Based on NLTK's path_similarity function.
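NLTK computes this score as 1 / (d + 1), where d is the number of edges on the shortest path between the two senses. The sketch below applies that formula to a toy taxonomy in which every node has at most one hypernym (real synsets may have several). The `parent` map, both helper functions, and the choice to return 0 when no path exists are assumptions made for the illustration, not the package's implementation.

```go
package main

import "fmt"

// hypernymDistances walks the hypernym chain upward from id and returns the
// number of steps to each node on the way, with id itself at distance 0.
func hypernymDistances(parent map[string]string, id string) map[string]int {
	dist := map[string]int{}
	for d := 0; ; d++ {
		dist[id] = d
		next, ok := parent[id]
		if !ok {
			return dist
		}
		id = next
	}
}

// pathSimilarity returns 1/(d+1), where d is the shortest path between a and
// b through a common ancestor. Without a common ancestor it returns 0, unless
// simulateRoot is set, in which case a fake root one step above each top node
// joins the two hierarchies.
func pathSimilarity(parent map[string]string, a, b string, simulateRoot bool) float64 {
	da, db := hypernymDistances(parent, a), hypernymDistances(parent, b)
	shortest := -1
	for id, x := range da {
		if y, ok := db[id]; ok && (shortest < 0 || x+y < shortest) {
			shortest = x + y
		}
	}
	if shortest < 0 {
		if !simulateRoot {
			return 0
		}
		// Each chain has len-1 edges to its top, plus one edge to the fake root.
		shortest = len(da) + len(db)
	}
	return 1 / float64(shortest+1)
}

func main() {
	// Toy taxonomy: child -> hypernym.
	parent := map[string]string{
		"cat": "feline", "feline": "carnivore",
		"dog": "canine", "canine": "carnivore",
	}
	fmt.Println(pathSimilarity(parent, "cat", "dog", false)) // 4 edges apart: 0.2
	fmt.Println(pathSimilarity(parent, "cat", "cat", false)) // identical: 1
}
```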
func (*WordNet) Search ¶
Search searches for a word in the dictionary. Returns a map from part of speech (a, n, r, v) to all synsets that contain that word.
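The lookup can be pictured with the Lemma field, which is documented as mapping pos.lemma to synset IDs. The sketch below assumes the key is literally the POS letter, a dot, and the word (for example "n.cat"), and it returns synset IDs rather than synsets to stay short. `searchLemma` is a hypothetical helper; the real Search may also normalize the word or consult the Exception map.

```go
package main

import "fmt"

// partsOfSpeech lists the four POS letters used as result keys.
var partsOfSpeech = []string{"a", "n", "r", "v"}

// searchLemma looks the word up under every part of speech and returns a map
// from POS letter to the synset IDs found. The "pos.word" key format is an
// assumption based on the Lemma field's comment.
func searchLemma(lemma map[string][]string, word string) map[string][]string {
	result := map[string][]string{}
	for _, pos := range partsOfSpeech {
		if ids := lemma[pos+"."+word]; len(ids) > 0 {
			result[pos] = ids
		}
	}
	return result
}

func main() {
	// Made-up index entries and synset IDs.
	lemma := map[string][]string{
		"n.cat": {"s1", "s2"},
		"v.cat": {"s3"},
	}
	fmt.Println(searchLemma(lemma, "cat"))
}
```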
func (*WordNet) SearchRanked ¶
SearchRanked searches for a word in the dictionary. Returns a map from part of speech (a, n, r, v) to synsets that contain that word, ranked from the most frequently used to the least.
Only a subset of the synsets are ranked, so this may return fewer synsets than Search would.
func (*WordNet) String ¶
String returns a compact string representation of the WordNet data collection, for debugging.
func (*WordNet) WupSimilarity ¶
WupSimilarity is Wu-Palmer Similarity. Returns a score denoting how similar two word senses are, based on the depth of the two senses in the taxonomy and that of their Least Common Subsumer (most specific ancestor node).
If simulateRoot is true, a common fake root is created above the top of each synset's hierarchy when no common ancestor is found.
Based on NLTK's wup_similarity function.
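The Wu-Palmer score is commonly written as 2 * depth(LCS) / (depth(a) + depth(b)), with the root counted at depth 1. The sketch below evaluates that formula on a toy single-parent taxonomy. It omits simulateRoot and NLTK's handling of multiple hypernym paths, and the `parent` map and both helpers are assumptions made for the illustration rather than the package's implementation.

```go
package main

import "fmt"

// chainToRoot returns id followed by its successive hypernyms, ending at the
// root. Its length equals the depth of id when the root has depth 1.
func chainToRoot(parent map[string]string, id string) []string {
	chain := []string{id}
	for {
		next, ok := parent[id]
		if !ok {
			return chain
		}
		chain = append(chain, next)
		id = next
	}
}

// wupSimilarity returns 2*depth(lcs) / (depth(a) + depth(b)), where lcs is
// the deepest node shared by both hypernym chains. It returns 0 when the
// chains share no node.
func wupSimilarity(parent map[string]string, a, b string) float64 {
	ca, cb := chainToRoot(parent, a), chainToRoot(parent, b)
	onB := map[string]bool{}
	for _, id := range cb {
		onB[id] = true
	}
	// ca is ordered from a upward, so the first shared node is the deepest.
	for _, id := range ca {
		if onB[id] {
			lcsDepth := len(chainToRoot(parent, id))
			return float64(2*lcsDepth) / float64(len(ca)+len(cb))
		}
	}
	return 0
}

func main() {
	// Toy taxonomy: child -> hypernym, rooted at "entity".
	parent := map[string]string{
		"cat": "feline", "feline": "carnivore",
		"dog": "canine", "canine": "carnivore",
		"carnivore": "animal", "animal": "entity",
	}
	// depth(cat) = depth(dog) = 5, depth(carnivore) = 3, so 2*3/(5+5).
	fmt.Println(wupSimilarity(parent, "cat", "dog")) // 0.6
}
```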