Documentation ¶
Overview ¶
Package dictionary contains code related to looking up and storing words in dictionaries. The parser currently supports Project Gutenberg's edition of Webster's Unabridged Dictionary (1913).
Index ¶
- Constants
- func CreateFile(wm WordMap, dictfile string) error
- type File
- func (d *File) Close() error
- func (d *File) GetWord(word string) (*Word, bool, error)
- func (d *File) GetWords(word string) ([]*Word, bool, error)
- func (d *File) HasWord(word string) bool
- func (d *File) Lookup(word string) (*Word, bool, error)
- func (d *File) LookupWord(word string) ([]*Word, bool, error)
- func (d *File) NumWords() int
- func (d *File) Verify() error
- type Store
- type Word
- type WordMap
- func (wm WordMap) GetWord(word string) (*Word, bool, error)
- func (wm WordMap) GetWords(word string) ([]*Word, bool, error)
- func (wm WordMap) HasWord(word string) bool
- func (wm WordMap) Lookup(word string) (*Word, bool, error)
- func (wm WordMap) LookupWord(word string) ([]*Word, bool, error)
- func (wm WordMap) NumWords() int
- type WordMeaning
Constants ¶
const FileVer = "DICT6\x00" // note: can currently handle "DICT5\x00" too
FileVer is the current compatibility level of saved Files.
Variables ¶
This section is empty.
Functions ¶
func CreateFile ¶
CreateFile exports a WordMap to a file. The file specified will be overwritten if it exists.
Types ¶
type File ¶
type File struct {
// contains filtered or unexported fields
}
File implements an efficient Store which is faster to initialize and uses a lot less memory (~15 MB total) than WordMap.
There needs to be enough memory to store the whole index. Reading a dict is also completely thread-safe. Corrupt files will be detected during the read of the corrupted word (or the initialization in the case of index corruption) or during Verify.
The dict file is stored in the following format:
    +---------+------------+---------------------------------------------+----------+-------------------------------------------------+
    |         |            | +------+------------------------------+     |          |                                                 |
    | FileVer | idx offset | | size | zlib compressed Word msgpack | ... | idx size | zlib compressed idx map[string][]offset msgpack |
    |         |            | +=====================================+     |          |                                                 |
    +---------+------------+---------------------------------------------+============================================================+
All sizes and offsets are little-endian int64. Each size is the size of the size field plus the size of the data.
The file is opened using the following steps:
1. The FileVer is read and checked. It must match exactly.
2. The idx offset is read.
3. The file is seeked to the beginning plus the idx offset.
4. The idx size is read.
5. The bytes for the idx are decompressed using zlib, and the resulting msgpack is decoded into an in-memory map[string][]int64 of the words to offsets.
To read a word:
1. The offset is retrieved from the in-memory idx.
2. The file is seeked to the beginning plus the offset.
3. The size of the compressed word is read.
4. The bytes for the word are decompressed using zlib, and the resulting msgpack is decoded into an in-memory *Word.
For more details, see the source code.
It is up to the creator to ensure there aren't duplicate references to entries for headwords in the index. If duplicates are found, they will be returned as-is.
func OpenFile ¶
OpenFile opens a dictionary file. It will return errors if there are errors reading the files or critical errors in the structure.
func (*File) Close ¶
Close closes the files associated with the dictionary file and clears the in-memory index. Usage of the File afterwards may result in a panic.
func (*File) GetWords ¶ added in v1.4.0
GetWords implements Store, and will return an error if the data structure is invalid or the underlying files are inaccessible.
func (*File) LookupWord ¶ added in v1.4.0
Lookup is a shortcut for Lookup.
type Store ¶
type Store interface {
	// NumWords returns the number of words in the Store.
	NumWords() int
	// HasWord checks if the Store contains a word as-is (i.e. do not do any additional processing or trimming).
	HasWord(word string) bool
	// GetWords gets a word, which can have multiple instances, from the Store.
	// If it does not exist, exists will be false, and word and err will be nil.
	GetWords(word string) (w []*Word, exists bool, err error)
	// GetWord is deprecated.
	GetWord(word string) (w *Word, exists bool, err error)
	// LookupWord should call LookupWord on itself.
	LookupWord(word string) ([]*Word, bool, error)
	// Lookup is deprecated.
	Lookup(word string) (*Word, bool, error)
}
Store is a backend for storing dictionary entries. Implementations should not return duplicate entries, but it is not a bug to do so.
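To illustrate the contract of the non-deprecated lookup methods, here is a minimal map-backed sketch. Everything in it is hypothetical: memStore is not the package's WordMap, Word is a trimmed stand-in for the real type, NumWords counts distinct keys as a choice of this sketch, and the deprecated methods and LookupWord are omitted, so it does not satisfy the full Store interface.

```go
package main

import "fmt"

// Word is a trimmed stand-in for the package's Word type.
type Word struct {
	Word string
}

// memStore is a hypothetical map-backed store keyed by headword.
type memStore map[string][]*Word

// NumWords returns the number of distinct headwords in the map.
func (m memStore) NumWords() int { return len(m) }

// HasWord checks for the word as-is, with no trimming or normalization.
func (m memStore) HasWord(word string) bool {
	_, ok := m[word]
	return ok
}

// GetWords returns every entry stored for the word. When the word is
// absent, exists is false and both the slice and the error are nil.
func (m memStore) GetWords(word string) (w []*Word, exists bool, err error) {
	w, exists = m[word]
	if !exists {
		return nil, false, nil
	}
	return w, true, nil
}

func main() {
	s := memStore{"cat": []*Word{{Word: "cat"}}}

	fmt.Println(s.NumWords(), s.HasWord("cat"), s.HasWord(" cat"))

	w, exists, err := s.GetWords("dog")
	fmt.Println(w == nil, exists, err)
}
```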
type Word ¶
type Word struct {
	Word            string        `json:"word,omitempty" msgpack:"w"`
	Alternates      []string      `json:"alternates,omitempty" msgpack:"a"`
	Info            string        `json:"info,omitempty" msgpack:"i"`
	Etymology       string        `json:"etymology,omitempty" msgpack:"e"`
	Meanings        []WordMeaning `json:"meanings,omitempty" msgpack:"m"`
	Notes           []string      `json:"notes,omitempty" msgpack:"n"`
	Extra           string        `json:"extra,omitempty" msgpack:"x"`
	Credit          string        `json:"credit,omitempty" msgpack:"c"`
	ReferencedWords []string      `json:"referenced_words" msgpack:"r"` // note: this does not include words referenced within meanings
}
Word represents a word.
type WordMap ¶
WordMap is an in-memory word Store used and returned by Parse. Although fast, it consumes huge amounts of memory and should be avoided where possible. It is up to the creator to ensure there aren't duplicate references to entries for headwords.
func Parse ¶
Parse parses Webster's Unabridged Dictionary of 1913 into a WordMap. Note: For dictserver > v1.3.1, this now uses the parser I implemented for dictutil which is much more efficient and accurate.
func (WordMap) GetWords ¶ added in v1.4.0
GetWords implements Store, but will never return an error.
func (WordMap) LookupWord ¶ added in v1.4.0
LookupWord is a shortcut for LookupWord.