wordlist4096

v1.0.0
Published: Dec 7, 2023 License: MIT Imports: 2 Imported by: 4

README

Wordlist 4096

This is a robust English wordlist for encoding data. It was inspired by previous natural language encoding systems, such as What3Words, BIP-0039, and Diceware.

This wordlist was created for the Mnemonikey project.

Specification

Each word in this list represents one of 4096 equally likely options. Per Claude Shannon's information theory, a word from this list therefore encodes 12 bits of information, since $\log_2(4096) = 12$ (i.e. $2^{12} = 4096$).
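To make the arithmetic concrete, the sketch below computes how many 12-bit words are needed to carry a payload of a given size. The function name `wordsNeeded` and the sample payload sizes are illustrative assumptions and are not part of this package.

```go
package main

import "fmt"

// bitsPerWord mirrors the 12 bits encoded by each word: log2(4096) = 12.
const bitsPerWord = 12

// wordsNeeded is a hypothetical helper. It returns the smallest number of
// words whose combined capacity is at least nBits bits (ceiling division).
func wordsNeeded(nBits uint) uint {
	return (nBits + bitsPerWord - 1) / bitsPerWord
}

func main() {
	// Sample payload sizes chosen only for illustration.
	for _, n := range []uint{12, 64, 128, 256} {
		fmt.Printf("%3d bits -> %2d words\n", n, wordsNeeded(n))
	}
}
```

When the payload size is not a multiple of 12, the final word carries some unused bits; how those bits are filled is up to the encoding scheme built on top of the wordlist.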

But this wordlist is not composed of any old random words. The list has been carefully constructed to fulfill the following properties:

| Property | Advantage |
|---|---|
| Contains at most one word from any group of homophones: words which sound the same but are spelled differently (e.g. beat and beet). | Avoids ambiguity and confusion when communicating words from the list orally. |
| Contains at most one word from any group of conceptually competing words (e.g. shook and shake refer to the same action in different tenses). | Prevents conceptual confusion, so words can be memorized more easily without mnemonic ambiguity. |
| Contains no words shorter than 3 characters or longer than 8 characters. | Long words add complexity. Short words reduce uniqueness. |
| Every word is uniquely identifiable by at most its first four characters. | Allows fast automated interpretation when typing words as input. |
| The Damerau-Levenshtein distance between every pair of words in the wordlist is at least 2. | Reduces the risk of a typo accidentally converting one valid word into another valid word. |
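The last three properties are mechanically checkable. The following is a minimal sketch of such a check, not the package's actual validation code: the names `prefixKey`, `osaDistance`, and `checkWords` are hypothetical, words are assumed to be lower-case ASCII, and the distance function implements the optimal string alignment variant of Damerau-Levenshtein, which agrees with the unrestricted variant on whether two words are fewer than 2 edits apart.

```go
package main

import "fmt"

// prefixKey returns the first four bytes of w, or all of w if it is shorter.
func prefixKey(w string) string {
	if len(w) > 4 {
		return w[:4]
	}
	return w
}

func minInt(a int, rest ...int) int {
	for _, v := range rest {
		if v < a {
			a = v
		}
	}
	return a
}

// osaDistance counts insertions, deletions, substitutions, and adjacent
// transpositions needed to turn a into b (optimal string alignment).
func osaDistance(a, b string) int {
	d := make([][]int, len(a)+1)
	for i := range d {
		d[i] = make([]int, len(b)+1)
		d[i][0] = i
	}
	for j := 0; j <= len(b); j++ {
		d[0][j] = j
	}
	for i := 1; i <= len(a); i++ {
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			d[i][j] = minInt(d[i-1][j]+1, d[i][j-1]+1, d[i-1][j-1]+cost)
			// Adjacent transposition, e.g. "form" <-> "from".
			if i > 1 && j > 1 && a[i-1] == b[j-2] && a[i-2] == b[j-1] {
				d[i][j] = minInt(d[i][j], d[i-2][j-2]+1)
			}
		}
	}
	return d[len(a)][len(b)]
}

// checkWords returns a description of every violated property.
func checkWords(words []string) []string {
	var problems []string
	seen := make(map[string]string) // prefix key -> first word that used it
	for i, w := range words {
		if len(w) < 3 || len(w) > 8 {
			problems = append(problems, fmt.Sprintf("%q: length outside 3..8", w))
		}
		key := prefixKey(w)
		if other, ok := seen[key]; ok {
			problems = append(problems, fmt.Sprintf("%q and %q share prefix %q", other, w, key))
		} else {
			seen[key] = w
		}
		for _, v := range words[:i] {
			if osaDistance(v, w) < 2 {
				problems = append(problems, fmt.Sprintf("%q and %q are too similar", v, w))
			}
		}
	}
	return problems
}

func main() {
	// Sample words chosen only for illustration.
	for _, p := range checkWords([]string{"beat", "beet", "car", "cargo"}) {
		fmt.Println(p)
	}
}
```

The pairwise distance loop is quadratic in the number of words, which is acceptable for a list of 4096 entries. The homophone and conceptual-overlap properties are not computable this way and need human review.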

Antecedents

This project was inspired by earlier natural-language encoding projects, including What3Words, BIP-0039, and Diceware.

Contributing

The tooling for this package requires a Go compiler.

For the moment, wordlist4096 is not finalized. I'm happy to accept PRs to improve the wordlist. Please ensure your changes fulfill these requirements:

  1. Any changes must pass tests. To run tests, use go test (or make validate).
  2. Any new words must not contravene the heuristic properties discussed in the Specification section.
  3. Any word deletions or changes must be justifiable.

Be aware that discussions about what words are 'memorable' or 'confusing' may be highly subjective. In review, I may veto any arbitrary decision between words, simply to save time.

Scripts

  • To validate the computable properties of the wordlist, run make validate.
  • To suggest new possible words to add to the wordlist, run make suggest. (This pulls from /usr/share/dict/words, which is only available on Unix-like systems.)
  • To sort the wordlist and deduplicate words, run make tidy.
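As a rough illustration of what a tidy step involves, the sketch below sorts a list of words and drops duplicates. It is an assumption about the general technique, not the script behind make tidy; the function name `tidy` is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// tidy returns a sorted copy of words with duplicate entries removed.
// The input slice is left unmodified.
func tidy(words []string) []string {
	sorted := append([]string(nil), words...)
	sort.Strings(sorted)

	out := make([]string, 0, len(sorted))
	for i, w := range sorted {
		// After sorting, duplicates are adjacent, so compare with the previous word.
		if i == 0 || w != sorted[i-1] {
			out = append(out, w)
		}
	}
	return out
}

func main() {
	fmt.Println(tidy([]string{"cat", "bar", "cat", "ant"}))
}
```

Keeping the list sorted matters beyond neatness: WordList is documented as being in alphabetical order, and Search relies on a binary search over it.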

Documentation

Constants

const BitsPerWord uint = 12

BitsPerWord is the number of bits of information represented by each word in the wordlist.

Variables

var WordList []string

WordList is the mnemonic encoding wordlist in alphabetical sorted order.

var WordMap = make(map[string]uint16)

WordMap is a mapping of words to their indices in the wordlist.
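One way these values might be used together is to split a byte string into 12-bit indices, each of which selects an entry of WordList; WordMap then reverses the lookup when decoding. The sketch below shows only the bit-splitting step. The function `indicesFromBytes`, its big-endian bit order, and its zero-padding of the final group are assumptions made for illustration, not the encoding used by this package or by Mnemonikey.

```go
package main

import "fmt"

// bitsPerWord mirrors the package's BitsPerWord constant.
const bitsPerWord = 12

// indicesFromBytes is a hypothetical helper. It reads data as a big-endian
// bit stream and cuts it into 12-bit groups. If the bit length is not a
// multiple of 12, the final group is padded on the right with zero bits.
func indicesFromBytes(data []byte) []uint16 {
	var out []uint16
	var acc uint32 // bit accumulator; only the low nbits bits are meaningful
	var nbits uint
	for _, b := range data {
		acc = acc<<8 | uint32(b)
		nbits += 8
		for nbits >= bitsPerWord {
			nbits -= bitsPerWord
			out = append(out, uint16(acc>>nbits)&0xFFF)
		}
	}
	if nbits > 0 {
		out = append(out, uint16(acc<<(bitsPerWord-nbits))&0xFFF)
	}
	return out
}

func main() {
	// Three bytes hold 24 bits, which is exactly two 12-bit indices.
	for _, idx := range indicesFromBytes([]byte{0xAB, 0xCD, 0xEF}) {
		fmt.Printf("index %d (0x%03X)\n", idx, idx)
	}
}
```

Every index produced this way is below 4096, so it is always a valid position in a 4096-entry list, and a uint16 (the value type of WordMap) is wide enough to hold it.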

Types

type SearchResult

type SearchResult struct {
	// ExactMatch is true if the input query is a valid word in the wordlist.
	// Indicates that the first element of the Suffixes field will be the empty string.
	//
	// Note that finding an exact match does not necessarily mean it is the only
	// possible word. Some words are prefixes of others ("car" and "cargo").
	ExactMatch bool

	// Suffixes is a set of suffix strings which can be appended to the original
	// input query to make it a valid word in the wordlist.
	Suffixes []string
}

SearchResult is returned by the Search function. It indicates the suffixes which could complete the input query to make it a valid word in the wordlist, including the empty string if an exact match was found.

func Search(query string) *SearchResult

Search runs a binary search on the wordlist to find any words which match the given input query string. This is useful for autocomplete and error correction.

The input query must be in lower case to return any results. If the query is empty, Search returns a SearchResult with an empty Suffixes list.
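The behavior described above can be approximated with a prefix search over any sorted slice of strings. The sketch below is a stand-alone re-implementation for illustration only, not the package's source: the names `searchResult` and `searchSorted` and the four sample words are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// searchResult mirrors the shape of the documented SearchResult type.
type searchResult struct {
	ExactMatch bool
	Suffixes   []string
}

// searchSorted finds every word in the sorted slice words that starts with
// query and records the remaining suffix of each match.
func searchSorted(words []string, query string) *searchResult {
	res := &searchResult{}
	if query == "" {
		return res
	}
	// Binary search for the first word >= query. In a sorted list, all
	// words sharing the query as a prefix sit in one run starting here.
	start := sort.SearchStrings(words, query)
	for _, w := range words[start:] {
		if !strings.HasPrefix(w, query) {
			break
		}
		if w == query {
			res.ExactMatch = true
		}
		res.Suffixes = append(res.Suffixes, strings.TrimPrefix(w, query))
	}
	return res
}

func main() {
	words := []string{"cab", "car", "cargo", "cat"} // must already be sorted
	res := searchSorted(words, "car")
	fmt.Printf("exact=%v suffixes=%q\n", res.ExactMatch, res.Suffixes)
}
```

For an autocomplete prompt, a result with exactly one suffix means the input can be completed unambiguously, while an exact match with further suffixes (as with "car" and "cargo") means the caller still has to decide whether the user is finished typing.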

Directories

Path Synopsis
cmd
Package validate contains wordlist validation rules.