misspell

package

v0.0.0-...-0437ee0 Published: May 5, 2020 License: Apache-2.0, MIT Imports: 11 Imported by: 0

Repository

github.com/SaadMTSA/goreporter

README ¶

Correct commonly misspelled English words... quickly.

Install with `go get -u github.com/client9/misspell/cmd/misspell`, then run it on the files you want to check:

$ misspell all.html your.txt important.md files.go
your.txt:42:10 found "langauge" a misspelling of "language"

# ^ file, line, column

You'll need golang 1.5 or newer installed to compile it, but after that it's a standalone binary.

If people want pre-compiled binaries, file a ticket please.

FAQ

Automatic Corrections
Converting UK spellings to US
Using pipes and stdin
Golang special support
gometalinter support
CSV Output
Using SQLite3
Changing output format
Checking a folder recursively
Performance
Known Issues
Debugging
False Negatives and missing words
Origin of Word Lists
Software License
Problem statement
Other spelling correctors
Other ideas

How can I make the corrections automatically?

Just add the -w flag!

$ misspell -w all.html your.txt important.md files.go
your.txt:9:21:corrected "langauge" to "language"

# ^booyah

How do I convert British spellings to American (or vice-versa)?

To convert British spellings to American, add the -locale US flag!

$ misspell -locale US important.txt
important.txt:10:20 found "colour" a misspelling of "color"

To convert American spellings to British, add the -locale UK flag!

$ echo "My favorite color is blue" | misspell -locale UK
stdin:1:3:found "favorite color" a misspelling of "favourite colour"

Help is appreciated as I'm neither British nor an expert in the English language.

How do you check an entire folder recursively?

Just list the directories (and files) you'd like to check:

misspell .
misspell aDirectory anotherDirectory aFile

You can also run misspell recursively using the following shell tricks:

misspell directory/**/*

or

find . -type f | xargs misspell

Can I use pipes or `stdin` for input?

Yes!

Print messages to stderr only:

$ echo "zeebra" | misspell
stdin:1:0:found "zeebra" a misspelling of "zebra"

Print messages to stderr, and corrected text to stdout:

$ echo "zeebra" | misspell -w
stdin:1:0:corrected "zeebra" to "zebra"
zebra

Only print the corrected text to stdout:

$ echo "zeebra" | misspell -w -q
zebra

Are there special rules for golang source files?

Yes! If the file ends in .go, then misspell will only check spelling in comments.

If you want to force a file to be checked as a golang source, use -source=go on the command line. Conversely, you can check a golang source as if it were pure text by using -source=text. You might want to do this since many variable names have misspellings in them!

Can I check only comments in other programming languages?

I'm told that using -source=go works well for Ruby, JavaScript, Java, C and C++.

It doesn't work well for Python and Bash.

Does this work with gometalinter?

gometalinter runs multiple golang linters. Starting on 2016-06-12 gometalinter supports misspell natively but it is disabled by default.

# update your copy of gometalinter
go get -u github.com/alecthomas/gometalinter

# install updates and misspell
gometalinter --install --update

To use it, just enable misspell:

gometalinter --enable misspell ./...

Note that gometalinter only checks golang files, and uses the default options of misspell.

You may wish to run this on your plaintext (.txt) and/or markdown files too.

How can I get CSV output?

Using -f csv, the output is standard comma-separated values with headers in the first row.

misspell -f csv *
file,line,column,typo,corrected
"README.md",9,22,langauge,language
"README.md",47,25,langauge,language
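
If you want to post-process that CSV from a Go program, something like the following might do. It is only a sketch using the standard encoding/csv package; the Finding type and the parseFindings helper are made-up names for illustration and are not part of misspell.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// Finding mirrors one row of the CSV output shown above.
// It is a hypothetical type for this example only.
type Finding struct {
	File      string
	Line      int
	Column    int
	Typo      string
	Corrected string
}

// parseFindings reads CSV text in the format above, skipping the header row.
func parseFindings(out string) ([]Finding, error) {
	rows, err := csv.NewReader(strings.NewReader(out)).ReadAll()
	if err != nil {
		return nil, err
	}
	var findings []Finding
	for i, row := range rows {
		if i == 0 {
			continue // header: file,line,column,typo,corrected
		}
		line, err := strconv.Atoi(row[1])
		if err != nil {
			return nil, fmt.Errorf("row %d: bad line number: %w", i, err)
		}
		col, err := strconv.Atoi(row[2])
		if err != nil {
			return nil, fmt.Errorf("row %d: bad column number: %w", i, err)
		}
		findings = append(findings, Finding{
			File: row[0], Line: line, Column: col, Typo: row[3], Corrected: row[4],
		})
	}
	return findings, nil
}

func main() {
	const sample = `file,line,column,typo,corrected
"README.md",9,22,langauge,language
"README.md",47,25,langauge,language
`
	findings, err := parseFindings(sample)
	if err != nil {
		panic(err)
	}
	for _, f := range findings {
		fmt.Printf("%s:%d:%d %s -> %s\n", f.File, f.Line, f.Column, f.Typo, f.Corrected)
	}
}
```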

How can I export to SQLite3?

Using -f sqlite, the output is a sqlite3 dump-file.

$ misspell -f sqlite * > /tmp/misspell.sql
$ cat /tmp/misspell.sql

PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE misspell(
  "file" TEXT,
  "line" INTEGER,
  "column" INTEGER,
  "typo" TEXT,
  "corrected" TEXT
);
INSERT INTO misspell VALUES("install.txt",202,31,"immediatly","immediately");
# etc...
COMMIT;

$ sqlite3 -init /tmp/misspell.sql :memory: 'select count(*) from misspell'
1

With some tricks you can directly pipe output to sqlite3 by using -init /dev/stdin:

misspell -f sqlite * | sqlite3 -init /dev/stdin -column -cmd '.width 60 15' ':memory:' \
    'select substr(file,35),typo,count(*) as count from misspell group by file, typo order by count desc;'

How can I change the output format?

Using the -f template flag you can pass in a golang text template to format the output.

One can use printf "%q" VALUE to safely quote a value.

The default template is compatible with gometalinter:

{{ .Filename }}:{{ .Line }}:{{ .Column }}:corrected {{ printf "%q" .Original }} to {{ printf "%q" .Corrected }}

To just print probable misspellings:

-f '{{ .Original }}'
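
To try out a template before passing it to -f, you can run it through Go's text/template yourself. The sketch below assumes the template sees the fields listed on the Diff type (Filename, Line, Column, Original, Corrected); the local Diff struct and the render helper are stand-ins written for this example, not misspell's own code.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Diff is a local stand-in with the field names used in the templates above.
type Diff struct {
	Filename  string
	Line      int
	Column    int
	Original  string
	Corrected string
}

// render parses tmpl as a text/template and executes it against one Diff.
func render(tmpl string, d Diff) (string, error) {
	t, err := template.New("out").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, d); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	d := Diff{Filename: "your.txt", Line: 9, Column: 21, Original: "langauge", Corrected: "language"}
	templates := []string{
		// printf "%q" adds the surrounding quotes and escapes the value.
		`{{ .Filename }}:{{ .Line }}:{{ .Column }}:corrected {{ printf "%q" .Original }} to {{ printf "%q" .Corrected }}`,
		// Only the probable misspelling.
		`{{ .Original }}`,
	}
	for _, tmpl := range templates {
		s, err := render(tmpl, d)
		if err != nil {
			panic(err)
		}
		fmt.Println(s)
	}
}
```

Because printf "%q" already quotes its argument, wrapping it in extra literal quotes would print doubled quotes.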

What problem does this solve?

This corrects commonly misspelled English words in computer source code, and other text-based formats (.txt, .md, etc).

It is designed to run quickly so it can be used as a pre-commit hook with minimal burden on the developer.

It does not work with binary formats (e.g. Word, etc).

It is not a complete spell-checking program nor a grammar checker.

What are other misspelling correctors and what's wrong with them?

Some other misspelling correctors:

They all work but had problems that prevented me from using them at scale:

slow, all of the above check one misspelling at a time (i.e. linear) using regexps
not MIT/Apache2 licensed (or equivalent)
have dependencies that don't work for me (python3, bash, linux sed, etc)
don't understand American vs. British English and sometimes make unwelcome "corrections"

That said, they might be perfect for you and many have more features than this project!

How fast is it?

Misspell is easily 100x to 1000x faster than other spelling correctors. You should be able to check and correct 1000 files in under 250ms.

This uses the mighty power of golang's strings.Replacer, which is an implementation or variation of the Aho–Corasick algorithm. This makes multiple substring matches simultaneously.

In addition this uses multiple CPU cores to work on multiple files.

What problems does it have?

Unlike the other projects, this doesn't know what a "word" is. There may be more false positives and false negatives due to this. On the other hand, it sometimes catches things others don't.

Either way, please file bugs and we'll fix them!

Since it operates in parallel to make corrections, it can be non-obvious to determine exactly what word was corrected.

It's making mistakes. How can I debug?

Run with the -debug flag on the file you want to check. It should then print what word it is trying to correct. Then file a bug describing the problem. Thanks!

Why is it making mistakes or missing items in golang files?

The matching function is case-sensitive, so variable names that are multiple words either in all-upper or all-lower case can sometimes cause false positives. For instance, a variable named bodyreader could trigger a false positive, since yrea is in the middle and could be corrected to year. Other problems happen if the variable name uses an English contraction that should use an apostrophe. The best way of fixing this is to use the Effective Go naming conventions and use camelCase for variable names. You can check your code using golint.
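
The bodyreader case can be reproduced with a plain strings.Replacer. This sketch assumes a single rule "yrea" to "year" taken from the example above; applyRule is a made-up helper, and the real tool may apply further checks before accepting a match.

```go
package main

import (
	"fmt"
	"strings"
)

// applyRule runs one hypothetical rule ("yrea" -> "year") over s
// using case-sensitive substring matching.
func applyRule(s string) string {
	return strings.NewReplacer("yrea", "year").Replace(s)
}

func main() {
	// "body" + "reader" contains "yrea" across the word boundary,
	// so the all-lower-case identifier is rewritten.
	fmt.Println(applyRule("bodyreader")) // bodyearder

	// With camelCase the boundary bytes are "yRea", which does not match.
	fmt.Println(applyRule("bodyReader")) // bodyReader
}
```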

What license is this?

MIT

Where do the word lists come from?

It started with a word list from Wikipedia. Unfortunately, this list had to be highly edited as many of the words are obsolete or based on mistakes from mechanical typewriters (I'm guessing).

Additional words were added based on actual mistakes seen in the wild (meaning self-generated).

Variations of UK and US spellings are based on many sources including:

http://www.tysto.com/uk-us-spelling-list.html (with heavy editing, many are incorrect)
http://www.oxforddictionaries.com/us/words/american-and-british-spelling-american (excellent site but incomplete)
Diffing US and UK scowl dictionaries

American English is more accepting of spelling variations than is British English, so "what is American or not" is subject to opinion. Corrections and help welcome.

What are some other enhancements that could be done?

Here are some ideas for enhancements:

Capitalization of proper nouns: could be done (e.g. weekday and month names, country names, language names).

Opinionated US spellings: US English has a number of words with alternate spellings. Think adviser vs. advisor. While "advisor" is not wrong, the opinionated US locale would correct "advisor" to "adviser".

Versioning: Some type of versioning is needed so reporting mistakes and errors is easier.

Feedback: Mistakes would be sent to some server for aggregation and feedback review.

Documentation ¶

Index ¶

Variables
func CaseVariations(word string, style WordCase) []string
func ReadTextFile(filename string) (string, error)
func RemoveEmail(s string) string
func RemoveHost(s string) string
func RemoveNotWords(s string) string
func RemovePath(s string) string
func StripURL(s string) string
type Diff
type Replacer
- func New() *Replacer
type WordCase
- func CaseStyle(word string) WordCase

Variables ¶

var DictAmerican = []string{}/* 9720 elements not displayed */

DictAmerican converts UK spellings to US spellings

var DictBritish = []string{}/* 8870 elements not displayed */

DictBritish converts US spellings to UK spellings

var DictMain = []string{}/* 168044 elements not displayed */

DictMain is the main rule set, not including locale-specific spellings

Functions ¶

func CaseVariations ¶

func CaseVariations(word string, style WordCase) []string

CaseVariations returns the case variations of a word, depending on its style:

If AllUpper or First-Letter-Only is upcased: add the all upper case version
If AllLower: add the original, the title and upcase forms
If Mixed: return the original, and the all upcase form
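
One possible reading of those rules, as a sketch only. The wording above is ambiguous, so the behaviour for AllUpper and Title inputs here is an assumption; caseVariations and the local WordCase constants are stand-ins written for this example and may not match the package's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// WordCase is a local stand-in for the package's WordCase enum.
type WordCase int

const (
	AllLower WordCase = iota
	AllUpper
	Title
	Mixed
)

// caseVariations is a hypothetical interpretation of the documented rules.
// It assumes the first character of word is a single-byte (ASCII) letter.
func caseVariations(word string, style WordCase) []string {
	if word == "" {
		return nil
	}
	upper := strings.ToUpper(word)
	switch style {
	case AllLower:
		// The original, the title form, and the all-upper form.
		title := strings.ToUpper(word[:1]) + word[1:]
		return []string{word, title, upper}
	case AllUpper:
		// Already upper case: only the all-upper form (assumption).
		return []string{upper}
	default:
		// Title and Mixed: the original plus the all-upper form (assumption).
		return []string{word, upper}
	}
}

func main() {
	fmt.Println(caseVariations("color", AllLower))
	fmt.Println(caseVariations("Color", Title))
	fmt.Println(caseVariations("COLOR", AllUpper))
}
```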

func ReadTextFile ¶

func ReadTextFile(filename string) (string, error)

ReadTextFile returns the contents of a file, first testing if it is a text file

returns ("", nil) if not a text file
returns ("", error) if error
returns (string, nil) if text

Unfortunately, in the worst case, this does:

 1 stat
 1 open, read, close of 512 bytes
 1 more stat, open, read everything, close (via ioutil.ReadAll)

This could be kinder to the filesystem.

This uses some heuristics of the file's extension (e.g. .zip, .txt) and uses a sniffer to determine if the file is text or not. Using file extensions isn't great, but is probably good enough for real-world use. Golang's built-in sniffer is problematic for different reasons: it's optimized for HTML, and is very limited in detection. It would be good to explicitly add some tests for ELF/DWARF formats to make sure we never corrupt binary files.
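
For a feel of what the sniffing step involves, here is a minimal sketch built on net/http's DetectContentType, which looks at no more than the first 512 bytes. That this is the sniffer ReadTextFile relies on is an assumption, and looksLikeText is a made-up helper; it leaves out the file-extension heuristics entirely.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// looksLikeText reports whether the leading bytes of a file sniff as text.
// http.DetectContentType considers at most the first 512 bytes.
func looksLikeText(head []byte) bool {
	if len(head) > 512 {
		head = head[:512]
	}
	return strings.HasPrefix(http.DetectContentType(head), "text/")
}

func main() {
	fmt.Println(looksLikeText([]byte("Correct commonly misspelled English words")))
	// Control bytes such as 0x00 make the sniffer fall back to a binary type.
	fmt.Println(looksLikeText([]byte{0x00, 0x01, 0x02, 0x03}))
}
```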

func RemoveEmail ¶

func RemoveEmail(s string) string

RemoveEmail removes email-like strings, e.g. "nickg+junk@xfoobar.com", "nickg@xyz.abc123.biz"

func RemoveHost ¶

func RemoveHost(s string) string

RemoveHost removes host-like strings, e.g. "foobar.com", "abc123.fo1231.biz"

func RemoveNotWords ¶

func RemoveNotWords(s string) string

RemoveNotWords blanks out everything that is not a word

func RemovePath ¶

func RemovePath(s string) string

RemovePath attempts to strip away embedded file system paths, e.g.

/foo/bar or /static/myimg.png

TODO: windows style

func StripURL ¶

func StripURL(s string) string

StripURL attempts to replace URLs with blank spaces, e.g.

"xxx http://foo.com/ yyy" -> "xxx          yyy"

Types ¶

type Diff ¶

type Diff struct {
	Filename  string
	FullLine  string
	Line      int
	Column    int
	Original  string
	Corrected string
}

Diff is a data structure showing what changed in a single line

type Replacer ¶

type Replacer struct {
	Replacements []string
	Debug        bool
	// contains filtered or unexported fields
}

Replacer is the main struct for spelling correction

func New ¶

func New() *Replacer

New creates a new default Replacer using the main rule list

func (*Replacer) AddRuleList ¶

func (r *Replacer) AddRuleList(additions []string)

AddRuleList appends new rules. Input is in the same form as strings.Replacer: [ old1, new1, old2, new2, ....] Note: does not check for duplicates

func (*Replacer) Compile ¶

func (r *Replacer) Compile()

Compile compiles the rules. Required before using the Replace functions

func (*Replacer) RemoveRule ¶

func (r *Replacer) RemoveRule(ignore []string)

RemoveRule deletes existing rules. TODO: make inplace to save memory

func (*Replacer) Replace ¶

func (r *Replacer) Replace(input string) (string, []Diff)

Replace corrects misspellings in input, returning the corrected version along with a list of diffs.

func (*Replacer) ReplaceReader ¶

func (r *Replacer) ReplaceReader(raw io.Reader, w io.Writer, next func(Diff)) error

ReplaceReader applies spelling corrections to a reader stream. Diffs are emitted through a callback.

type WordCase ¶

type WordCase int

WordCase is an enum of various word casing styles

const (
	AllLower WordCase = iota
	AllUpper
	Title
	Mixed
	Camel
)

Various WordCase types; likely not to be correct

func CaseStyle ¶

func CaseStyle(word string) WordCase

CaseStyle returns what case style a word is in

Directories ¶

Path	Synopsis
stringreplacer
