misspell

package module
v0.6.0
Published: Jun 7, 2024 License: MIT Imports: 12 Imported by: 22

README

Correct commonly misspelled English words... quickly.

Install

If you just want a binary and to start using misspell:

curl -sfL https://raw.githubusercontent.com/golangci/misspell/master/install-misspell.sh | sh -s -- -b ./bin ${MISSPELL_VERSION}

This installs the binary as ./bin/misspell.
You can adjust the download location using the -b flag.
File a ticket if you want another platform supported.

If you use Go, the best way to run misspell is by using golangci-lint.
Otherwise, install misspell the old-fashioned way:

go install github.com/golangci/misspell/cmd/misspell@latest

Also, if you like to live dangerously, you could run:

curl -sfL https://raw.githubusercontent.com/golangci/misspell/master/install-misspell.sh | sh -s -- -b $(go env GOPATH)/bin ${MISSPELL_VERSION}

Usage

$ misspell all.html your.txt important.md files.go
your.txt:42:10 found "langauge" a misspelling of "language"

# ^ file, line, column
$ misspell -help
Usage of misspell:
  -debug
        Debug matching, very slow
  -dict string
        User defined corrections file path (.csv). CSV format: typo,fix
  -error
        Exit with 2 if misspelling found
  -f string
        'csv', 'sqlite3' or custom Golang template for output
  -i string
        ignore the following corrections, comma-separated
  -j int
        Number of workers, 0 = number of CPUs
  -legal
        Show legal information and exit
  -locale string
        Correct spellings using locale preferences for US or UK.  Default is to use a neutral variety of English.  Setting locale to US will correct the British spelling of 'colour' to 'color'
  -o string
        output file or [stderr|stdout|] (default "stdout")
  -q    Do not emit misspelling output
  -source string
        Source mode: text (default), go (comments only) (default "text")
  -v    Show version and exit
  -w    Overwrite file with corrections (default is just to display)

Pre-commit hook

To use misspell with pre-commit, add the following to your .pre-commit-config.yaml:

- repo: https://github.com/golangci/misspell
  rev: v0.6.0
  hooks:
    - id: misspell
      # The hook will run on all files by default.
      # To limit to some files only, use pre-commit patterns/types
      # files: <pattern>
      # exclude: <pattern>
      # types: <types>

FAQ

How can I make the corrections automatically?

Just add the -w flag!

$ misspell -w all.html your.txt important.md files.go
your.txt:9:21:corrected "langauge" to "language"

# ^ File is rewritten only if a misspelling is found

How do I convert British spellings to American (or vice-versa)?

Add the -locale US flag!

$ misspell -locale US important.txt
important.txt:10:20 found "colour" a misspelling of "color"

Add the -locale UK flag!

$ echo "My favorite color is blue" | misspell -locale UK
stdin:1:3:found "favorite color" a misspelling of "favourite colour"

Help is appreciated as I'm neither British nor an expert in the English language.

How do you check an entire folder recursively?

Just list a directory you'd like to check:

misspell .
misspell aDirectory anotherDirectory aFile

You can also run misspell recursively using the following shell tricks:

misspell directory/**/*

or

find . -type f | xargs misspell

You can select a type of file as well.
The following example selects all .txt files that are not in the vendor directory:

find . -type f -name '*.txt' | grep -v vendor/ | xargs misspell -error

Can I use pipes or stdin for input?

Yes!

Print messages to stderr only:

$ echo "zeebra" | misspell
stdin:1:0:found "zeebra" a misspelling of "zebra"

Print messages to stderr, and corrected text to stdout:

$ echo "zeebra" | misspell -w
stdin:1:0:corrected "zeebra" to "zebra"
zebra

Only print the corrected text to stdout:

$ echo "zeebra" | misspell -w -q
zebra

Are there special rules for golang source files?

Yes. If you want to force a file to be checked as golang source, use -source=go on the command line.
Conversely, you can check a golang source as if it were pure text by using -source=text.
You might want to do this since many variable names have misspellings in them!

Can I check only-comments in other programming languages?

I'm told that using -source=go works well for Ruby, JavaScript, Java, C and C++.

It doesn't work well for Python and Bash.

How can I get CSV output?

Using -f csv, the output is standard comma-separated values with headers in the first row.

$ misspell -f csv *
file,line,column,typo,corrected
"README.md",9,22,langauge,language
"README.md",47,25,langauge,language

How can I export to SQLite3?

Using -f sqlite, the output is a sqlite3 dump-file.

$ misspell -f sqlite * > /tmp/misspell.sql
$ cat /tmp/misspell.sql

PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE misspell(
  "file" TEXT,
  "line" INTEGER,
  "column" INTEGER,
  "typo" TEXT,
  "corrected" TEXT
);
INSERT INTO misspell VALUES("install.txt",202,31,"immediatly","immediately");
# etc...
COMMIT;
$ sqlite3 -init /tmp/misspell.sql :memory: 'select count(*) from misspell'
1

With some tricks you can directly pipe output to sqlite3 by using -init /dev/stdin:

misspell -f sqlite * | sqlite3 -init /dev/stdin -column -cmd '.width 60 15' ':memory:' \
    'select substr(file,35),typo,count(*) as count from misspell group by file, typo order by count desc;'

How can I ignore rules?

Using the -i "comma,separated,rules" flag you can specify corrections to ignore.

For example, if you were to run misspell -w -error -source=text against a document that contains the string Guy Finkelshteyn Braswell, misspell would change the text to Guy Finkelstheyn Bras well.
You can then determine the rules to ignore by reverting the change and running it again with the -debug flag.
You can then see that the corrections were htey -> they and aswell -> as well. To ignore these two rules, add -i "htey,aswell" to your command. With debug mode on, you can see it print the corrections, but it will no longer make them.

How can I change the output format?

Using the -f template flag you can pass in a golang text template to format the output.

One can use printf "%q" VALUE to safely quote a value.

The default template:

{{ .Filename }}:{{ .Line }}:{{ .Column }}:corrected {{ printf "%q" .Original }} to {{ printf "%q" .Corrected }}

To just print probable misspellings:

-f '{{ .Original }}'
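As a rough sketch of how such a template is rendered, the snippet below runs Go's text/template over a local stand-in struct. The diff type and the render helper are hypothetical names made up for this illustration; only the field names mirror the ones used in the templates above, and this is not the tool's own code.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// diff is a hypothetical stand-in carrying the fields the templates refer to.
type diff struct {
	Filename  string
	Line      int
	Column    int
	Original  string
	Corrected string
}

// render executes a text/template format string against one diff.
func render(format string, d diff) (string, error) {
	tmpl, err := template.New("output").Parse(format)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, d); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	d := diff{Filename: "your.txt", Line: 9, Column: 21, Original: "langauge", Corrected: "language"}
	out, err := render(`{{ .Filename }}:{{ .Line }}: {{ printf "%q" .Original }} -> {{ printf "%q" .Corrected }}`, d)
	if err != nil {
		fmt.Println("template error:", err)
		return
	}
	fmt.Println(out) // your.txt:9: "langauge" -> "language"
}
```

Because printf "%q" already adds the surrounding quotes, the template itself should not wrap that action in another pair.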

What problem does this solve?

This corrects commonly misspelled English words in computer source code, and other text-based formats (.txt, .md, etc.).

It is designed to run quickly, so it can be used as a pre-commit hook with minimal burden on the developer.

It does not work with binary formats (e.g. Word, etc.).

It is not a complete spell-checking program nor a grammar checker.

What are other misspelling correctors and what's wrong with them?

Some other misspelling correctors:

They all work but had problems that prevented me from using them at scale:

  • slow, all of the above check one misspelling at a time (i.e. linear) using regexps
  • not MIT/Apache2 licensed (or equivalent)
  • have dependencies that don't work for me (python3, bash, linux sed, etc.)
  • don't understand American vs. British English and sometimes make unwelcome "corrections"

That said, they might be perfect for you and many have more features than this project!

How fast is it?

Misspell is easily 100x to 1000x faster than other spelling correctors.
You should be able to check and correct 1000 files in under 250ms.

This uses the mighty power of golang's strings.Replacer which is an implementation or variation of the Aho–Corasick algorithm. This makes multiple substring matches simultaneously.
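To make the idea concrete, here is a minimal sketch using the standard library's strings.Replacer directly (not misspell's modified copy). The three typo/fix pairs are sample values taken from examples elsewhere in this README, and the correct helper is a name invented for the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// rules holds every typo/fix pair; the replacer is built once and reused.
// Pairs are given as old1, new1, old2, new2, ...
var rules = strings.NewReplacer(
	"langauge", "language",
	"zeebra", "zebra",
	"immediatly", "immediately",
)

// correct applies all pairs in a single pass over the input.
func correct(s string) string {
	return rules.Replace(s)
}

func main() {
	fmt.Println(correct("a zeebra learns a langauge immediatly"))
	// a zebra learns a language immediately
}
```

The point of the design is that adding more pairs does not mean scanning the input once per pair, which is what a loop over individual regexps would do.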

It also uses multiple CPU cores to work on multiple files concurrently.
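The -j flag in the usage text sets the number of workers, with 0 meaning the number of CPUs. A generic worker-pool sketch of that pattern is shown below; checkAll and its parameters are illustrative names, and the command's real scheduling code may be organized differently.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"sync"
)

// checkAll runs check over every input using a fixed pool of workers.
// workers <= 0 means one worker per CPU. Results keep the input order.
func checkAll(inputs []string, workers int, check func(string) string) []string {
	if workers <= 0 {
		workers = runtime.NumCPU()
	}
	results := make([]string, len(inputs))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handed to exactly one worker, so no lock is needed.
				results[i] = check(inputs[i])
			}
		}()
	}
	for i := range inputs {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	// strings.Replacer is safe for concurrent use by multiple goroutines.
	r := strings.NewReplacer("zeebra", "zebra", "langauge", "language")
	fmt.Println(checkAll([]string{"a zeebra", "a langauge"}, 0, r.Replace))
	// [a zebra a language]
}
```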

What problems does it have?

Unlike the other projects, this doesn't know what a "word" is.
There may be more false positives and false negatives due to this.
On the other hand, it sometimes catches things others don't.

Either way, please file bugs and we'll fix them!

Since it operates in parallel to make corrections, it can be non-obvious to determine exactly what word was corrected.

It's making mistakes. How can I debug?

Run using -debug flag on the file you want.
It should then print what word it is trying to correct.
Then file a bug describing the problem. Thanks!

Why is it making mistakes or missing items in golang files?

The matching function is case-sensitive, so variable names made of multiple words written in all-uppercase or all-lowercase can sometimes cause false positives.
For instance a variable named bodyreader could trigger a false positive since yrea is in the middle that could be corrected to year. Other problems happen if the variable name uses an English contraction that should use an apostrophe.
The best way of fixing this is to use the Effective Go naming conventions and use camelCase for variable names.
You can check your code using golint.
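The bodyreader case can be reproduced with a one-rule replacer. This sketch uses the standard library's strings.Replacer and only the single yrea -> year rule mentioned above; naiveFix is a made-up helper name, and the real tool applies many more rules.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveFix applies the rule "yrea" -> "year" with no notion of word boundaries.
func naiveFix(s string) string {
	return strings.NewReplacer("yrea", "year").Replace(s)
}

func main() {
	fmt.Println(naiveFix("bodyreader")) // bodyearder: the identifier is mangled
	fmt.Println(naiveFix("bodyReader")) // bodyReader: camelCase breaks the match
}
```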

What license is this?

The main code is MIT.

Misspell also makes use of the Golang standard library and contains a modified version of Golang's strings.Replacer, which is covered under a BSD License.
Type misspell -legal for more details or see legal.go.

Where do the word lists come from?

It started with a word list from Wikipedia. Unfortunately, this list had to be highly edited as many of the words are obsolete or based on mistakes on mechanical typewriters (I'm guessing).

Additional words were added based on actual mistakes seen in the wild (meaning self-generated).

Variations of UK and US spellings are based on many sources including:

American English is more accepting of spelling variations than is British English, so "what is American or not" is subject to opinion. Corrections and help welcome.

What are some other enhancements that could be done?

Here are some ideas for enhancements:

Capitalization of proper nouns could be done (e.g. weekday and month names, country names, language names)

Opinionated US spellings: US English has a number of words with alternate spellings.
Think adviser vs. advisor.
While "advisor" is not wrong, the opinionated US locale would correct "advisor" to "adviser".

Versioning: Some type of versioning is needed so reporting mistakes and errors is easier.

Feedback: Mistakes would be sent to some server for aggregation and feedback review.

Contractions and Apostrophes: This would optionally correct "isnt" to "isn't", etc.

Documentation

Overview

Package misspell corrects commonly misspelled English words in source files.

Constants

const Legal = `` /* 2040-byte string literal not displayed */

Legal provides licensing info.

Variables

var DictAmerican = []string{}/* 3238 elements not displayed */

DictAmerican converts UK spellings to US spellings

var DictBritish = []string{}/* 2954 elements not displayed */

DictBritish converts US spellings to UK spellings

var DictMain = []string{}/* 56166 elements not displayed */

DictMain is the main rule set, not including locale-specific spellings

Functions

func ByteEqualFold added in v0.2.0

func ByteEqualFold(a, b byte) bool

ByteEqualFold does an ASCII, case-insensitive comparison of two bytes.

func ByteToLower added in v0.2.0

func ByteToLower(eax byte) byte

ByteToLower converts an ascii byte to lower case. Uses a branch-less algorithm.
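For readers curious what a branch-less conversion can look like, here is one illustrative formulation. It is a sketch written for this page under the assumption that only A-Z should change; lowerASCII is a hypothetical name and the package's own implementation may differ in detail.

```go
package main

import "fmt"

// lowerASCII lower-cases A-Z without any branch; all other bytes,
// including those with the high bit set, are returned unchanged.
func lowerASCII(c byte) byte {
	// Shift the low 7 bits so that 'Z' (0x5a) lands exactly on 0x7f.
	t := (c & 0x7f) + 0x25
	// After masking and adding 0x1a, bit 7 is set only when the low
	// 7 bits of c were in the range 'A'..'Z'.
	t = (t & 0x7f) + 0x1a
	// Clear the result for non-ASCII input, then move bit 7 down to
	// the 0x20 case bit.
	t = ((t &^ c) >> 2) & 0x20
	return c + t
}

func main() {
	in := []byte("Hello, WORLD!")
	for i := range in {
		in[i] = lowerASCII(in[i])
	}
	fmt.Println(string(in)) // hello, world!
}
```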

func ByteToUpper added in v0.2.0

func ByteToUpper(x byte) byte

ByteToUpper converts an ascii byte to upper case. Uses a branch-less algorithm.

func CaseVariations

func CaseVariations(word string, style WordCase) []string

CaseVariations returns:

If AllUpper or First-Letter-Only is upper-cased: add the all upper-case version.
If AllLower: add the original, the title and the upper-case forms.
If Mixed: return the original and the all upper-case form.
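The description above is terse, so the following sketch shows one possible reading of it for ASCII words. The variations function is a hypothetical helper that detects the casing itself instead of taking a WordCase argument; the exact slices the real CaseVariations returns, in particular for title-cased input, should be checked against the source.

```go
package main

import (
	"fmt"
	"strings"
)

// variations returns spelling variants of an ASCII word:
//   - all upper case input: only the upper-case form
//   - all lower case input: the original, the title form and the upper-case form
//   - anything else: the original and the upper-case form
func variations(word string) []string {
	if word == "" {
		return nil
	}
	upper := strings.ToUpper(word)
	switch {
	case word == upper:
		return []string{upper}
	case word == strings.ToLower(word):
		title := strings.ToUpper(word[:1]) + word[1:] // assumes an ASCII first letter
		return []string{word, title, upper}
	default:
		return []string{word, upper}
	}
}

func main() {
	fmt.Println(variations("color")) // [color Color COLOR]
	fmt.Println(variations("COLOR")) // [COLOR]
	fmt.Println(variations("coLor")) // [coLor COLOR]
}
```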

func ReadTextFile

func ReadTextFile(filename string) (string, error)

ReadTextFile returns the contents of a file, first testing if it is a text file:

returns ("", nil) if not a text file
returns ("", error) if error
returns (string, nil) if text

Unfortunately, in the worst case, this does:

 1 stat
 1 open, read, close of 512 bytes
 1 more stat, open, read everything, close (via io.ReadAll)

This could be kinder to the filesystem.

This uses some heuristics of the file's extension (e.g. .zip, .txt) and uses a sniffer to determine if the file is text or not. Using file extensions isn't great, but probably good enough for real-world use. Golang's built-in sniffer is problematic for different reasons. It's optimized for HTML, and is very limited in detection. It would be good to explicitly add some tests for ELF/DWARF formats to make sure we never corrupt binary files.
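The sniffing half of that heuristic can be sketched with net/http's DetectContentType, which is assumed here to be the "built-in sniffer" the paragraph refers to. The looksLikeText helper is a hypothetical name, and the extension checks that ReadTextFile also performs are left out.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// looksLikeText sniffs at most the first 512 bytes and reports whether
// the detected MIME type is a text type.
func looksLikeText(head []byte) bool {
	if len(head) > 512 {
		head = head[:512]
	}
	return strings.HasPrefix(http.DetectContentType(head), "text/")
}

func main() {
	fmt.Println(looksLikeText([]byte("Correct commonly misspelled English words.\n"))) // true
	fmt.Println(looksLikeText([]byte("\x89PNG\r\n\x1a\n")))                            // false
}
```

A file that fails this check would be skipped instead of rewritten, which is the property that keeps binary files from being corrupted.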

func RemoveEmail

func RemoveEmail(s string) string

RemoveEmail removes email-like strings, e.g. "nickg+junk@xfoobar.com", "nickg@xyz.abc123.biz".

func RemoveHost

func RemoveHost(s string) string

RemoveHost removes host-like strings "foobar.com" "abc123.fo1231.biz".

func RemoveNotWords

func RemoveNotWords(s string) string

RemoveNotWords blanks out all the not words.

func RemovePath

func RemovePath(s string) string

RemovePath attempts to strip away embedded file system paths, e.g.

/foo/bar or /static/myimg.png

TODO: windows style.

func StringEqualFold added in v0.2.0

func StringEqualFold(s1, s2 string) bool

StringEqualFold does an ASCII case-insensitive comparison. Golang's toUpper/toLower for both bytes and strings appear to be Unicode-based, which is super slow. Based on https://codereview.appspot.com/5180044/patch/14007/21002.

func StringHasPrefixFold added in v0.2.0

func StringHasPrefixFold(s1, s2 string) bool

StringHasPrefixFold is similar to strings.HasPrefix but comparison is done ignoring ASCII case.
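A byte-wise sketch of an ASCII-only prefix comparison is shown below to illustrate why it can avoid Unicode handling entirely. The names foldASCII and hasPrefixFold are invented for this example and are not the package's functions.

```go
package main

import "fmt"

// foldASCII maps A-Z to a-z and leaves every other byte alone.
func foldASCII(c byte) byte {
	if c >= 'A' && c <= 'Z' {
		return c + ('a' - 'A')
	}
	return c
}

// hasPrefixFold reports whether s begins with prefix, ignoring ASCII case only.
// Non-ASCII bytes must match exactly.
func hasPrefixFold(s, prefix string) bool {
	if len(s) < len(prefix) {
		return false
	}
	for i := 0; i < len(prefix); i++ {
		if foldASCII(s[i]) != foldASCII(prefix[i]) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(hasPrefixFold("HTTP://foo.com/", "http://")) // true
	fmt.Println(hasPrefixFold("ftp://foo.com/", "http://"))  // false
}
```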

func StripURL

func StripURL(s string) string

StripURL attempts to replace URLs with blank spaces, e.g.

"xxx http://foo.com/ yyy" -> "xxx          yyy".

Types

type Diff

type Diff struct {
	Filename  string
	FullLine  string
	Line      int
	Column    int
	Original  string
	Corrected string
}

Diff is a data structure showing what changed in a single line.

type Replacer

type Replacer struct {
	Replacements []string
	Debug        bool
	// contains filtered or unexported fields
}

Replacer is the main struct for spelling correction.

func New

func New() *Replacer

New creates a new default Replacer using the main rule list.

func (*Replacer) AddRuleList

func (r *Replacer) AddRuleList(additions []string)

AddRuleList appends new rules. Input is in the same form as strings.Replacer: [old1, new1, old2, new2, ...]. Note: does not check for duplicates.
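The command's -dict flag takes a CSV file in typo,fix form, while rule lists are flat old/new slices. As a sketch of bridging the two, the hypothetical rulesFromCSV helper below flattens CSV records into that shape using only the standard library; how the command itself loads dictionaries is not shown here.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// rulesFromCSV reads "typo,fix" records and flattens them into
// [typo1, fix1, typo2, fix2, ...].
func rulesFromCSV(data string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.FieldsPerRecord = 2 // reject rows that are not exactly typo,fix
	records, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	rules := make([]string, 0, 2*len(records))
	for _, rec := range records {
		rules = append(rules, rec[0], rec[1])
	}
	return rules, nil
}

func main() {
	rules, err := rulesFromCSV("langauge,language\nzeebra,zebra\n")
	if err != nil {
		fmt.Println("bad dictionary:", err)
		return
	}
	fmt.Println(rules) // [langauge language zeebra zebra]
	// The same flat slice can be handed to the standard library's replacer.
	fmt.Println(strings.NewReplacer(rules...).Replace("a zeebra")) // a zebra
}
```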

func (*Replacer) Compile

func (r *Replacer) Compile()

Compile compiles the rules. Required before using the Replace functions.

func (*Replacer) RemoveRule

func (r *Replacer) RemoveRule(ignore []string)

RemoveRule deletes existing rules. The content of `ignore` is case-insensitive. TODO: make in place to save memory.
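The effect of ignoring rules can be sketched on a plain old/new slice. The removeRules helper below is a hypothetical illustration that returns a filtered copy and matches the typo side case-insensitively; it makes no claim about how Replacer stores its rules internally.

```go
package main

import (
	"fmt"
	"strings"
)

// removeRules returns a copy of the flat old/new list without any pair
// whose "old" side matches an entry in ignore, compared case-insensitively.
func removeRules(rules, ignore []string) []string {
	skip := make(map[string]bool, len(ignore))
	for _, w := range ignore {
		skip[strings.ToLower(w)] = true
	}
	kept := make([]string, 0, len(rules))
	for i := 0; i+1 < len(rules); i += 2 {
		if skip[strings.ToLower(rules[i])] {
			continue
		}
		kept = append(kept, rules[i], rules[i+1])
	}
	return kept
}

func main() {
	rules := []string{"htey", "they", "aswell", "as well", "langauge", "language"}
	fmt.Println(removeRules(rules, []string{"HTEY", "aswell"})) // [langauge language]
}
```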

func (*Replacer) Replace

func (r *Replacer) Replace(input string) (string, []Diff)

Replace corrects misspellings in input, returning the corrected version along with a list of diffs.

func (*Replacer) ReplaceGo

func (r *Replacer) ReplaceGo(input string) (string, []Diff)

ReplaceGo is a specialized routine for correcting Golang source files. Currently only checks comments, not identifiers for spelling.

func (*Replacer) ReplaceReader

func (r *Replacer) ReplaceReader(raw io.Reader, w io.Writer, next func(Diff)) error

ReplaceReader applies spelling corrections to a reader stream. Diffs are emitted through a callback.

type StringReplacer added in v0.2.0

type StringReplacer struct {
	// contains filtered or unexported fields
}

StringReplacer replaces a list of strings with replacements. It is safe for concurrent use by multiple goroutines.

func NewStringReplacer added in v0.2.0

func NewStringReplacer(oldnew ...string) *StringReplacer

NewStringReplacer returns a new Replacer from a list of old, new string pairs. Replacements are performed in order, without overlapping matches.

func (*StringReplacer) Replace added in v0.2.0

func (r *StringReplacer) Replace(s string) string

Replace returns a copy of s with all replacements performed.

func (*StringReplacer) WriteString added in v0.2.0

func (r *StringReplacer) WriteString(w io.Writer, s string) (int, error)

WriteString writes s to w with all replacements performed.

type WordCase

type WordCase int

WordCase is an enum of various word casing styles.

const (
	CaseUnknown WordCase = iota
	CaseLower
	CaseUpper
	CaseTitle
)

Various WordCase types... likely not to be correct.

func CaseStyle

func CaseStyle(word string) WordCase

CaseStyle returns what case style a word is in.

Directories

Path            Synopsis
cmd/misspell    The misspell command corrects commonly misspelled English words in source files.
