words

v1.0.3 Latest
Published: Jun 14, 2024 License: Apache-2.0 Imports: 5 Imported by: 1

README

words - word metrics from mixed-locale content

words is a package for counting the number of words in a text and providing simple metrics such as estimated reading times.

Installation

> go get github.com/go-corelibs/words@latest

Examples

Count

package main

import (
    "fmt"
    "github.com/go-corelibs/words"
)

func main() {
    text := "さらに「やり遂げる」ためのEnjin"
    count := words.Count(text) // count == 12
    fmt.Printf("There are %d words in %q\n", count, text)
}

Go-CoreLibs

Go-CoreLibs is a repository of shared code between the Go-Curses and Go-Enjin projects.

License

Copyright 2023 The Go-CoreLibs Authors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Documentation

Overview

Package words provides a means of counting the number of words in the given content and estimating its reading time.

Words per minute values based on:

https://irisreading.com/what-is-the-average-reading-speed/
https://www.researchgate.net/publication/332380784_How_many_words_do_we_read_per_minute_A_review_and_meta-analysis_of_reading_rate
https://www.sciencedirect.com/science/article/abs/pii/S0749596X19300786

Package words was inspired by:

https://github.com/byn9826/words-count/blob/master/src/globalWordsCount.js

Index

Constants

const (
	// AverageWordsPerMinute is the words per minute read by average adults
	AverageWordsPerMinute = 238.0
	// RelaxedWordsPerMinute is an estimation of words per minute read by
	// older children and tired adults looking at monitors and screens all day
	RelaxedWordsPerMinute = 177.0
)

Variables

var (
	// DefaultPunctuation is a hard-coded list of the most common characters
	// that are not counted as words
	DefaultPunctuation = []rune{
		',', ',', '.', '。', ':', ':', ';', ';', '[', ']', '【', ']', '】', '{', '{', '}', '}',
		'(', '(', ')', ')', '<', '《', '>', '》', '$', '¥', '!', '!', '?', '?', '~', '~',
		'「', '」',
		'\'', '’', '"', '“', '”',
		'*', '/', '\\', '&', '%', '@', '#', '^', '、', '、', '、', '、',
	}
)

Functions

func Count

func Count(input string) (count int)

Count returns the total number of parsed words using the Default Words configuration

func List

func List(input string) (list []string)

List returns a list of words that were separated by spaces using the Default Words configuration

func Parse

func Parse(input string) (words []string)

Parse returns the list of parsed words using the Default Words configuration

func Range

func Range(input string, fn func(word string))

Range iterates over the list of parsed words using the Default Words configuration

func Search

func Search(query, content string) (score int, present []string)

Search performs a very simple keyword search of the content using the Default Words configuration

Types

type ReadingMetrics

type ReadingMetrics struct {
	WordCount int
	Average   struct {
		Minutes  int
		Duration time.Duration
	}
	Relaxed struct {
		Minutes  int
		Duration time.Duration
	}
}

ReadingMetrics is a data structure returned by the Words.Metrics method

func Metrics

func Metrics(content string) (m ReadingMetrics)

Metrics parses the contents and returns some interesting ReadingMetrics using the Default Words configuration

type Words

type Words struct {
	// PunctuationAsBreaker specifies that punctuation characters should not
	// be removed and instead be replaced with a space. For example, "they're"
	// by default is collapsed to "theyre" for counting purposes. With
	// PunctuationAsBreaker set to true, "they're" would become "they re"
	PunctuationAsBreaker bool
	// DisableDefaultPunctuation specifies that only the Words.Punctuation
	// runes are to be considered punctuation
	DisableDefaultPunctuation bool
	// Punctuation defines the list of punctuation runes to use when parsing
	// words out of content
	Punctuation []rune
	// AverageWPM specifies the average words per minute to use
	// for calculating Metrics, default is 238.0, see: AverageWordsPerMinute
	AverageWPM float64
	// RelaxedWPM specifies the relaxed words per minute to use
	// for calculating Metrics, default is 177.0, see: RelaxedWordsPerMinute
	RelaxedWPM float64
	// contains filtered or unexported fields
}

Words is the definition for running customized word operations and is the implementation driving the normal package functions
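The two punctuation modes described by PunctuationAsBreaker can be illustrated with the standard library alone. The following is a minimal sketch, not the package's implementation: `splitWords` is a hypothetical helper, and the three-rune punctuation list is a shortened stand-in for DefaultPunctuation.

```go
package main

import (
	"fmt"
	"strings"
)

// splitWords is a hypothetical helper (not part of the words package).
// Punctuation runes are either dropped (asBreaker == false) or replaced
// with a space (asBreaker == true) before splitting on whitespace.
func splitWords(input string, punctuation []rune, asBreaker bool) []string {
	isPunct := make(map[rune]bool, len(punctuation))
	for _, r := range punctuation {
		isPunct[r] = true
	}
	cleaned := strings.Map(func(r rune) rune {
		if !isPunct[r] {
			return r
		}
		if asBreaker {
			return ' ' // punctuation acts as a word boundary
		}
		return -1 // a negative value tells strings.Map to drop the rune
	}, input)
	return strings.Fields(cleaned)
}

func main() {
	punct := []rune{'\'', ',', '.'} // assumed subset, for illustration only
	fmt.Println(splitWords("they're here", punct, false)) // [theyre here]
	fmt.Println(splitWords("they're here", punct, true))  // [they re here]
}
```

With the package itself, the equivalent switch would presumably be made by setting PunctuationAsBreaker on the instance returned by Default before calling Count or Parse.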

func Default

func Default() (w *Words)

Default returns a new Words instance configured with sane defaults

func (*Words) Count

func (w *Words) Count(input string) (count int)

Count returns the total number of words detected within the given input

func (*Words) List

func (w *Words) List(input string) (list []string)

List returns a list of all the words detected within the given input that are separated by spaces. Word characters not separated by spaces are clumped together within individual items of the returned list. Use Words.Parse to derive a more accurate word list
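The difference between space-separated splitting and per-word parsing matters most for scripts written without spaces. The sketch below is an assumption-laden illustration rather than the package's algorithm: `parseMixed` and `isCJK` are hypothetical helpers, and the rule that each Han, Hiragana, or Katakana character counts as one word is inferred from the README example, where the sample text yields a count of 12.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isCJK reports whether r belongs to one of the scripts this sketch
// treats as "one character, one word" (an assumption, see lead-in).
func isCJK(r rune) bool {
	return unicode.In(r, unicode.Han, unicode.Hiragana, unicode.Katakana)
}

// parseMixed is a hypothetical helper (not part of the words package).
// CJK characters become individual words, runs of other letters and
// digits form one word, and everything else acts as a separator.
func parseMixed(input string) []string {
	var out []string
	var current []rune
	flush := func() {
		if len(current) > 0 {
			out = append(out, string(current))
			current = current[:0]
		}
	}
	for _, r := range input {
		switch {
		case isCJK(r):
			flush()
			out = append(out, string(r))
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			current = append(current, r)
		default:
			flush()
		}
	}
	flush()
	return out
}

func main() {
	text := "さらに「やり遂げる」ためのEnjin"
	fmt.Println(len(strings.Fields(text))) // 1: no spaces, so one clumped item
	fmt.Println(len(parseMixed(text)))     // 12: eleven characters plus "Enjin"
}
```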

func (*Words) Metrics

func (w *Words) Metrics(content string) (m ReadingMetrics)

Metrics gets the Words.Count and derives some estimated reading times
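How a word count turns into a reading time can be sketched as a simple division by a words-per-minute rate. The code below is a hedged illustration only: `estimate` and `estimateReading` are hypothetical names, and rounding Minutes up to the next whole minute is an assumption, since the documentation does not state how the package rounds.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// estimate mirrors the shape of the Average and Relaxed fields of
// ReadingMetrics; it is a local stand-in, not the package's type.
type estimate struct {
	Minutes  int
	Duration time.Duration
}

// estimateReading is a hypothetical helper: it divides the word count
// by the words-per-minute rate. Rounding Minutes up is an assumption.
func estimateReading(wordCount int, wpm float64) estimate {
	if wordCount <= 0 || wpm <= 0 {
		return estimate{}
	}
	minutes := float64(wordCount) / wpm
	return estimate{
		Minutes:  int(math.Ceil(minutes)),
		Duration: time.Duration(minutes * float64(time.Minute)),
	}
}

func main() {
	// 238.0 and 177.0 are the documented AverageWordsPerMinute and
	// RelaxedWordsPerMinute values.
	average := estimateReading(476, 238.0)
	relaxed := estimateReading(476, 177.0)
	fmt.Println(average.Minutes, average.Duration) // 2 2m0s
	fmt.Println(relaxed.Minutes)                   // 3
}
```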

func (*Words) Parse

func (w *Words) Parse(input string) (words []string)

Parse returns the total list of words detected within the given input

func (*Words) Range

func (w *Words) Range(input string, fn func(word string))

Range iterates over all words detected within input, calling the given `fn` for each word found

func (*Words) Search

func (w *Words) Search(query, content string) (score int, found []string)

Search performs a case-insensitive search for the keywords within the given `query` string and returns the list of unique query keywords found along with a simple scoring metric weighing earlier keywords more than later keywords
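The behaviour described above, a case-insensitive keyword match with earlier keywords weighted more heavily, could be sketched as follows. This is not the package's scoring formula, which the documentation does not specify: `search` is a hypothetical helper, and awarding `len(keywords) - position` points per matched keyword is an assumed weighting chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// search is a hypothetical helper (not the package's Search). It
// lowercases both inputs, records which query keywords occur in the
// content, and gives earlier keywords a larger share of the score.
func search(query, content string) (score int, found []string) {
	present := make(map[string]bool)
	for _, w := range strings.Fields(strings.ToLower(content)) {
		present[w] = true
	}
	keywords := strings.Fields(strings.ToLower(query))
	seen := make(map[string]bool)
	for i, k := range keywords {
		if seen[k] || !present[k] {
			continue // skip duplicates and keywords absent from content
		}
		seen[k] = true
		found = append(found, k)
		score += len(keywords) - i // assumed weighting: earlier counts more
	}
	return score, found
}

func main() {
	score, found := search("Go words", "Counting WORDS in go")
	fmt.Println(score, found) // 3 [go words]
}
```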
