Documentation ¶
Overview ¶
Package summarize implements utilities for computing readability scores, usage statistics, and TL;DR summaries of text.
Index ¶
- func Syllables(word string) int
- type Assessment
- type Document
- func (d *Document) Assess() *Assessment
- func (d *Document) AutomatedReadability() float64
- func (d *Document) ColemanLiau() float64
- func (d *Document) DaleChall() float64
- func (d *Document) FleschKincaid() float64
- func (d *Document) FleschReadingEase() float64
- func (d *Document) GunningFog() float64
- func (d *Document) Initialize()
- func (d *Document) Keywords() map[string]int
- func (d *Document) MeanWordLength() float64
- func (d *Document) SMOG() float64
- func (d *Document) Summary(n int) []RankedParagraph
- func (d *Document) WordDensity() map[string]float64
- type RankedParagraph
- type Sentence
- type Word
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
Types ¶
type Assessment ¶
type Assessment struct {
	// assessments returning an estimated grade level
	AutomatedReadability float64
	ColemanLiau          float64
	FleschKincaid        float64
	GunningFog           float64
	SMOG                 float64

	// mean & standard deviation of the above estimated grade levels
	MeanGradeLevel   float64
	StdDevGradeLevel float64

	// assessments returning non-grade numerical scores
	DaleChall   float64
	ReadingEase float64
}
An Assessment provides comprehensive access to a Document's metrics.
type Document ¶
type Document struct {
	Content         string  // Actual text
	NumCharacters   float64 // Number of characters
	NumComplexWords float64 // Polysyllabic words without common suffixes
	NumParagraphs   float64 // Number of paragraphs
	NumPolysylWords float64 // Number of words with > 2 syllables
	NumSentences    float64 // Number of sentences
	NumSyllables    float64 // Number of syllables
	NumWords        float64 // Number of words

	Sentences     []Sentence     // the Document's sentences
	WordFrequency map[string]int // [word]frequency

	SentenceTokenizer tokenize.ProseTokenizer
	WordTokenizer     tokenize.ProseTokenizer
}
A Document represents a collection of text to be analyzed.
A Document's calculations depend on its word and sentence tokenizers. You can use the defaults by invoking NewDocument, choose another implementation from the tokenize package, or supply your own (as long as it implements the ProseTokenizer interface). For example,
d := Document{Content: ..., WordTokenizer: ..., SentenceTokenizer: ...}
d.Initialize()
func NewDocument ¶
NewDocument is a Document constructor that takes a string as an argument. It then calculates the data necessary for computing readability and usage statistics.
This is a convenience wrapper around the Document initialization process that defaults to using a WordBoundaryTokenizer and a PunktSentenceTokenizer as its word and sentence tokenizers, respectively.
func (*Document) Assess ¶
func (d *Document) Assess() *Assessment
Assess returns an Assessment for the Document d.
func (*Document) AutomatedReadability ¶
func (d *Document) AutomatedReadability() float64

AutomatedReadability computes the automated readability index score (https://en.wikipedia.org/wiki/Automated_readability_index).
func (*Document) ColemanLiau ¶
func (d *Document) ColemanLiau() float64

ColemanLiau computes the Coleman–Liau index score (https://en.wikipedia.org/wiki/Coleman%E2%80%93Liau_index).
func (*Document) DaleChall ¶
func (d *Document) DaleChall() float64

DaleChall computes the Dale–Chall score (https://en.wikipedia.org/wiki/Dale%E2%80%93Chall_readability_formula).
func (*Document) FleschKincaid ¶
func (d *Document) FleschKincaid() float64

FleschKincaid computes the Flesch–Kincaid grade level (https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests).
func (*Document) FleschReadingEase ¶
func (d *Document) FleschReadingEase() float64

FleschReadingEase computes the Flesch reading-ease score (https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests).
func (*Document) GunningFog ¶
func (d *Document) GunningFog() float64

GunningFog computes the Gunning fog index score (https://en.wikipedia.org/wiki/Gunning_fog_index).
func (*Document) Initialize ¶
func (d *Document) Initialize()
Initialize calculates the data necessary for computing readability and usage statistics.
func (*Document) Keywords ¶
func (d *Document) Keywords() map[string]int

Keywords returns a Document's words in the form
map[word]count
omitting stop words and normalizing case.
func (*Document) MeanWordLength ¶
func (d *Document) MeanWordLength() float64

MeanWordLength returns the mean number of characters per word.
func (*Document) SMOG ¶
func (d *Document) SMOG() float64

SMOG computes the SMOG grade (https://en.wikipedia.org/wiki/SMOG).
func (*Document) Summary ¶
func (d *Document) Summary(n int) []RankedParagraph
Summary returns a Document's n highest ranked paragraphs according to keyword frequency.
func (*Document) WordDensity ¶
func (d *Document) WordDensity() map[string]float64

WordDensity returns a map of each word to its density, i.e., its frequency as a proportion of the Document's total word count.
type RankedParagraph ¶
type RankedParagraph struct {
	Sentences []Sentence
	Position  int // the zero-based position within a Document
	Rank      int
}
A RankedParagraph is a paragraph ranked by its number of keywords.