textcorpora

v0.0.0-...-05fab05
Published: Apr 1, 2015 | License: LGPL-3.0 | Imports: 0 | Imported by: 2

README

textcorpora
==============

[![GoDoc](https://godoc.org/github.com/ChimeraCoder/textcorpora?status.png)](https://godoc.org/github.com/ChimeraCoder/textcorpora)

TextCorpora is a helper package that provides an interface for various [corpora](https://en.wikipedia.org/wiki/Text_corpus). It was originally written for use in the [ReadingLevel](https://github.com/ChimeraCoder/readinglevel) library. It is provided as a separate package for convenience, both to facilitate the use of corpora in other applications and libraries and to let users of the ReadingLevel library plug in an alternative corpus if desired.


### Storage

Each corpus is stored in a location provided by [appdirs](https://github.com/Wessie/appdirs). For example, on Linux, the current version of the CMU corpus will be downloaded and saved to `~/.local/share/cmudict/.1/cmudict.0.7a.corpus`.
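As a rough illustration of how a path like the one above is composed, here is a minimal sketch that uses only the standard library. It does not call appdirs. The helpers `linuxDataDir` and `corpusPath` are hypothetical names introduced for this example, and the fallback from `$XDG_DATA_HOME` to `~/.local/share` is an assumption based on the usual XDG convention, not something confirmed for this package.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// linuxDataDir returns the per-user data directory under the XDG convention:
// xdgDataHome if it is non-empty, otherwise <home>/.local/share.
// Hypothetical helper; the package itself delegates this to appdirs.
func linuxDataDir(xdgDataHome, home string) string {
	if xdgDataHome != "" {
		return xdgDataHome
	}
	return filepath.Join(home, ".local", "share")
}

// corpusPath joins the data directory, corpus name, version directory,
// and file name into a single path. Hypothetical helper.
func corpusPath(dataDir, name, version, file string) string {
	return filepath.Join(dataDir, name, version, file)
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine home directory:", err)
		os.Exit(1)
	}
	dataDir := linuxDataDir(os.Getenv("XDG_DATA_HOME"), home)
	// The name, version directory, and file name are taken from the example path.
	fmt.Println(corpusPath(dataDir, "cmudict", ".1", "cmudict.0.7a.corpus"))
}
```

Passing the environment value and home directory in as arguments keeps the path logic easy to check without touching the real environment.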

 

Documentation

Overview

Package textcorpora provides an interface for various corpora used in natural language processing.

Currently the package provides two corpora: the Carnegie Mellon Pronouncing Dictionary Corpus, and the Enron Corpus (containing over 600,000 emails from Enron employees).


Types

type Corpus

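type Corpus interface {
	// Syllables reports the number of syllables in the given string.
	Syllables(string) int
	// Words reports the number of words in the corpus.
	Words() int
	// WordsCursor returns a channel for iterating over the words.
	WordsCursor() chan string
}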

A Corpus is a body of text that supports certain queries. Currently the only required methods are Syllables (number of syllables), Words (number of words), and WordsCursor (for iterating over the words).
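To show what plugging in an alternative corpus might look like, here is a minimal sketch of a type that satisfies the interface. The `Corpus` interface is copied locally so the example compiles on its own. `sliceCorpus` is a hypothetical in-memory type, and its vowel-run syllable count is a crude stand-in heuristic, not the method used by the CMU-backed corpus. The sketch also assumes that the channel returned by WordsCursor is closed once every word has been sent, which the documentation does not state.

```go
package main

import (
	"fmt"
	"strings"
)

// Corpus mirrors the interface declared by textcorpora, copied here so the
// sketch is self-contained.
type Corpus interface {
	Syllables(string) int
	Words() int
	WordsCursor() chan string
}

// sliceCorpus is a hypothetical in-memory corpus backed by a slice of words.
type sliceCorpus struct {
	words []string
}

// Syllables counts runs of consecutive vowels in word. This is a rough
// heuristic used only for illustration.
func (c sliceCorpus) Syllables(word string) int {
	count := 0
	inVowelRun := false
	for _, r := range strings.ToLower(word) {
		isVowel := strings.ContainsRune("aeiouy", r)
		if isVowel && !inVowelRun {
			count++
		}
		inVowelRun = isVowel
	}
	return count
}

// Words returns the number of words held by the corpus.
func (c sliceCorpus) Words() int {
	return len(c.words)
}

// WordsCursor streams each word over a channel and closes the channel when
// done (an assumption of this sketch), so callers can range over it.
func (c sliceCorpus) WordsCursor() chan string {
	ch := make(chan string)
	go func() {
		defer close(ch)
		for _, w := range c.words {
			ch <- w
		}
	}()
	return ch
}

func main() {
	var c Corpus = sliceCorpus{words: []string{"reading", "level", "corpus"}}

	// Iterate over every word and total the syllable counts.
	total := 0
	for w := range c.WordsCursor() {
		total += c.Syllables(w)
	}
	fmt.Printf("%d words, %d syllables\n", c.Words(), total)
}
```

A caller should drain the channel completely. Abandoning the loop early would leave the sending goroutine blocked in this sketch, because the channel is unbuffered.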
