ngrams

package
v0.0.0-...-d399f25
Warning

This package is not in the latest version of its module.

Published: Nov 17, 2020 License: MIT Imports: 10 Imported by: 0

README

n-gram frequencies

Frequency files sourced from Practical Cryptography.

Documentation

Constants

This section is empty.

Variables

var (
	AlphaDA = "ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ"
	AlphaDE = "ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜß"
	AlphaEN = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
	AlphaES = "ABCDEFGHIJKLMNOPQRSTUVWXYZÑ"
	AlphaFI = "ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ"
	AlphaFR = "AÀÂÆBCÇDEÉÈÊËFGHIÎÏJKLMNOÔŒPQRSTUÙÛÜVWXYŸZ"
	AlphaIS = "AÁBDÐEÉFGHIÍJKLMNOÓPRSTUÚVXYÝÞÆÖ"
	AlphaPL = "AĄBCĆDEĘFGHIJKLŁMNŃOÓPRSŚTUWYZŹŻ"
	AlphaRU = "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"
	AlphaSV = "ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ"
)

Alphabets for various languages.

var (
	FreqsDA = freqsDA
	FreqsDE = freqsDE
	FreqsEN = freqsEN
	FreqsES = freqsES
	FreqsFI = freqsFI
	FreqsFR = freqsFR
	FreqsIS = freqsIS
	FreqsPL = freqsPL
	FreqsRU = freqsRU
	FreqsSV = freqsSV
)

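Letter frequencies for various languages. The value at each position is the frequency of the letter at the same position in the corresponding alphabet (for example, FreqsEN[0] belongs to the first letter of AlphaEN).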
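A minimal sketch of how an alphabet could be paired with its frequency slice. It assumes positions are counted in letters (runes) rather than bytes, which matters for alphabets with non-ASCII letters. The helper letterFreqs, the three-letter alphabet, and the frequency values are hypothetical and not part of this package.

```go
package main

import "fmt"

// letterFreqs pairs each letter of alpha with the value at the same
// position in freqs. alpha is converted to runes first, because letters
// such as Ø and Å occupy more than one byte in UTF-8.
func letterFreqs(alpha string, freqs []float64) (map[rune]float64, error) {
	letters := []rune(alpha)
	if len(letters) != len(freqs) {
		return nil, fmt.Errorf("alphabet has %d letters but %d frequencies",
			len(letters), len(freqs))
	}
	out := make(map[rune]float64, len(letters))
	for i, r := range letters {
		out[r] = freqs[i]
	}
	return out, nil
}

func main() {
	// Made-up alphabet and frequencies, standing in for a pair such as
	// AlphaDA and FreqsDA.
	m, err := letterFreqs("AØÅ", []float64{0.5, 0.3, 0.2})
	if err != nil {
		panic(err)
	}
	fmt.Println(m['Ø'])
}
```

Indexing the alphabet string directly (alpha[i]) would return bytes, so the rune conversion is the safer choice under this assumption.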

var Langs = []Lang{
	{"da", AlphaDA, FreqsDA},
	{"de", AlphaDE, FreqsDE},
	{"en", AlphaEN, FreqsEN},
	{"es", AlphaES, FreqsES},
	{"fi", AlphaFI, FreqsFI},
	{"fr", AlphaFR, FreqsFR},
	{"is", AlphaIS, FreqsIS},
	{"pl", AlphaPL, FreqsPL},
	{"ru", AlphaRU, FreqsRU},
	{"sv", AlphaSV, FreqsSV},
}

Langs is the set of all recognized languages.
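A slice shaped like Langs can be searched by language code with a short loop. The sketch below redeclares the Lang struct locally so that it runs on its own. The helper findLang and the shortened language list are hypothetical.

```go
package main

import "fmt"

// Lang mirrors the struct documented under Types.
type Lang struct {
	Name  string
	Alpha string
	Freqs []float64
}

// findLang is a hypothetical helper that scans langs for a matching Name.
func findLang(langs []Lang, name string) (Lang, bool) {
	for _, l := range langs {
		if l.Name == name {
			return l, true
		}
	}
	return Lang{}, false
}

func main() {
	// Stand-in for the package's Langs; frequencies are omitted here.
	langs := []Lang{
		{"en", "ABCDEFGHIJKLMNOPQRSTUVWXYZ", nil},
		{"es", "ABCDEFGHIJKLMNOPQRSTUVWXYZÑ", nil},
	}
	if l, ok := findLang(langs, "es"); ok {
		// Count letters as runes, since Ñ is a multi-byte character.
		fmt.Println(l.Name, len([]rune(l.Alpha)))
	}
}
```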

Functions

func IC

func IC(text string) float64

IC calculates the index of coincidence of text, typically a ciphertext.
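For reference, a minimal sketch of the textbook index of coincidence: the sum of n_i(n_i - 1) over all letters, divided by N(N - 1), where n_i is the count of letter i and N is the total number of letters. This is not necessarily how IC is implemented in this package. The case folding, the skipping of non-letters, and the absence of any scaling by alphabet size are all assumptions, and indexOfCoincidence is a hypothetical name.

```go
package main

import (
	"fmt"
	"unicode"
)

// indexOfCoincidence returns sum(n_i*(n_i-1)) / (N*(N-1)) over the letters
// of text. Non-letters are skipped and letters are upper-cased (assumptions).
func indexOfCoincidence(text string) float64 {
	counts := make(map[rune]int)
	total := 0
	for _, r := range text {
		if !unicode.IsLetter(r) {
			continue
		}
		counts[unicode.ToUpper(r)]++
		total++
	}
	if total < 2 {
		return 0 // undefined for fewer than two letters; report 0
	}
	sum := 0
	for _, c := range counts {
		sum += c * (c - 1)
	}
	return float64(sum) / float64(total*(total-1))
}

func main() {
	fmt.Println(indexOfCoincidence("HELLO WORLD"))
}
```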

func NgramCount

func NgramCount(text string, n int) map[string]int

NgramCount returns a map from each n-gram in text to the number of times it occurs. With n = 1 it counts unigrams (letters), with n = 2 bigrams (letter pairs), and so on for trigrams, quadgrams, quintgrams, etc.
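The counting described above can be sketched with a sliding window. This version assumes overlapping windows over runes and applies no filtering or case folding. The package's own NgramCount may treat the input differently, and countNgrams is a hypothetical name.

```go
package main

import "fmt"

// countNgrams counts every overlapping window of n runes in text.
func countNgrams(text string, n int) map[string]int {
	counts := make(map[string]int)
	if n <= 0 {
		return counts
	}
	runes := []rune(text) // index by letter, not by byte
	for i := 0; i+n <= len(runes); i++ {
		counts[string(runes[i:i+n])]++
	}
	return counts
}

func main() {
	fmt.Println(countNgrams("ABABA", 2))
}
```

A text of N letters yields N - n + 1 windows, so a text shorter than n produces an empty map.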

func NgramFreq

func NgramFreq(text string, n int, floor float64) map[string]float64

NgramFreq returns the frequencies of all n-grams encountered in text as standard (non-logarithmic) probabilities. Only n-grams that occur in text have entries. For the probability of an n-gram that does not occur, use freq["floor"], which is set to floor/len(text).
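A lookup that falls back to the "floor" entry might look like the sketch below. It works on a hand-written map with made-up values shaped like the documented return value. The helper probOf is hypothetical.

```go
package main

import "fmt"

// probOf returns the probability stored for ngram, or the value under the
// "floor" key when ngram has no entry of its own.
func probOf(freq map[string]float64, ngram string) float64 {
	if p, ok := freq[ngram]; ok {
		return p
	}
	return freq["floor"]
}

func main() {
	// Made-up values, standing in for the map returned by NgramFreq.
	freq := map[string]float64{"TH": 0.5, "HE": 0.25, "floor": 0.001}
	fmt.Println(probOf(freq, "TH"), probOf(freq, "ZZ"))
}
```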

func NgramFreqLog

func NgramFreqLog(text string, n int, floor float64) map[string]float64

NgramFreqLog returns the n-gram frequencies of text as log probabilities.
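Log probabilities are usually summed to score a candidate text, which avoids the underflow that multiplying many small probabilities would cause. The sketch below assumes a map that also carries a "floor" entry, as documented for NgramFreq. That assumption, the base-10 logarithm, the sample values, and the name score are not confirmed by this page.

```go
package main

import (
	"fmt"
	"math"
)

// score sums the log probability of every overlapping n-gram in text,
// using logFreq["floor"] for n-grams missing from the map.
func score(text string, n int, logFreq map[string]float64) float64 {
	if n <= 0 {
		return 0
	}
	runes := []rune(text)
	total := 0.0
	for i := 0; i+n <= len(runes); i++ {
		lp, ok := logFreq[string(runes[i:i+n])]
		if !ok {
			lp = logFreq["floor"]
		}
		total += lp
	}
	return total
}

func main() {
	// Made-up probabilities, converted to base-10 logs for illustration.
	logFreq := map[string]float64{
		"TH":    math.Log10(0.03),
		"HE":    math.Log10(0.02),
		"floor": math.Log10(0.0001),
	}
	// Text made of known bigrams scores higher than text made of unseen ones.
	fmt.Println(score("THE", 2, logFreq) > score("TXE", 2, logFreq))
}
```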

Types

type Lang

type Lang struct {
	Name  string
	Alpha string
	Freqs []float64
}

Lang is a language with an alphabet and letter frequencies.

type NgramSet

type NgramSet struct {
	// contains filtered or unexported fields
}

func LoadNgrams

func LoadNgrams(alpha string, n int, r io.Reader) (*NgramSet, error)

LoadNgrams reads and parses all n-grams from a reader.

func LoadNgramsFile

func LoadNgramsFile(language, alpha string, n int) (*NgramSet, error)

LoadNgramsFile reads and parses all n-grams from a file.

func NewNgramSet

func NewNgramSet(alpha string, n int) *NgramSet

func ReadNgramSet

func ReadNgramSet(r io.Reader) (*NgramSet, error)

func (*NgramSet) Add

func (set *NgramSet) Add(ngram string, freq float64) error

func (*NgramSet) Get

func (set *NgramSet) Get(ngram string) (float64, error)

func (*NgramSet) WriteBinary

func (set *NgramSet) WriteBinary(w io.Writer) error
