stringentropy

package
v0.0.0-...-c4af43d
Warning

This package is not in the latest version of its module.

Published: Apr 9, 2024 License: Apache-2.0 Imports: 2 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func Calculate

func Calculate(s string, prob map[rune]float64) float64

Calculate finds the entropy of a string S of characters over an alphabet A, which is defined as

E(S) = - sum(i in A) { (p(i)) * log(p(i)) },

where p(i) is the probability of observing character i, and the summation is performed over all characters in A. If S is the empty string, we define E(S) to be 0.

The probabilities p(i) can be given a priori, or simply calculated by counting characters within the string S. In the latter case, we have p(i) = c(i) / |S|, where c(i) counts the number of times character i appears in S, and |S| is the length of S. Then,

E(S) = - sum(i in A) { (c(i) / |S|) * log(c(i) / |S|) }.

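In this case, the maximum value for E is log(|S|), which is attained when every character in S is distinct. More generally, a string with k distinct characters has entropy at most log(k), so the entropy is low when S contains few distinct characters and is exactly 0 when S consists of a single repeated character.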

Reference: https://link.springer.com/chapter/10.1007/978-3-642-10509-8_19
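
The count-based formula above can be sketched as a standalone function. This is a minimal illustration, not the package's implementation: the name countEntropy is hypothetical, the natural logarithm is an assumption (the documentation does not state the log base), and |S| is assumed to be measured in runes rather than bytes.

```go
package main

import "math"

// countEntropy is a hypothetical helper, not part of the package API.
// It evaluates E(S) = -sum { (c(i)/|S|) * log(c(i)/|S|) } with the
// probabilities estimated by counting the runes of s.
// Assumptions: natural log, and |S| counted in runes.
func countEntropy(s string) float64 {
	counts := make(map[rune]int)
	n := 0 // |S|, the number of runes in s
	for _, r := range s {
		counts[r]++
		n++
	}
	if n == 0 {
		return 0 // E(S) is defined to be 0 for the empty string
	}
	entropy := 0.0
	for _, c := range counts {
		p := float64(c) / float64(n)
		entropy -= p * math.Log(p)
	}
	return entropy
}
```

Under these assumptions, a string of four distinct characters such as "abcd" gives log(4), the maximum for that length, while "aaaa" gives 0.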

func CalculateNormalised

func CalculateNormalised(s string, prob map[rune]float64) float64

CalculateNormalised returns the string entropy normalised by the log of the length of the string. This normalisation is used because log(N) is the maximum possible entropy over all strings of length N, where N > 0. The special cases are the empty string, for which the result is 0, and single-character strings, for which the result is 1. As a formula:

E_n(S) := {
    0,               if |S| = 0
    1,               if |S| = 1
    E(S) / log(|S|), otherwise
}
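
The piecewise definition above could be implemented along the following lines. This is a sketch under stated assumptions rather than the package's code: the names entropy and normalisedEntropy are hypothetical, probabilities are estimated from the string itself (no a priori map), the natural logarithm is assumed, and |S| is assumed to be the rune count.

```go
package main

import (
	"math"
	"unicode/utf8"
)

// entropy is a hypothetical count-based entropy helper (natural log assumed).
func entropy(s string) float64 {
	counts := make(map[rune]int)
	n := 0
	for _, r := range s {
		counts[r]++
		n++
	}
	e := 0.0
	for _, c := range counts {
		p := float64(c) / float64(n)
		e -= p * math.Log(p)
	}
	return e
}

// normalisedEntropy is a hypothetical helper that follows the piecewise
// definition of E_n(S): 0 for |S| = 0, 1 for |S| = 1, and
// E(S) / log(|S|) otherwise. |S| is assumed to be the rune count.
func normalisedEntropy(s string) float64 {
	n := utf8.RuneCountInString(s)
	switch n {
	case 0:
		return 0
	case 1:
		return 1
	}
	return entropy(s) / math.Log(float64(n))
}
```

Because the same logarithm appears in the numerator and the denominator, the normalised value does not depend on the choice of log base. For example, "aabb" has entropy log(2) and length 4, so its normalised entropy is log(2) / log(4) = 0.5.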

TODO: does this make sense when a general probability structure is used?

TODO: calculate max string entropy for a given set of character counts.

func CharacterCounts

func CharacterCounts(strs []string) (map[rune]int, int64)

CharacterCounts computes a map from each character (rune) to its number of occurrences in the input strings.
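
A counting function with this signature might look like the sketch below. The name countCharacters is hypothetical, and the meaning of the second return value is an assumption: the documentation does not describe it, so the sketch treats it as the total number of runes counted across all inputs.

```go
package main

// countCharacters is a hypothetical stand-in with the same signature as
// CharacterCounts. It returns the number of occurrences of each rune
// across all input strings.
// Assumption: the int64 result is the total number of runes counted.
func countCharacters(strs []string) (map[rune]int, int64) {
	counts := make(map[rune]int)
	var total int64
	for _, s := range strs {
		// Ranging over a string yields runes, not bytes.
		for _, r := range s {
			counts[r]++
			total++
		}
	}
	return counts, total
}
```

With the inputs "ab" and "bc", this sketch counts 'a' once, 'b' twice, and 'c' once, for a total of 4.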

func CharacterProbabilities

func CharacterProbabilities(strs []string) map[rune]float64

CharacterProbabilities computes a map from each character (rune) to its frequency (probability of occurrence) in the input strings.
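
One way to derive such a probability map is to divide each character count by the total number of characters, matching p(i) = c(i) / |S| from the Calculate documentation. The sketch below is illustrative only: the name characterProbabilities is hypothetical, and returning an empty map for empty input is an assumption, since the documentation does not specify that case.

```go
package main

// characterProbabilities is a hypothetical stand-in with the same
// signature as CharacterProbabilities. Each rune is mapped to its
// relative frequency c(i) / total across all input strings.
// Assumption: empty input yields an empty map.
func characterProbabilities(strs []string) map[rune]float64 {
	counts := make(map[rune]int)
	total := 0
	for _, s := range strs {
		for _, r := range s {
			counts[r]++
			total++
		}
	}
	probs := make(map[rune]float64, len(counts))
	for r, c := range counts {
		// total > 0 whenever counts is non-empty, so this never divides by zero.
		probs[r] = float64(c) / float64(total)
	}
	return probs
}
```

A map built this way from a reference corpus could serve as the a priori prob argument described for Calculate, although how the package combines it with the input string is not shown here.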

Types

This section is empty.
