stringentropy

package

Version: v0.0.0-...-c4af43d | Published: Apr 9, 2024 | License: Apache-2.0 | Imports: 2 | Imported by: 0

Repository

github.com/ossf/package-analysis

Documentation ¶

Index ¶

func Calculate(s string, prob map[rune]float64) float64
func CalculateNormalised(s string, prob map[rune]float64) float64
func CharacterCounts(strs []string) (map[rune]int, int64)
func CharacterProbabilities(strs []string) map[rune]float64

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func Calculate ¶

func Calculate(s string, prob map[rune]float64) float64

Calculate finds the entropy of a string S of characters over an alphabet A, which is defined as

E(S) = - sum(i in A) { (p(i)) * log(p(i)) },

where p(i) is the probability of observing character i, and the summation is performed over all characters in A. If S is the empty string, we define E(S) to be 0.

The probabilities p(i) can be given a priori, or simply calculated by counting characters within the string S. In the latter case, we have p(i) = c(i) / |S|, where c(i) counts the number of times character i appears in S, and |S| is the length of S. Then,

E(S) = - sum(i in A) { (c(i) / |S|) * log(c(i) / |S|) }.

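In this case, the maximum value for E is log(|S|), which is attained when every character in S is distinct. When S contains only a few distinct characters, the entropy is small, and it is exactly 0 when S consists of a single repeated character.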

Reference: https://link.springer.com/chapter/10.1007/978-3-642-10509-8_19

func CalculateNormalised ¶

func CalculateNormalised(s string, prob map[rune]float64) float64

CalculateNormalised returns the string entropy normalised by the log of the length of the string. This quantity is used because log(N) is the maximum possible entropy over all strings of length N, where N > 0. The special cases are the empty string, whose normalised entropy is 0, and single-character strings, whose normalised entropy is 1. As a formula:

E_n(S) := {
    0,               if |S| = 0
    1,               if |S| = 1
    E(S) / log(|S|), otherwise
}

TODO does this make sense when a general probability structure is used?

TODO calculate max string entropy for a given set of character counts.

func CharacterCounts ¶

func CharacterCounts(strs []string) (map[rune]int, int64)

CharacterCounts computes a map from each character (rune) to its number of occurrences across the input strings.
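
A minimal sketch of this kind of counting is shown below. The function countRunes is a hypothetical stand-in with the same signature shape. The documentation does not describe the second return value, so treating it as the total number of runes seen is an assumption.

```go
package main

import "fmt"

// countRunes is a hypothetical stand-in, not the package's CharacterCounts.
// Assumption: the int64 result is the total number of runes across all inputs.
func countRunes(strs []string) (map[rune]int, int64) {
	counts := make(map[rune]int)
	var total int64
	for _, s := range strs {
		// Ranging over a string yields runes, not bytes.
		for _, r := range s {
			counts[r]++
			total++
		}
	}
	return counts, total
}

func main() {
	counts, total := countRunes([]string{"abc", "abd"})
	fmt.Println(counts['a'], counts['d'], total) // prints "2 1 6"
}
```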

func CharacterProbabilities ¶

func CharacterProbabilities(strs []string) map[rune]float64

CharacterProbabilities computes a map from each character (rune) to its relative frequency (probability of occurrence) across the input strings.
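
One plausible way to derive such a map is to divide each rune's count by the total number of runes. The sketch below assumes that interpretation; runeProbabilities is a hypothetical name, not the package's function.

```go
package main

import "fmt"

// runeProbabilities is a hypothetical stand-in, not the package's
// CharacterProbabilities. It returns count(r) / total for every rune r
// seen in the inputs, and an empty map if there are no runes.
func runeProbabilities(strs []string) map[rune]float64 {
	counts := make(map[rune]int)
	total := 0
	for _, s := range strs {
		for _, r := range s {
			counts[r]++
			total++
		}
	}
	probs := make(map[rune]float64, len(counts))
	for r, c := range counts {
		probs[r] = float64(c) / float64(total)
	}
	return probs
}

func main() {
	probs := runeProbabilities([]string{"aab", "b"})
	fmt.Println(probs['a'], probs['b']) // prints "0.5 0.5"
}
```

A map built this way from a corpus of strings could serve as the a priori probabilities passed to an entropy calculation, instead of counts taken from the single string being scored.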

Types ¶

This section is empty.

Source Files ¶

string_entropy.go
