Documentation
Overview ¶
Package indep contains independence test algorithms (e.g. G-Test and Pearson's).
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func ChiSquare ¶
ChiSquare returns the cumulative distribution function at point chi, that is:
Pr(X^2 <= chi)
where X^2 follows the chi-square distribution X^2(df), with df being the number of degrees of freedom.
func ChiSquareTest ¶
ChiSquareTest returns whether variables x and y are statistically independent. It uses the chi-square test to detect an association between the two variables. Argument data is a contingency table of category counts, where the first axis holds the counts for each category of variable x and the second axis those for variable y. The last element of each row and column is the total count. E.g.:
+-------------------------+
|       X_1 X_2 X_3 total |
|  Y_1  100 200 100   400 |
|  Y_2   50 300  25   375 |
| total 150 500 125   775 |
+-------------------------+
Argument p is the number of categories (or levels) in x.
Argument q is the number of categories (or levels) in y.
Returns true if independent and false otherwise.
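The statistic behind such a test can be sketched as follows. This is a minimal illustration, not the package's implementation: chiSquareStatistic is a hypothetical helper, and it assumes the table is stored exactly as drawn in the example (one row per category of y, one column per category of x, totals in the last row and column). The expected count of each cell under independence is its row total times its column total divided by the grand total, and the number of degrees of freedom is (p-1)(q-1).

```go
package main

import "fmt"

// chiSquareStatistic is a hypothetical helper (not part of package indep).
// It assumes data has q+1 rows and p+1 columns, with the last row and the
// last column holding the totals, as in the example table.
func chiSquareStatistic(data [][]float64, p, q int) (stat float64, df int) {
	total := data[q][p] // grand total sits in the bottom-right corner
	for i := 0; i < q; i++ {
		for j := 0; j < p; j++ {
			// Expected count under independence:
			// row total * column total / grand total.
			expected := data[i][p] * data[q][j] / total
			diff := data[i][j] - expected
			stat += diff * diff / expected
		}
	}
	return stat, (p - 1) * (q - 1)
}

func main() {
	// The example table, including its totals.
	data := [][]float64{
		{100, 200, 100, 400},
		{50, 300, 25, 375},
		{150, 500, 125, 775},
	}
	stat, df := chiSquareStatistic(data, 3, 2)
	fmt.Printf("statistic = %.2f, df = %d\n", stat, df)
}
```

The resulting statistic would then be compared against the chi-square distribution with df degrees of freedom to decide whether to reject independence.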
func Chisquare ¶
Chisquare returns the p-value Pr(X^2 > cv). Compare this value to the assumed significance level. If the returned p-value is less than the significance level sigval, we reject the null hypothesis of independence and conclude that the two variables are dependent.
Thanks to Jacob F. W. for a tutorial on chi-square distributions. Source: http://www.codeproject.com/Articles/432194/How-to-Calculate-the-Chi-Squared-P-Value
Types ¶
type Graph ¶
type Graph struct {
	// This k-set contains the connected subgraphs that are completely separated from each other.
	Kset [][]int
	// contains filtered or unexported fields
}
Graph represents an independence graph.
An independence graph is an undirected graph that maps the (in)dependencies of a set of variables. Let X={X_1,...,X_n} be the set of variables. We define an independence graph as an undirected graph G=(X, E) where there exists an edge between a pair of vertices u,v in X iff there exists a dependency between variables u and v. That is, if two variables are dependent then there exists an edge between them. Otherwise there is no such edge.
The graph that results from this construction consists of clusters of connected subgraphs (its connected components). Let H_1 and H_2 be two distinct connected components of G. Then there exists no edge between any vertex in H_1 and any vertex in H_2. This constitutes an independence relation between these subgraphs. Thus we say that sets of variables in H_1 are independent of sets of variables in H_2. We now show why this is correct. Consider the following example (it can be extended to the general case easily):
Let X, Y and Z be variables. We use the symbol ~ to denote a dependency relation. That is, X ~ Y means that X is dependent on Y. Consider the case where X ~ Y. Then there exists an edge between X and Y. If Z is independent of both, then Z is disconnected from X-Y. The converse holds, since if there exists no edge between them they are independent. Now consider X ~ Y and Y ~ Z. Then the edges X-Y and Y-Z exist, and therefore the graph is connected. The last case is when every variable is independent of every other, in which case there are no edges and all variables are disconnected. For the general case, X, Y and Z can be taken to be sets of variables.
To construct the graph, we check for a dependency on each distinct pair of variables (u,v) in X. If a dependency exists, we add an edge u-v; otherwise we add nothing. Constructing the graph therefore requires O(n^2) independence tests, since every one of the n(n-1)/2 distinct pairs must be checked.
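The pairwise construction can be sketched as below. Both buildEdges and the dependent callback are hypothetical names introduced for illustration; in practice the callback would wrap a real independence test such as the chi-square test, whereas here a small lookup table stands in for it.

```go
package main

import "fmt"

// buildEdges is a hypothetical helper (not part of package indep). It runs
// the pairwise test on every distinct pair (u, v) with u < v and records an
// edge u-v whenever dependent reports true.
func buildEdges(n int, dependent func(u, v int) bool) [][2]int {
	var edges [][2]int
	for u := 0; u < n; u++ {
		for v := u + 1; v < n; v++ { // each unordered pair is visited once
			if dependent(u, v) {
				edges = append(edges, [2]int{u, v})
			}
		}
	}
	return edges
}

func main() {
	// Toy oracle standing in for a real independence test: variables 0 and 1
	// are dependent, as are 1 and 2; variable 3 is independent of the rest.
	dep := map[[2]int]bool{{0, 1}: true, {1, 2}: true}
	edges := buildEdges(4, func(u, v int) bool { return dep[[2]int{u, v}] })
	fmt.Println(edges)
}
```

Iterating only over v > u visits each unordered pair exactly once, which is where the n(n-1)/2 test count comes from.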
Once the independence graph is constructed, we must identify each connected component in it. We can do this with union-find, via utils.Union and utils.Find.
Initially each vertex has its own set.
For each vertex v:
  For each edge v-u:
    If u is not in the same set as v then
      utils.Union(u, v)
    EndIf
  EndFor
EndFor
After passing through every vertex, we have k connected subgraphs. These k subgraphs are independent of each other. Return these k-sets.
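The union-find pass above can be sketched as follows. The signatures of utils.Union and utils.Find are not shown in this documentation, so the sketch uses its own local find function and a plain parent slice; components is a hypothetical name, and the edge-list input is an assumption made for the example.

```go
package main

import "fmt"

// find returns the representative of x's set, shortening the path as it goes.
func find(parent []int, x int) int {
	for parent[x] != x {
		parent[x] = parent[parent[x]] // path halving
		x = parent[x]
	}
	return x
}

// components is a hypothetical helper (not part of package indep). It groups
// vertices 0..n-1 into connected components given a list of undirected edges.
func components(n int, edges [][2]int) [][]int {
	parent := make([]int, n)
	for i := range parent {
		parent[i] = i // initially each vertex has its own set
	}
	for _, e := range edges {
		ru, rv := find(parent, e[0]), find(parent, e[1])
		if ru != rv {
			parent[ru] = rv // union the two sets
		}
	}
	index := make(map[int]int) // set representative -> position in ksets
	var ksets [][]int
	for v := 0; v < n; v++ {
		r := find(parent, v)
		i, ok := index[r]
		if !ok {
			i = len(ksets)
			index[r] = i
			ksets = append(ksets, nil)
		}
		ksets[i] = append(ksets[i], v)
	}
	return ksets
}

func main() {
	// Edges 0-1 and 1-2 connect three variables; variable 3 stays alone.
	fmt.Println(components(4, [][2]int{{0, 1}, {1, 2}}))
}
```

Each returned slice plays the role of one k-set: the variables inside it are connected through some chain of dependencies, and no edge links it to any other slice.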
func NewIndepGraph ¶
NewIndepGraph constructs a new Graph given a DataGroup.