Documentation ¶
Overview ¶
Package feature contains a list of feature implementations for computing vandalism features on the Wikipedia dataset.
Index ¶
- Constants
- Variables
- func ComputeImpact(oldrevid, newrevid string, wordlist []string) float64
- func GetAllWordList() (allWords []string)
- func KullbackLeiblerDivergence(a, b string) (divergence float64)
- func Register(ftr Interface, tipe int, name string)
- func Round(v float64) float64
- type Anonim
- type CharDistributionInsert
- type CharDiversity
- type Class
- type CommentLength
- type CompressRate
- type DigitRatio
- type Feature
- type GoodToken
- type Interface
- type LongestCharSeq
- type LongestWord
- type NonAlnumRatio
- type SizeIncrement
- type SizeRatio
- type Template
- type TermFrequency
- type UpperLowerRatio
- type UpperToAllRatio
- type WordsAllFrequency
- type WordsAllImpact
- type WordsBadFrequency
- type WordsBadImpact
- type WordsBiasFrequency
- type WordsBiasImpact
- type WordsPronounFrequency
- type WordsPronounImpact
- type WordsSexFrequency
- type WordsSexImpact
- type WordsVulgarFrequency
- type WordsVulgarImpact
Constants ¶
const (
	// RoundDigit defines the maximum number of digits for rounding float values.
	RoundDigit = float64(100000)
)
Variables ¶
var (
	// DEBUG level, set using the environment variable FEATURE_DEBUG.
	DEBUG = 0
)
var ListFeature []Interface
ListFeature is a global variable that contains all implemented features.
Functions ¶
func ComputeImpact ¶
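func ComputeImpact(oldrevid, newrevid string, wordlist []string) float64
ComputeImpact returns the proportion of words from wordlist found in the new revision relative to the old revision, computed as
    count_of_words_in_new / (count_of_words_in_old + count_of_words_in_new)
If no words are found in either the old or the new revision, it returns 0.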
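The formula can be sketched as follows. The real ComputeImpact takes revision IDs and loads the revision text itself; impactSketch and countListed below are hypothetical helpers that take the texts directly, and the whitespace tokenization and lower-casing are assumptions of this example.

```go
package main

import (
	"fmt"
	"strings"
)

// countListed returns how many whitespace-separated tokens of text appear in
// the word set (tokenization and lower-casing are assumptions).
func countListed(text string, words map[string]bool) int {
	n := 0
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		if words[tok] {
			n++
		}
	}
	return n
}

// impactSketch applies new / (old + new) and returns 0 when neither
// revision contains a listed word.
func impactSketch(oldText, newText string, wordlist []string) float64 {
	words := make(map[string]bool, len(wordlist))
	for _, w := range wordlist {
		words[strings.ToLower(w)] = true
	}
	oldCount := countListed(oldText, words)
	newCount := countListed(newText, words)
	if oldCount+newCount == 0 {
		return 0
	}
	return float64(newCount) / float64(oldCount+newCount)
}

func main() {
	list := []string{"stupid", "dumb"}
	fmt.Println(impactSketch("the cat sat", "the stupid cat sat dumb", list)) // 1
	fmt.Println(impactSketch("a dumb idea", "a dumb dumb idea", list))
}
```

When a listed word already appears in the old revision, the value drops below 1, so repeated existing wording weighs less than newly introduced wording.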
func GetAllWordList ¶
func GetAllWordList() (allWords []string)
GetAllWordList returns all categorical words used in the language-based features.
func KullbackLeiblerDivergence ¶
func KullbackLeiblerDivergence(a, b string) (divergence float64)
KullbackLeiblerDivergence computes and returns the divergence of two strings based on their character probabilities.
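A minimal sketch of a Kullback-Leibler divergence between the character distributions of two strings, D(P||Q) = sum over x of P(x) * ln(P(x)/Q(x)). The add-one smoothing over the union of characters, the natural logarithm, and the names klSketch and charProbs are assumptions of this example; the page does not say how the package handles characters that occur in only one of the strings.

```go
package main

import (
	"fmt"
	"math"
)

// charProbs returns an add-one smoothed probability for every rune in
// alphabet: (count+1) / (len+|alphabet|). Smoothing is an assumption that
// keeps every probability non-zero so the divergence stays finite.
func charProbs(s string, alphabet map[rune]bool) map[rune]float64 {
	counts := make(map[rune]int)
	total := 0
	for _, r := range s {
		counts[r]++
		total++
	}
	probs := make(map[rune]float64, len(alphabet))
	for r := range alphabet {
		probs[r] = float64(counts[r]+1) / float64(total+len(alphabet))
	}
	return probs
}

// klSketch computes D(P_a || Q_b) over the union of runes in a and b.
func klSketch(a, b string) float64 {
	alphabet := make(map[rune]bool)
	for _, r := range a + b {
		alphabet[r] = true
	}
	if len(alphabet) == 0 {
		return 0
	}
	p := charProbs(a, alphabet)
	q := charProbs(b, alphabet)
	d := 0.0
	for r := range alphabet {
		d += p[r] * math.Log(p[r]/q[r])
	}
	return d
}

func main() {
	fmt.Println(klSketch("hello", "hello"))   // 0
	fmt.Println(klSketch("aaaa", "abcd") > 0) // true
}
```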
Types ¶
type Anonim ¶
type Anonim Feature
Anonim computes whether the editor is logged in or anonymous (identified only by an IP address).
func (*Anonim) Compute ¶
func (anon *Anonim) Compute(dataset tabula.DatasetInterface)
Compute sets the value to 1 if the record in the column is an IP address (an anonymous editor), and to 0 otherwise.
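One way to make the IP-address test is with net.ParseIP from the Go standard library, which accepts both IPv4 and IPv6 forms. anonimValue is a hypothetical helper that works on a single editor string rather than on a tabula dataset column.

```go
package main

import (
	"fmt"
	"net"
)

// anonimValue returns 1 when editor parses as an IPv4 or IPv6 address
// (an anonymous edit) and 0 for a registered user name.
func anonimValue(editor string) float64 {
	if net.ParseIP(editor) != nil {
		return 1
	}
	return 0
}

func main() {
	for _, e := range []string{"192.168.0.1", "2001:db8::1", "ExampleUser"} {
		fmt.Println(e, anonimValue(e))
	}
}
```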
type CharDistributionInsert ¶
type CharDistributionInsert Feature
CharDistributionInsert measures the divergence of the character distribution of the inserted text with respect to the expected distribution.
func (*CharDistributionInsert) Compute ¶
func (ftr *CharDistributionInsert) Compute(dataset tabula.DatasetInterface)
Compute character distribution of inserted text.
type CharDiversity ¶
type CharDiversity Feature
CharDiversity is a feature that measures the number of different characters relative to the length of the inserted text, given by the expression
length^(1/differentchars)
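The expression can be sketched as below, assuming that both the length and the number of different characters are counted in runes; charDiversity is a hypothetical name, and returning 0 for empty text is an assumption.

```go
package main

import (
	"fmt"
	"math"
	"unicode/utf8"
)

// charDiversity returns length^(1/differentchars), counting runes.
func charDiversity(text string) float64 {
	distinct := make(map[rune]struct{})
	for _, r := range text {
		distinct[r] = struct{}{}
	}
	if len(distinct) == 0 {
		return 0 // assumption: empty text has diversity 0
	}
	length := float64(utf8.RuneCountInString(text))
	return math.Pow(length, 1/float64(len(distinct)))
}

func main() {
	fmt.Println(charDiversity("aabb")) // 4^(1/2) = 2
	fmt.Println(charDiversity("aaaa")) // 4^(1/1) = 4
}
```

For a fixed length, text with many distinct characters yields a value close to 1, while long runs of a few characters yield larger values.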
func (*CharDiversity) Compute ¶
func (ftr *CharDiversity) Compute(dataset tabula.DatasetInterface)
Compute character diversity.
type Class ¶
type Class Feature
Class changes the classification from text to numeric: a "regular" edit becomes 0 and a "vandalism" edit becomes 1.
func (*Class) Compute ¶
func (ftr *Class) Compute(dataset tabula.DatasetInterface)
Compute changes the classification from text to numeric: a "regular" edit becomes 0 and a "vandalism" edit becomes 1.
type CommentLength ¶
type CommentLength Feature
CommentLength is a feature that computes the length of the edit comment.
func (*CommentLength) Compute ¶
func (ftr *CommentLength) Compute(dataset tabula.DatasetInterface)
Compute counts the number of bytes used in the comment, NOT including the header content "/* ... */".
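A sketch of the byte count, assuming the header is removed with a non-greedy regular expression and that surrounding spaces are trimmed afterwards; commentLength and sectionHeader are hypothetical names, not the package's code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// sectionHeader matches a "/* ... */" header (non-greedy, one line).
var sectionHeader = regexp.MustCompile(`/\*.*?\*/`)

// commentLength removes "/* ... */" headers, trims surrounding spaces
// (an assumption) and returns the remaining length in bytes.
func commentLength(comment string) int {
	stripped := sectionHeader.ReplaceAllString(comment, "")
	return len(strings.TrimSpace(stripped))
}

func main() {
	fmt.Println(commentLength("/* History */ fixed typo")) // 10
}
```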
type CompressRate ¶
type CompressRate Feature
CompressRate is a feature that computes the compression rate of the inserted text.
func (*CompressRate) Compute ¶
func (ftr *CompressRate) Compute(dataset tabula.DatasetInterface)
Compute the compression rate of the inserted text.
type DigitRatio ¶
type DigitRatio Feature
DigitRatio is a feature that compares the number of digits to the number of all characters.
func (*DigitRatio) Compute ¶
func (ftr *DigitRatio) Compute(dataset tabula.DatasetInterface)
Compute calculates the digit ratio in the new revision.
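A minimal sketch of the ratio, counting runes with unicode.IsDigit; digitRatio is a hypothetical helper, and returning 0 for empty text is an assumption.

```go
package main

import (
	"fmt"
	"unicode"
)

// digitRatio returns the number of digit runes divided by all runes.
func digitRatio(text string) float64 {
	digits, total := 0, 0
	for _, r := range text {
		total++
		if unicode.IsDigit(r) {
			digits++
		}
	}
	if total == 0 {
		return 0 // assumption: empty text has ratio 0
	}
	return float64(digits) / float64(total)
}

func main() {
	fmt.Println(digitRatio("abc123")) // 0.5
}
```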
type GoodToken ¶
type GoodToken Feature
GoodToken counts how many good tokens are in the inserted text.
func (*GoodToken) Compute ¶
func (ftr *GoodToken) Compute(dataset tabula.DatasetInterface)
Compute the number of good tokens in the inserted text.
type Interface ¶
type Interface interface {
	tabula.ColumnInterface
	Compute(dataset tabula.DatasetInterface)
}
Interface defines the methods that must be implemented by a feature.
type LongestCharSeq ¶
type LongestCharSeq Feature
LongestCharSeq computes the longest sequence of a repeated character in the inserted text.
func (*LongestCharSeq) Compute ¶
func (ftr *LongestCharSeq) Compute(dataset tabula.DatasetInterface)
Compute the longest character sequence in the inserted text.
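Reading the feature as the longest run of one repeated character (an interpretation not spelled out on this page), it can be sketched as below; longestCharSeq is a hypothetical name.

```go
package main

import "fmt"

// longestCharSeq returns the length of the longest run of the same rune.
func longestCharSeq(text string) int {
	longest, run := 0, 0
	var prev rune
	first := true
	for _, r := range text {
		if !first && r == prev {
			run++
		} else {
			run = 1
		}
		prev, first = r, false
		if run > longest {
			longest = run
		}
	}
	return longest
}

func main() {
	fmt.Println(longestCharSeq("heeelllllo")) // 5
}
```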
type LongestWord ¶
type LongestWord Feature
LongestWord finds the longest word in the inserted text.
func (*LongestWord) Compute ¶
func (ftr *LongestWord) Compute(dataset tabula.DatasetInterface)
Compute the longest word in inserted text.
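A sketch that measures the longest whitespace-separated word in runes; returning the length rather than the word itself, the whitespace tokenization, and the name longestWord are assumptions.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// longestWord returns the rune length of the longest whitespace-separated word.
func longestWord(text string) int {
	longest := 0
	for _, w := range strings.Fields(text) {
		if n := utf8.RuneCountInString(w); n > longest {
			longest = n
		}
	}
	return longest
}

func main() {
	fmt.Println(longestWord("a quick vandalism edit")) // 9
}
```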
type NonAlnumRatio ¶
type NonAlnumRatio Feature
NonAlnumRatio is a feature that compares the number of non-alphanumeric characters to all characters in the inserted text.
func (*NonAlnumRatio) Compute ¶
func (ftr *NonAlnumRatio) Compute(dataset tabula.DatasetInterface)
Compute the ratio of non-alphanumeric characters to all characters in the inserted text.
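A sketch of the ratio that treats every rune which is neither a letter nor a digit (including spaces) as non-alphanumeric; that classification, the name nonAlnumRatio, and returning 0 for empty text are assumptions.

```go
package main

import (
	"fmt"
	"unicode"
)

// nonAlnumRatio returns non-alphanumeric runes divided by all runes.
func nonAlnumRatio(text string) float64 {
	nonAlnum, total := 0, 0
	for _, r := range text {
		total++
		if !unicode.IsLetter(r) && !unicode.IsDigit(r) {
			nonAlnum++
		}
	}
	if total == 0 {
		return 0 // assumption: empty text has ratio 0
	}
	return float64(nonAlnum) / float64(total)
}

func main() {
	fmt.Println(nonAlnumRatio("ab!!")) // 0.5
}
```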
type SizeIncrement ¶
type SizeIncrement Feature
SizeIncrement is a feature that compares the size of the new revision with the old one by subtracting their lengths.
func (*SizeIncrement) Compute ¶
func (ftr *SizeIncrement) Compute(dataset tabula.DatasetInterface)
Compute the absolute size increment.
type SizeRatio ¶
type SizeRatio Feature
SizeRatio is a feature that computes the size ratio of the new revision to the old revision.
func (*SizeRatio) Compute ¶
func (ftr *SizeRatio) Compute(dataset tabula.DatasetInterface)
Compute ratio of size between new and old revision.
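SizeIncrement and SizeRatio can be sketched together. Sizes are measured in bytes here, "absolute" is read as "not relative", and returning 0 for an empty old revision is an assumption; the page documents none of these choices, and both function names are hypothetical.

```go
package main

import "fmt"

// sizeIncrement returns len(new) - len(old) in bytes; negative for removals.
func sizeIncrement(oldText, newText string) int {
	return len(newText) - len(oldText)
}

// sizeRatio returns len(new) / len(old).
func sizeRatio(oldText, newText string) float64 {
	if len(oldText) == 0 {
		return 0 // assumption: behaviour for an empty old revision is undocumented
	}
	return float64(len(newText)) / float64(len(oldText))
}

func main() {
	fmt.Println(sizeIncrement("abc", "abcdef"), sizeRatio("abc", "abcdef")) // 3 2
}
```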
type Template ¶
type Template Feature
Template is a template for adding a new feature to this generator.
func (*Template) Compute ¶
func (ftr *Template) Compute(dataset tabula.DatasetInterface)
Compute describes what this feature does.
type TermFrequency ¶
type TermFrequency Feature
TermFrequency computes the frequency of words in the inserted text against the new revision.
func (*TermFrequency) Compute ¶
func (ftr *TermFrequency) Compute(dataset tabula.DatasetInterface)
Compute the frequency of inserted words.
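A sketch of one plausible reading of this feature: for each inserted word, take its relative frequency in the new revision, then average over the inserted words. This interpretation, the lower-cased whitespace tokenization, and the name termFrequency are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// termFrequency averages, over the inserted words, each word's count in the
// new revision divided by the number of words in the new revision.
func termFrequency(inserted, newRev string) float64 {
	newWords := strings.Fields(strings.ToLower(newRev))
	insWords := strings.Fields(strings.ToLower(inserted))
	if len(newWords) == 0 || len(insWords) == 0 {
		return 0
	}
	counts := make(map[string]int, len(newWords))
	for _, w := range newWords {
		counts[w]++
	}
	sum := 0.0
	for _, w := range insWords {
		sum += float64(counts[w]) / float64(len(newWords))
	}
	return sum / float64(len(insWords))
}

func main() {
	fmt.Println(termFrequency("spam spam", "this is spam spam")) // 0.5
}
```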
type UpperLowerRatio ¶
type UpperLowerRatio Feature
UpperLowerRatio is a feature that compares uppercase characters to lowercase characters.
func (*UpperLowerRatio) Compute ¶
func (ftr *UpperLowerRatio) Compute(dataset tabula.DatasetInterface)
Compute the ratio of uppercase to lowercase characters in the new revision.
type UpperToAllRatio ¶
type UpperToAllRatio Feature
UpperToAllRatio is a feature that compares uppercase characters to all characters.
func (*UpperToAllRatio) Compute ¶
func (ftr *UpperToAllRatio) Compute(dataset tabula.DatasetInterface)
Compute the ratio of uppercase characters to all characters in the new revision.
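UpperLowerRatio and UpperToAllRatio can both be derived from a single pass over the text. In this sketch the helper names are hypothetical, and returning 0 when the denominator is zero is an assumption.

```go
package main

import (
	"fmt"
	"unicode"
)

// caseCounts counts uppercase, lowercase and all runes in text.
func caseCounts(text string) (upper, lower, total int) {
	for _, r := range text {
		total++
		switch {
		case unicode.IsUpper(r):
			upper++
		case unicode.IsLower(r):
			lower++
		}
	}
	return upper, lower, total
}

// upperLowerRatio returns uppercase / lowercase runes.
func upperLowerRatio(text string) float64 {
	upper, lower, _ := caseCounts(text)
	if lower == 0 {
		return 0 // assumption
	}
	return float64(upper) / float64(lower)
}

// upperToAllRatio returns uppercase / all runes.
func upperToAllRatio(text string) float64 {
	upper, _, total := caseCounts(text)
	if total == 0 {
		return 0 // assumption
	}
	return float64(upper) / float64(total)
}

func main() {
	fmt.Println(upperLowerRatio("ABcd"), upperToAllRatio("ABcd")) // 1 0.5
}
```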
type WordsAllFrequency ¶
type WordsAllFrequency Feature
WordsAllFrequency computes the frequency of vulgar, pronoun, bias, sex, and bad words in the inserted text.
func (*WordsAllFrequency) Compute ¶
func (ftr *WordsAllFrequency) Compute(dataset tabula.DatasetInterface)
Compute frequency of all words.
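The Words*Frequency features share one pattern: count the tokens of the inserted text that appear in a category word list. The sketch below assumes the frequency is relative (matches divided by all tokens) and uses a tiny hypothetical word list; the package's real lists are the ones returned by GetAllWordList.

```go
package main

import (
	"fmt"
	"strings"
)

// wordsFrequency returns the share of lower-cased whitespace tokens in
// inserted that appear in wordlist (relative frequency is an assumption).
func wordsFrequency(inserted string, wordlist []string) float64 {
	words := make(map[string]bool, len(wordlist))
	for _, w := range wordlist {
		words[strings.ToLower(w)] = true
	}
	tokens := strings.Fields(strings.ToLower(inserted))
	if len(tokens) == 0 {
		return 0
	}
	n := 0
	for _, t := range tokens {
		if words[t] {
			n++
		}
	}
	return float64(n) / float64(len(tokens))
}

func main() {
	fmt.Println(wordsFrequency("you are so dumb", []string{"you", "dumb"})) // 0.5
}
```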
type WordsAllImpact ¶
type WordsAllImpact Feature
WordsAllImpact computes the impact of vulgar, pronoun, bias, sex, and bad words between the old and new revision.
func (*WordsAllImpact) Compute ¶
func (ftr *WordsAllImpact) Compute(dataset tabula.DatasetInterface)
Compute the impact of vulgar, pronoun, bias, sex, and bad words in inserted text.
type WordsBadFrequency ¶
type WordsBadFrequency Feature
WordsBadFrequency computes the frequency of bad words: colloquial words or words that indicate poor writing skill.
func (*WordsBadFrequency) Compute ¶
func (ftr *WordsBadFrequency) Compute(dataset tabula.DatasetInterface)
Compute frequency of bad words.
type WordsBadImpact ¶
type WordsBadImpact Feature
WordsBadImpact computes the impact of bad words between the old and new revision.
func (*WordsBadImpact) Compute ¶
func (ftr *WordsBadImpact) Compute(dataset tabula.DatasetInterface)
Compute the impact of bad words between the old and new revision.
type WordsBiasFrequency ¶
type WordsBiasFrequency Feature
WordsBiasFrequency counts the frequency of colloquial words with high bias in the inserted text.
func (*WordsBiasFrequency) Compute ¶
func (ftr *WordsBiasFrequency) Compute(dataset tabula.DatasetInterface)
Compute frequency of biased words.
type WordsBiasImpact ¶
type WordsBiasImpact Feature
WordsBiasImpact computes the impact of biased words between the old and new revision.
func (*WordsBiasImpact) Compute ¶
func (ftr *WordsBiasImpact) Compute(dataset tabula.DatasetInterface)
Compute the impact of biased words between the old and new revision.
type WordsPronounFrequency ¶
type WordsPronounFrequency Feature
WordsPronounFrequency counts the frequency of first- and second-person pronouns in the inserted text.
func (*WordsPronounFrequency) Compute ¶
func (ftr *WordsPronounFrequency) Compute(dataset tabula.DatasetInterface)
Compute frequency of pronoun words in inserted text.
type WordsPronounImpact ¶
type WordsPronounImpact Feature
WordsPronounImpact computes the impact of pronoun words between the old and new revision.
func (*WordsPronounImpact) Compute ¶
func (ftr *WordsPronounImpact) Compute(dataset tabula.DatasetInterface)
Compute the impact of pronoun words between the old and new revision.
type WordsSexFrequency ¶
type WordsSexFrequency Feature
WordsSexFrequency counts the frequency of non-vulgar, sex-related words.
func (*WordsSexFrequency) Compute ¶
func (ftr *WordsSexFrequency) Compute(dataset tabula.DatasetInterface)
Compute the frequency of sex-related words.
type WordsSexImpact ¶
type WordsSexImpact Feature
WordsSexImpact computes the impact of sex-related words between the old and new revision.
func (*WordsSexImpact) Compute ¶
func (ftr *WordsSexImpact) Compute(dataset tabula.DatasetInterface)
Compute the impact of sex-related words between the old and new revision.
type WordsVulgarFrequency ¶
type WordsVulgarFrequency Feature
WordsVulgarFrequency counts the frequency of vulgar words in the inserted text.
func (*WordsVulgarFrequency) Compute ¶
func (ftr *WordsVulgarFrequency) Compute(dataset tabula.DatasetInterface)
Compute the frequency of vulgar words in the inserted text.
type WordsVulgarImpact ¶
type WordsVulgarImpact Feature
WordsVulgarImpact computes the impact of vulgar words between the old and new revision.
func (*WordsVulgarImpact) Compute ¶
func (ftr *WordsVulgarImpact) Compute(dataset tabula.DatasetInterface)
Compute the impact of vulgar words between the old and new revision.
Source Files ¶
- algorithm.go
- anonim.go
- char_distribution_insert.go
- char_diversity.go
- class.go
- comment_length.go
- compress_rate.go
- digit_ratio.go
- feature.go
- good_token.go
- interface.go
- longest_char_sequence.go
- longest_word.go
- non_alnum_ratio.go
- size_increment.go
- size_ratio.go
- template.go
- term_frequency.go
- upper_lower_ratio.go
- upper_to_all_ratio.go
- util.go
- words_all_frequency.go
- words_all_impact.go
- words_bad_frequency.go
- words_bad_impact.go
- words_bias_frequency.go
- words_bias_impact.go
- words_pronoun_frequency.go
- words_pronoun_impact.go
- words_sex_frequency.go
- words_sex_impact.go
- words_vulgar_frequency.go
- words_vulgar_impact.go