goaway

package module

v0.0.0-...-11a114b Published: Jan 8, 2023 License: MIT Imports: 5 Imported by: 0

Repository

github.com/prunes-git/go-away

README ¶

go-away

go-away is a stand-alone, lightweight library for detecting and censoring profanities in Go.

This library must remain extremely easy to use. Its original intent of not adding overhead will always remain.

Installation

go get -u github.com/TwiN/go-away

Usage

package main

import (
    "github.com/TwiN/go-away"
)

func main() {
    goaway.IsProfane("fuck this shit")                // returns true
    goaway.ExtractProfanity("fuck this shit")         // returns "fuck"
    goaway.Censor("fuck this shit")                   // returns "**** this ****"
    
    goaway.IsProfane("F   u   C  k th1$ $h!t")        // returns true
    goaway.ExtractProfanity("F   u   C  k th1$ $h!t") // returns "fuck"
    goaway.Censor("F   u   C  k th1$ $h!t")           // returns "*   *   *  * th1$ ****"
    
    goaway.IsProfane("@$$h073")                       // returns true
    goaway.ExtractProfanity("@$$h073")                // returns "asshole"
    goaway.Censor("@$$h073")                          // returns "*******"
    
    goaway.IsProfane("hello, world!")                 // returns false
    goaway.ExtractProfanity("hello, world!")          // returns ""
    goaway.Censor("hello, world!")                    // returns "hello, world!"
}

Calling goaway.IsProfane(s), goaway.ExtractProfanity(s) or goaway.Censor(s) will use the default profanity detector, but if you'd like to disable leet speak, numerical character or special character sanitization, you have to create a ProfanityDetector instead:

profanityDetector := goaway.NewProfanityDetector().WithSanitizeLeetSpeak(false).WithSanitizeSpecialCharacters(false).WithSanitizeAccents(false)
profanityDetector.IsProfane("b!tch") // returns false because leet speak sanitization is disabled, so '!' is not replaced by 'i'

By default, the NewProfanityDetector constructor uses the default dictionaries for profanities, false positives and false negatives. These dictionaries are exposed as goaway.DefaultProfanities, goaway.DefaultFalsePositives and goaway.DefaultFalseNegatives respectively.

If you need to load a different dictionary, you can create a new instance of ProfanityDetector as follows:

profanities    := []string{"ass"}
falsePositives := []string{"bass"}
falseNegatives := []string{"dumbass"}

profanityDetector := goaway.NewProfanityDetector().WithCustomDictionary(profanities, falsePositives, falseNegatives)

You may also specify custom character replacements using WithCustomCharacterReplacements on a ProfanityDetector. By default, this is set to goaway.DefaultCharacterReplacements.

Note that all character replacements with a value of ' ' are considered as special characters while all characters with a value that is not ' ' are considered to be leetspeak characters. This means that using profanityDetector.WithSanitizeSpecialCharacters(bool) and profanityDetector.WithSanitizeLeetSpeak(bool) will let you toggle which character replacements are executed during the sanitization process.
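
The split described above can be sketched with the standard library alone. The partition helper and the trimmed-down replacements map below are hypothetical and are not part of go-away; the only thing taken from the text is the rule that a value of ' ' marks a special character and any other value marks a leetspeak character.

```go
package main

import "fmt"

// partition splits a replacement map into special characters (value ' ')
// and leetspeak characters (any other value).
// Hypothetical helper for illustration; not part of go-away.
func partition(replacements map[rune]rune) (special, leet map[rune]rune) {
	special = make(map[rune]rune)
	leet = make(map[rune]rune)
	for from, to := range replacements {
		if to == ' ' {
			special[from] = to
		} else {
			leet[from] = to
		}
	}
	return special, leet
}

func main() {
	// A small subset in the same shape as DefaultCharacterReplacements.
	replacements := map[rune]rune{'4': 'a', '$': 's', '_': ' ', '-': ' '}
	special, leet := partition(replacements)
	fmt.Println(len(special), len(leet)) // prints "2 2"
}
```

With such a split, toggling one of the two sanitization options would amount to skipping the corresponding half of the map.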

In the background

A single giant regular expression could handle everything, but it would slow down the filtering considerably as more words are added to the list of profanities.

Instead, the following steps are taken before checking for profanities in a string:

Numbers are replaced with their letter counterparts (e.g. 1 -> i, 4 -> a, etc.)
Special characters are replaced with their letter equivalents (e.g. @ -> a, ! -> i)
The resulting string has all of its spaces removed to prevent w ords lik e tha t
The resulting string has all of its characters converted to lowercase
The resulting string has all words deemed false positives (e.g. assassin) removed
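
The steps above can be sketched as a small pipeline. This is a rough illustration using only the standard library, not go-away's actual implementation: the sanitize function, the leet map and the falsePositives list are hypothetical and deliberately tiny.

```go
package main

import (
	"fmt"
	"strings"
)

// leet is a small, illustrative subset of character replacements.
var leet = map[rune]rune{'0': 'o', '1': 'i', '3': 'e', '4': 'a', '$': 's', '!': 'i', '@': 'a'}

// falsePositives is a tiny, illustrative list of words to strip out.
var falsePositives = []string{"assassin", "bass"}

// sanitize applies the five steps in order. Hypothetical sketch.
func sanitize(s string) string {
	// Steps 1 and 2: replace numbers and special characters with letters.
	s = strings.Map(func(r rune) rune {
		if to, ok := leet[r]; ok {
			return to
		}
		return r
	}, s)
	// Step 3: remove all spaces.
	s = strings.ReplaceAll(s, " ", "")
	// Step 4: convert to lowercase.
	s = strings.ToLower(s)
	// Step 5: remove words deemed false positives.
	for _, fp := range falsePositives {
		s = strings.ReplaceAll(s, fp, "")
	}
	return s
}

func main() {
	fmt.Println(sanitize("H 3 l L 0"))      // prints "hello"
	fmt.Println(sanitize("@ss4ssin creed")) // prints "creed"
}
```

Whatever remains after these steps is the string that would be checked against the list of profanities with plain substring matching.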

In the future, the following additional steps could also be considered:

All non-transformed special characters are removed to prevent s~tring li~ke tha~~t
All runs of the same character repeated more than twice in a row are shortened to two characters (e.g. poooop -> poop)
- NOTE: This is not a perfect approach, as words like fuuck wouldn't be detected, but it's better than nothing.
- The upside of this method is that only base bad words need to be added, not every tense of said bad word (e.g. the fuck entry would support fucker, fucking, etc.).
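
As a rough idea of what the repeated-character step could look like, here is a minimal sketch. The collapseRuns function is hypothetical; the library does not expose anything by that name, and the step itself is only listed as a possible future addition.

```go
package main

import (
	"fmt"
	"strings"
)

// collapseRuns shortens any run of the same rune that is longer than two
// down to exactly two. Runs of one or two runes are left untouched.
// Hypothetical helper; not part of go-away.
func collapseRuns(s string) string {
	var b strings.Builder
	var prev rune
	run := 0
	for i, r := range s {
		if i > 0 && r == prev {
			run++
		} else {
			run = 1
		}
		prev = r
		if run <= 2 {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(collapseRuns("poooop")) // prints "poop"
	fmt.Println(collapseRuns("hello"))  // prints "hello"
}
```

Because runs of exactly two characters are kept, a word with a doubled letter that is not in the dictionary would still slip through, which is the limitation the note above describes.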

Documentation ¶

Index ¶

Variables
func Censor(s string) string
func ExtractProfanity(s string) string
func IsProfane(s string) bool
type ProfanityDetector
- func NewProfanityDetector() *ProfanityDetector

Constants ¶

This section is empty.

Variables ¶

var DefaultCharacterReplacements = map[rune]rune{

	'0': 'o',
	'1': 'i',
	'3': 'e',
	'4': 'a',
	'5': 's',
	'7': 'l',
	'$': 's',
	'!': 'i',
	'+': 't',
	'#': 'h',
	'@': 'a',
	'<': 'c',

	'-': ' ',
	'_': ' ',
	'|': ' ',
	'.': ' ',
	',': ' ',
	'(': ' ',
	')': ' ',
	'>': ' ',
	'"': ' ',
	'`': ' ',
	'~': ' ',
	'*': ' ',
	'&': ' ',
	'%': ' ',
	'?': ' ',
}

DefaultCharacterReplacements is the mapping of all characters that are replaced by other characters before attempting to find a profanity.

var DefaultFalseNegatives = []string{
	"asshole",
	"dumbass",
	"nigger",
}

DefaultFalseNegatives is a list of profanities that are checked for before the DefaultFalsePositives are removed.

This is reserved for words that may be incorrectly filtered as false positives.

Alternatively, words that are long, or that should mark a string as profane no matter what the context is or whether the word is part of another word, can also be included.

Note that there is a test that prevents words from being in both DefaultProfanities and DefaultFalseNegatives.

var DefaultFalsePositives = []string{
	"analy",
	"arsenal",
	"assassin",
	"assaying",
	"assert",
	"assign",
	"assimil",
	"assist",
	"associat",
	"assum",
	"assur",
	"banal",
	"basement",
	"bass",
	"butth",
	"butto",
	"butter",
	"cass",
	"canvass",
	"circum",
	"clitheroe",
	"cockburn",
	"cocktail",
	"cumber",
	"cumbing",
	"cumulat",
	"dickvandyke",
	"document",
	"evaluate",
	"exclusive",
	"expensive",
	"explain",
	"expression",
	"grape",
	"grass",
	"harass",
	"hass",
	"horniman",
	"hotwater",
	"identit",
	"kassa",
	"kassi",
	"lass",
	"leafage",
	"libshitz",
	"magnacumlaude",
	"mass",
	"mocha",
	"pass",
	"penistone",
	"phoebe",
	"phoenix",
	"pushit",
	"sassy",
	"saturday",
	"scrap",
	"serfage",
	"sexist",
	"shoe",
	"scunthorpe",
	"shitake",
	"stitch",
	"sussex",
	"therapist",
	"tysongay",
	"wass",
	"wharfage",
}

DefaultFalsePositives is a list of words that may wrongly trigger the DefaultProfanities

var DefaultProfanities = []string{
	"anal",
	"anus",
	"arse",
	"ass",
	"ballsack",
	"balls",
	"bastard",
	"bitch",
	"btch",
	"biatch",
	"blowjob",
	"bollock",
	"bollok",
	"boner",
	"boob",
	"bugger",
	"butt",
	"choad",
	"clitoris",
	"cock",
	"coon",
	"crap",
	"cum",
	"cunt",
	"dick",
	"dildo",
	"douchebag",
	"dyke",
	"fag",
	"feck",
	"fellate",
	"fellatio",
	"felching",
	"fuck",
	"fudgepacker",
	"flange",
	"gtfo",
	"hoe",
	"horny",
	"incest",
	"jerk",
	"jizz",
	"labia",
	"masturbat",
	"muff",
	"naked",
	"nazi",
	"nigga",
	"niggu",
	"nipple",
	"nips",
	"nude",
	"pedophile",
	"penis",
	"piss",
	"poop",
	"porn",
	"prick",
	"prostitut",
	"pube",
	"pussie",
	"pussy",
	"queer",
	"rape",
	"rapist",
	"retard",
	"rimjob",
	"scrotum",
	"sex",
	"shit",
	"slut",
	"spunk",
	"stfu",
	"suckmy",
	"tits",
	"tittie",
	"titty",
	"turd",
	"twat",
	"vagina",
	"wank",
	"whore",
}

DefaultProfanities is a list of profanities that are checked after the DefaultFalsePositives are removed

Note that some words that would normally be in this list may be in DefaultFalseNegatives
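
The order in which the three dictionaries are consulted matters. The following is a minimal sketch of that order, assuming the input has already been sanitized; the extract and firstMatch functions are hypothetical, and the dictionaries reuse the small custom example from the README rather than the defaults.

```go
package main

import (
	"fmt"
	"strings"
)

// Tiny dictionaries taken from the README's custom dictionary example.
var (
	falseNegatives = []string{"dumbass"}
	falsePositives = []string{"bass"}
	profanities    = []string{"ass"}
)

// firstMatch returns the first entry of words contained in s, or "".
func firstMatch(s string, words []string) string {
	for _, w := range words {
		if strings.Contains(s, w) {
			return w
		}
	}
	return ""
}

// extract checks false negatives first, then removes false positives,
// then checks the regular profanities. Hypothetical sketch that assumes
// s is already lowercase and free of spaces.
func extract(s string) string {
	if m := firstMatch(s, falseNegatives); m != "" {
		return m
	}
	for _, fp := range falsePositives {
		s = strings.ReplaceAll(s, fp, "")
	}
	return firstMatch(s, profanities)
}

func main() {
	fmt.Printf("%q\n", extract("bass"))    // prints ""
	fmt.Printf("%q\n", extract("dumbass")) // prints "dumbass"
}
```

In this sketch, "dumbass" contains the false positive "bass", so without the earlier false-negative check it would be reduced to "dum" and go undetected.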

var ItalianProfanities = []string{
	"bastardo",
	"bastardi",
	"bastarda",
	"bastarde",
	"bernarda",
	"bischero",
	"bischera",
	"bocchino",
	"bordello",
	"cacare",
	"cagare",
	"cagata",
	"cagate",
	"caghetta",
	"cagone",
	"cazzata",
	"cazzone",
	"cazzo",
	"cesso",
	"ciucciata",
	"cogliona",
	"coglione",
	"cornuto",
	"cristo",
	"cretina",
	"cretino",
	"culattone",
	"culattona",
	"culo",
	"deficiente",
	"figa",
	"fottuta",
	"fottuto",
	"frocio",
	"frocetto",
	"gesu",
	"imbecil",
	"incazzare",
	"incazzato",
	"incazzati",
	"maronna",
	"merda",
	"merdina",
	"merdona",
	"merdaccia",
	"mignotta",
	"mignottona",
	"mignottone",
	"mortacci",
	"negro",
	"negra",
	"pippa",
	"pippona",
	"pippone",
	"pippaccia",
	"pirla",
	"pompino",
	"porco",
	"puttana",
	"puttanon",
	"puttaniere",
	"puttanate",
	"rompiballe",
	"rompipalle",
	"rompicoglioni",
	"scazzi",
	"scemo",
	"scopare",
	"scopata",
	"stronzata",
	"stronzo",
	"troia",
	"troione",
	"trombata",
	"vaffanculo",
	"zoccola",
}

Functions ¶

func Censor ¶

func Censor(s string) string

Censor takes in a string (word or sentence) and tries to censor all profanities found.

Uses the default ProfanityDetector

func ExtractProfanity ¶

func ExtractProfanity(s string) string

ExtractProfanity takes in a string (word or sentence) and looks for profanities. Returns the first profanity found, or an empty string if none are found.

Uses the default ProfanityDetector

func IsProfane ¶

func IsProfane(s string) bool

IsProfane checks whether there are any profanities in a given string (word or sentence).

Uses the default ProfanityDetector

Types ¶

type ProfanityDetector ¶

type ProfanityDetector struct {
	// contains filtered or unexported fields
}

ProfanityDetector contains the dictionaries as well as the configuration for determining how profanity detection is handled

func NewProfanityDetector ¶

func NewProfanityDetector() *ProfanityDetector

NewProfanityDetector creates a new ProfanityDetector

func (*ProfanityDetector) Censor ¶

func (g *ProfanityDetector) Censor(s string) string

Censor takes in a string (word or sentence) and tries to censor all profanities found.

func (*ProfanityDetector) ExtractProfanities ¶

func (g *ProfanityDetector) ExtractProfanities(s string) ([]string, []int)

func (*ProfanityDetector) ExtractProfanity ¶

func (g *ProfanityDetector) ExtractProfanity(s string) string

ExtractProfanity takes in a string (word or sentence) and looks for profanities. Returns the first profanity found, or an empty string if none are found.

func (*ProfanityDetector) IsProfane ¶

func (g *ProfanityDetector) IsProfane(s string) bool

IsProfane takes in a string (word or sentence) and looks for profanities. Returns true if any are found.

func (*ProfanityDetector) WithCustomCharacterReplacements ¶

func (g *ProfanityDetector) WithCustomCharacterReplacements(characterReplacements map[rune]rune) *ProfanityDetector

WithCustomCharacterReplacements allows configuring characters that are to be replaced by other characters.

Note that all entries that have the value ' ' are considered as special characters while all entries with a value that is not ' ' are considered as leet speak.

Defaults to DefaultCharacterReplacements

func (*ProfanityDetector) WithCustomDictionary ¶

func (g *ProfanityDetector) WithCustomDictionary(profanities, falsePositives, falseNegatives []string) *ProfanityDetector

WithCustomDictionary allows configuring custom dictionaries of profanities, false positives and false negatives to be used in place of the defaults.

func (*ProfanityDetector) WithSanitizeAccents ¶

func (g *ProfanityDetector) WithSanitizeAccents(sanitize bool) *ProfanityDetector

WithSanitizeAccents allows configuring whether the sanitization process should also take accents into account. By default, this is set to true, but since this adds a bit of overhead, you may disable it if your use case is time-sensitive or if the input can never contain accented characters.
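
To give an idea of what accent sanitization means in practice, here is a minimal sketch that maps a handful of accented letters to their base letters. The stripAccents function and its hand-written accents table are hypothetical and cover only a few characters; how go-away actually removes accents is not described here and may well differ.

```go
package main

import (
	"fmt"
	"strings"
)

// accents maps a few precomposed accented letters to their base letter.
// Illustrative and far from exhaustive.
var accents = map[rune]rune{
	'á': 'a', 'à': 'a', 'â': 'a', 'ä': 'a',
	'é': 'e', 'è': 'e', 'ê': 'e', 'ë': 'e',
	'í': 'i', 'î': 'i', 'ï': 'i',
	'ó': 'o', 'ô': 'o', 'ö': 'o',
	'ú': 'u', 'û': 'u', 'ü': 'u',
	'ç': 'c', 'ñ': 'n',
}

// stripAccents replaces every rune found in the accents table and leaves
// all other runes untouched. Hypothetical helper; not part of go-away.
func stripAccents(s string) string {
	return strings.Map(func(r rune) rune {
		if to, ok := accents[r]; ok {
			return to
		}
		return r
	}, s)
}

func main() {
	fmt.Println(stripAccents("crème brûlée")) // prints "creme brulee"
}
```

The extra pass over every rune is the kind of overhead the option lets you avoid when the input is known to be unaccented.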

func (*ProfanityDetector) WithSanitizeLeetSpeak ¶

func (g *ProfanityDetector) WithSanitizeLeetSpeak(sanitize bool) *ProfanityDetector

WithSanitizeLeetSpeak allows configuring whether the sanitization process should also take into account leetspeak

Leetspeak characters are characters to be replaced by non-' ' values in the characterReplacements map. For instance, '4' is replaced by 'a' and '3' is replaced by 'e', which means that "4sshol3" would be sanitized to "asshole", which would be detected as a profanity.

By default, this is set to true.

func (*ProfanityDetector) WithSanitizeSpaces ¶

func (g *ProfanityDetector) WithSanitizeSpaces(sanitize bool) *ProfanityDetector

WithSanitizeSpaces allows configuring whether the sanitization process should also take into account spaces

func (*ProfanityDetector) WithSanitizeSpecialCharacters ¶

func (g *ProfanityDetector) WithSanitizeSpecialCharacters(sanitize bool) *ProfanityDetector

WithSanitizeSpecialCharacters allows configuring whether the sanitization process should also take into account special characters.

Special characters are characters that are part of the characterReplacements map (DefaultCharacterReplacements by default) and are to be removed during the sanitization step.

For instance, "fu_ck" would be sanitized to "fuck", which would be detected as a profanity.

By default, this is set to true.