Documentation
Overview
Package patterns provides regular expressions and pattern-matching utilities.
Index
- Variables
- func BylineComponents(byline string) []string
- func CleanByline(byline string) string
- func CleanFocus(focusSentence string) string
- func CleanName(name string) string
- func CleanOutquote(outquote string) string
- func CleanTitle(title string) string
- func DepartmentName(marker string) string
- func DriveID(url string) (id string, err error)
- func HrefCapture(html string) string
- func IsAE(str string) bool
- func IsByline(str string) bool
- func IsDepartmentMarker(str string) bool
- func IsFileUnwanted(filename string) bool
- func IsFocus(str string) bool
- func IsOutquote(str string) bool
- func IsSlugMember(str string) bool
- func NameVariables(name string) []string
Constants
This section is empty.
Variables
var AePattern = regexp.MustCompile(`Arts\s?&\s?Entertainment|A&?E`)
var UnwantedFilePattern = regexp.MustCompile(`(?i)worldbeat|survey|newsbeat|spookbeat|sportsbeat|playlist|calendar|\[IGNORE\]|corrections|timeline`)
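Because both patterns are declared in full above, their behavior can be checked directly. The sketch below copies the two declarations verbatim into a standalone program. The sample strings are made up for illustration and do not come from the package.

```go
package main

import (
	"fmt"
	"regexp"
)

// Both declarations are copied verbatim from the Variables section.
var AePattern = regexp.MustCompile(`Arts\s?&\s?Entertainment|A&?E`)
var UnwantedFilePattern = regexp.MustCompile(`(?i)worldbeat|survey|newsbeat|spookbeat|sportsbeat|playlist|calendar|\[IGNORE\]|corrections|timeline`)

func main() {
	// Illustrative department spellings (assumed inputs).
	for _, s := range []string{"Arts & Entertainment", "Arts&Entertainment", "A&E", "AE", "Opinions"} {
		fmt.Printf("%q matches AePattern: %t\n", s, AePattern.MatchString(s))
	}

	// Illustrative Drive file names (assumed inputs).
	for _, name := range []string{"WorldBeat Issue 3", "[IGNORE] old draft", "Feature on local parks"} {
		fmt.Printf("%q matches UnwantedFilePattern: %t\n", name, UnwantedFilePattern.MatchString(name))
	}
}
```

Neither pattern is anchored, so a match anywhere in the string counts. AePattern is case-sensitive, while UnwantedFilePattern starts with `(?i)` and ignores case.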
Functions
func BylineComponents
BylineComponents extracts the components of a byline crucial to parsing (e.g. ampersands, words/names, commas). It returns a slice of the components.
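The package's actual tokenizing rule is not shown here. As a rough sketch of the idea, the hypothetical `bylineComponents` below uses an assumed pattern that treats runs of letters (with apostrophes, periods, and hyphens), ampersands, and commas as separate components.

```go
package main

import (
	"fmt"
	"regexp"
)

// componentPattern is a hypothetical pattern, not the package's own. It
// matches a run of letters (plus apostrophes, periods, and hyphens), a
// single ampersand, or a single comma.
var componentPattern = regexp.MustCompile(`[\p{L}'.\-]+|&|,`)

// bylineComponents returns every token matched by componentPattern, in order.
func bylineComponents(byline string) []string {
	return componentPattern.FindAllString(byline, -1)
}

func main() {
	// The byline is an invented example.
	fmt.Printf("%q\n", bylineComponents("Jane Doe, John Roe & Ann Lee"))
}
```

Keeping commas and ampersands as their own elements lets a later step decide where one author ends and the next begins.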
func CleanByline
CleanByline rids a byline of its padding (e.g. "By:"). It returns the cleaned byline.
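One way such cleaning could work is sketched below. The function `cleanByline` and its prefix pattern are assumptions made for illustration, not the package's implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// bylinePrefix is a hypothetical pattern: a leading "By" followed either by
// a colon or by whitespace, matched case-insensitively. Requiring the colon
// or whitespace keeps names such as "Byron" intact.
var bylinePrefix = regexp.MustCompile(`(?i)^\s*by(?::\s*|\s+)`)

// cleanByline removes the leading label and trims surrounding whitespace.
func cleanByline(byline string) string {
	return strings.TrimSpace(bylinePrefix.ReplaceAllString(byline, ""))
}

func main() {
	// Invented sample bylines.
	for _, b := range []string{"By: Jane Doe", "by Jane Doe", "Byron Smith"} {
		fmt.Printf("%q -> %q\n", b, cleanByline(b))
	}
}
```

The same shape, with a different label in the pattern, would fit the other Clean functions that strip a leading label.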
func CleanFocus
CleanFocus rids a focus sentence of its padding (e.g. "Focus Sentence:"). It returns the cleaned focus sentence.
func CleanName
CleanName rids a name of nicknames and redundant spaces (e.g. the "(Jessy)" in "Ying Zi (Jessy) Mei"). It returns the cleaned name.
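A minimal sketch of this kind of cleanup follows. It assumes that nicknames always appear in parentheses, as in the example name. The helper `cleanName` is hypothetical and may differ from what the package does.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nicknamePattern is an assumed pattern: any parenthesized run of text.
var nicknamePattern = regexp.MustCompile(`\([^)]*\)`)

// cleanName drops parenthesized nicknames, then collapses the remaining
// whitespace so that words are separated by single spaces.
func cleanName(name string) string {
	withoutNickname := nicknamePattern.ReplaceAllString(name, " ")
	return strings.Join(strings.Fields(withoutNickname), " ")
}

func main() {
	fmt.Printf("%q\n", cleanName("Ying Zi (Jessy) Mei"))
}
```

Splitting with `strings.Fields` and rejoining handles leading, trailing, and doubled spaces in one step.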
func CleanOutquote
CleanOutquote rids an outquote of its padding (e.g. "Outquote(s):"). It returns the cleaned outquote.
func CleanTitle
CleanTitle rids a title of its padding (e.g. "Title:"). It returns the cleaned title.
func DepartmentName
DepartmentName extracts the department name from the department marker of a slug line. It returns the department name.
func HrefCapture
HrefCapture extracts the href attribute of an <a> tag in HTML. It returns the href.
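For illustration, a regex-based capture of this kind might look like the sketch below. The pattern and the helper `hrefCapture` are assumptions: they handle only double-quoted attribute values and return the first match, which may not mirror the package's behavior.

```go
package main

import (
	"fmt"
	"regexp"
)

// hrefPattern is a hypothetical pattern: an opening <a> tag, any other
// attributes, then a double-quoted href value in capture group 1.
var hrefPattern = regexp.MustCompile(`<a\s[^>]*?href="([^"]*)"`)

// hrefCapture returns the first captured href, or "" when none is found.
func hrefCapture(html string) string {
	match := hrefPattern.FindStringSubmatch(html)
	if match == nil {
		return ""
	}
	return match[1]
}

func main() {
	// Invented sample markup.
	fmt.Println(hrefCapture(`<p><a class="doc" href="https://example.com/doc">Draft</a></p>`))
}
```

A regular expression is adequate for simple, predictable markup. Arbitrary HTML is usually better served by a real parser.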
func IsAE
IsAE determines whether a string represents the Arts & Entertainment department. It returns true or false.
func IsDepartmentMarker
IsDepartmentMarker determines whether a string marks the department (e.g. "The Spectator/Opinions/Issue 10"). It returns true or false.
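The example marker suggests a three-part, slash-separated shape. The sketch below assumes exactly that shape (publication, department, issue number) and shows how a check and a DepartmentName-style extraction could share one pattern. Both helpers and the pattern are hypothetical; the real marker format may be looser.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// markerPattern is an assumed format derived from the single documented
// example: "The Spectator/<department>/Issue <number>".
var markerPattern = regexp.MustCompile(`^The Spectator/([^/]+)/Issue \d+$`)

// isDepartmentMarker reports whether str has the assumed marker shape.
func isDepartmentMarker(str string) bool {
	return markerPattern.MatchString(strings.TrimSpace(str))
}

// departmentName returns the middle component of a marker, or "" when the
// string does not have the assumed shape.
func departmentName(marker string) string {
	match := markerPattern.FindStringSubmatch(strings.TrimSpace(marker))
	if match == nil {
		return ""
	}
	return match[1]
}

func main() {
	marker := "The Spectator/Opinions/Issue 10"
	fmt.Println(isDepartmentMarker(marker), departmentName(marker))
}
```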
func IsFileUnwanted
IsFileUnwanted determines whether the name of a Drive file indicates an unwanted article. It returns true or false.
func IsOutquote
IsOutquote determines whether a string is an outquote. It returns true or false.
func IsSlugMember
IsSlugMember determines whether a string is a member of an article slug. It returns true or false.
func NameVariables
NameVariables splits a name of variable length into a first name and a last name. It returns a slice with the first element as the first name and the second element as the last name.
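How the split point is chosen is not documented here. The sketch below assumes the simplest rule: the final word is the last name and everything before it is the first name. The helper `nameVariables` is hypothetical, and its handling of single-word and empty names is likewise an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// nameVariables returns a two-element slice: first name, then last name.
// Assumed rule: the final word is the last name; all earlier words form
// the first name. A single-word name is treated as a first name only.
func nameVariables(name string) []string {
	parts := strings.Fields(name)
	switch len(parts) {
	case 0:
		return []string{"", ""}
	case 1:
		return []string{parts[0], ""}
	}
	last := len(parts) - 1
	return []string{strings.Join(parts[:last], " "), parts[last]}
}

func main() {
	fmt.Printf("%q\n", nameVariables("Ying Zi Mei"))
}
```

Always returning exactly two elements lets callers index the result without checking its length.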
Types
This section is empty.