Documentation ¶
Overview ¶
Package xurls extracts URLs from plain text using regular expressions.
Example ¶
rx := xurls.Relaxed()
fmt.Println(rx.FindString("Do gophers live in http://golang.org?"))
fmt.Println(rx.FindAllString("foo.com is http://foo.com/.", -1))

Output:
http://golang.org
[foo.com http://foo.com/]
Example (FilterEmails) ¶
s := "Email dev@foo.com about any issues with foo.com or https://foo.com/dl"
rx := xurls.Relaxed()
idxEmail := rx.SubexpIndex("relaxedEmail")
for _, match := range rx.FindAllStringSubmatch(s, -1) {
	if match[idxEmail] != "" {
		continue // skip lone email addresses
	}
	fmt.Println(match[0])
}

Output:
foo.com
https://foo.com/dl
Constants ¶
This section is empty.
Variables ¶
var AnyScheme = `(?:[a-zA-Z][a-zA-Z.\-+]*://|` + anyOf(SchemesNoAuthority...) + `:)`
AnyScheme can be passed to StrictMatchingScheme to match any possibly valid scheme, and not just the known ones.
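To make the shape of this expression concrete, here is a minimal sketch that uses only the standard library. The `anyOf` helper and the shortened `noAuthority` list are hypothetical stand-ins for the package's unexported helper and for SchemesNoAuthority; the package's real implementation may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// noAuthority is a shortened, illustrative copy of SchemesNoAuthority.
var noAuthority = []string{"bitcoin", "mailto", "tel"}

// anyOf is a hypothetical stand-in for the package's unexported helper:
// it joins the quoted alternatives into one non-capturing group.
func anyOf(strs ...string) string {
	quoted := make([]string, len(strs))
	for i, s := range strs {
		quoted[i] = regexp.QuoteMeta(s)
	}
	return `(?:` + strings.Join(quoted, `|`) + `)`
}

// anySchemeSketch mirrors the structure of the AnyScheme expression:
// either an arbitrary scheme followed by "://", or a known scheme
// followed by a bare ":".
var anySchemeSketch = `(?:[a-zA-Z][a-zA-Z.\-+]*://|` + anyOf(noAuthority...) + `:)`

// hasScheme reports whether s begins with a scheme accepted by the sketch.
func hasScheme(s string) bool {
	return regexp.MustCompile(`^` + anySchemeSketch).MatchString(s)
}

func main() {
	for _, s := range []string{"custom+app://host", "mailto:dev@foo.com", "foo.com/path"} {
		fmt.Println(s, hasScheme(s))
	}
}
```

In this sketch, an unknown scheme is accepted only when it is followed by "://"; a bare ":" is accepted only after one of the listed schemes.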
var PseudoTLDs = []string{
`bit`,
`example`,
`exit`,
`gnu`,
`i2p`,
`invalid`,
`local`,
`localhost`,
`test`,
`zkey`,
}
PseudoTLDs is a sorted list of some widely used unofficial TLDs.
Sources:
- https://en.wikipedia.org/wiki/Pseudo-top-level_domain
- https://en.wikipedia.org/wiki/Category:Pseudo-top-level_domains
- https://tools.ietf.org/html/draft-grothoff-iesg-special-use-p2p-names-00
- https://www.iana.org/assignments/special-use-domain-names/special-use-domain-names.xhtml
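Because the list is sorted, membership can be checked with a binary search. The following is a minimal sketch under that assumption; `pseudoTLDs` is a local copy of the list above and `isPseudoTLD` is a hypothetical helper, not part of the package.

```go
package main

import (
	"fmt"
	"sort"
)

// pseudoTLDs is a local copy of the list above; it must stay sorted
// for the binary search below to be correct.
var pseudoTLDs = []string{
	"bit", "example", "exit", "gnu", "i2p",
	"invalid", "local", "localhost", "test", "zkey",
}

// isPseudoTLD reports whether tld appears in the sorted list.
func isPseudoTLD(tld string) bool {
	i := sort.SearchStrings(pseudoTLDs, tld)
	return i < len(pseudoTLDs) && pseudoTLDs[i] == tld
}

func main() {
	fmt.Println(isPseudoTLD("localhost")) // true
	fmt.Println(isPseudoTLD("com"))       // false
}
```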
var Schemes = []string{}/* 386 elements not displayed */
Schemes is a sorted list of all IANA-assigned schemes.
Source: https://www.iana.org/assignments/uri-schemes/uri-schemes-1.csv
var SchemesNoAuthority = []string{
`bitcoin`,
`cid`,
`file`,
`geo`,
`magnet`,
`mailto`,
`matrix`,
`mid`,
`sms`,
`tel`,
`xmpp`,
}
SchemesNoAuthority is a sorted list of some well-known URL schemes that are followed by ":" instead of "://". The list includes both officially registered and unofficial schemes.
var SchemesUnofficial = []string{
`gemini`,
`jdbc`,
`moz-extension`,
`postgres`,
`postgresql`,
`slack`,
`zoommtg`,
`zoomus`,
}
SchemesUnofficial is a sorted list of some well-known URL schemes that are not yet officially registered. They tend to correspond to software.
Mostly collected from https://en.wikipedia.org/wiki/List_of_URI_schemes#Unofficial_but_common_URI_schemes.
var TLDs = []string{}/* 1456 elements not displayed */
TLDs is a sorted list of all public top-level domains.
Sources:
Functions ¶
func Relaxed ¶
Relaxed produces a regexp that matches any URL matched by Strict, plus any URL or email address with no scheme.
Email addresses without a scheme match the `relaxedEmail` subexpression, which can be used to filter them as needed.
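The same named-subexpression technique can also separate matches into two groups rather than only skipping one. The sketch below uses only the standard library; `toyRx` is a deliberately simplified, hypothetical pattern and not the expression that Relaxed produces, and `splitMatches` is an illustrative helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// toyRx is a simplified stand-in pattern: the first alternative is a
// named group for bare email addresses, the second matches http(s) URLs.
var toyRx = regexp.MustCompile(`(?P<relaxedEmail>[a-z]+@[a-z]+\.[a-z]+)|https?://[a-z./]+`)

// splitMatches returns the matches where the named group participated
// separately from all other matches.
func splitMatches(rx *regexp.Regexp, group, s string) (named, other []string) {
	idx := rx.SubexpIndex(group) // -1 if the group does not exist
	for _, m := range rx.FindAllStringSubmatch(s, -1) {
		if idx >= 0 && m[idx] != "" {
			named = append(named, m[0])
		} else {
			other = append(other, m[0])
		}
	}
	return named, other
}

func main() {
	emails, urls := splitMatches(toyRx, "relaxedEmail", "Email dev@foo.com or see https://foo.com/dl")
	fmt.Println(emails)
	fmt.Println(urls)
}
```

With the real package, the regexp returned by Relaxed would presumably be passed in place of `toyRx`, keeping the `relaxedEmail` group name.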
func Strict ¶
Strict produces a regexp that matches any URL with a scheme in either the Schemes or SchemesNoAuthority lists.
func StrictMatchingScheme ¶
StrictMatchingScheme produces a regexp similar to Strict, but requires that the scheme match the given regular expression. See also AnyScheme.
Example ¶
rx, err := xurls.StrictMatchingScheme(`https?://`)
if err != nil {
	panic(err)
}
fmt.Println(rx.FindAllString("Download binaries via https://foo.com/dl or ftps://foo.com/dl", -1))

Output:
[https://foo.com/dl]
Types ¶
This section is empty.