xurls

Version: v2.6.0
Published: Jan 2, 2025 License: BSD-3-Clause Imports: 4 Imported by: 241

README

xurls

Extract URLs from text using regular expressions. Requires Go 1.22 or later.

package main

import "mvdan.cc/xurls/v2"

func main() {
	// Relaxed also matches URLs that have no scheme.
	rxRelaxed := xurls.Relaxed()
	rxRelaxed.FindString("Do gophers live in golang.org?")  // "golang.org"
	rxRelaxed.FindString("This string does not have a URL") // ""

	// Strict only matches URLs that carry a known scheme.
	rxStrict := xurls.Strict()
	rxStrict.FindAllString("must have scheme: http://foo.com/.", -1) // []string{"http://foo.com/"}
	rxStrict.FindAllString("no scheme, no match: foo.com", -1)       // nil (no matches)
}

Since the API is centered around regexp.Regexp, many other methods are available, such as finding the byte indexes of all matches.
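As a minimal sketch of the byte-index use case, the helper below calls FindAllStringIndex from the standard regexp package. The pattern standIn and the helper matchSpans are hypothetical names introduced for illustration only: standIn is a deliberately simplified expression so the snippet runs with the standard library alone, and it is not the expression xurls builds. With the real package, the value returned by xurls.Strict() or xurls.Relaxed() could presumably be passed in its place.

```go
package main

import (
	"fmt"
	"regexp"
)

// standIn is a simplified stand-in pattern (an assumption for this sketch).
// It is NOT the regular expression that xurls compiles.
var standIn = regexp.MustCompile(`https?://[^\s]+`)

// matchSpans returns the [start, end) byte offsets of every match of rx in s.
// It accepts any *regexp.Regexp, so an xurls regexp would work the same way.
func matchSpans(rx *regexp.Regexp, s string) [][]int {
	return rx.FindAllStringIndex(s, -1)
}

func main() {
	s := "see http://foo.com/a and https://bar.org/b"
	for _, span := range matchSpans(standIn, s) {
		// span[0] is the start offset, span[1] is one past the end.
		fmt.Printf("%d-%d %s\n", span[0], span[1], s[span[0]:span[1]])
	}
}
```

The offsets are byte positions rather than rune positions, which matters when the surrounding text contains multi-byte characters.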

The regular expressions are compiled when the API is first called. Any subsequent calls will use the same regular expression pointers.
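The compile-once behavior described above can be illustrated with sync.OnceValue from the standard library. This is only a sketch of the general pattern, not the package's actual implementation: lazyURLRegexp is a hypothetical accessor, and the pattern inside it is a simplified stand-in.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// lazyURLRegexp is a hypothetical accessor, not part of xurls.
// sync.OnceValue runs the compile function on the first call only and
// returns the cached *regexp.Regexp on every later call.
var lazyURLRegexp = sync.OnceValue(func() *regexp.Regexp {
	// Simplified stand-in pattern (an assumption for this sketch).
	return regexp.MustCompile(`https?://[^\s]+`)
})

func main() {
	a := lazyURLRegexp()
	b := lazyURLRegexp()
	fmt.Println(a == b) // true: both calls return the same pointer
}
```

Because every caller shares one pointer, it is safest to treat the returned regexp as read-only. The standard library documents a Regexp as safe for concurrent use except for configuration methods such as Longest.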

cmd/xurls

To install the tool globally:

go install mvdan.cc/xurls/v2/cmd/xurls@latest

$ echo "Do gophers live in http://golang.org?" | xurls
http://golang.org

Documentation

Overview

Package xurls extracts URLs from plain text using regular expressions.

Example
rx := xurls.Relaxed()
fmt.Println(rx.FindString("Do gophers live in http://golang.org?"))
fmt.Println(rx.FindAllString("foo.com is http://foo.com/.", -1))
Output:

http://golang.org
[foo.com http://foo.com/]
Example (FilterEmails)
s := "Email dev@foo.com about any issues with foo.com or https://foo.com/dl"
rx := xurls.Relaxed()
idxEmail := rx.SubexpIndex("relaxedEmail")
for _, match := range rx.FindAllStringSubmatch(s, -1) {
	if match[idxEmail] != "" {
		continue // skip lone email addresses
	}
	fmt.Println(match[0])
}
Output:

foo.com
https://foo.com/dl

Constants

This section is empty.

Variables

var AnyScheme = `(?:[a-zA-Z][a-zA-Z.\-+]*://|` + anyOf(SchemesNoAuthority...) + `:)`

AnyScheme can be passed to StrictMatchingScheme to match any possibly valid scheme, and not just the known ones.
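To make the shape of this expression easier to read, the sketch below rebuilds it with the standard library and checks a few scheme prefixes against it. The names anyOf, anyScheme, schemeRx and isSchemePrefix are assumptions for illustration: the package's own anyOf helper is unexported, so the version here is a guess at its behavior (a non-capturing alternation of quoted strings), and the anchoring with ^ and $ is added only for the demonstration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// schemesNoAuthority copies the SchemesNoAuthority list from this page.
var schemesNoAuthority = []string{
	"bitcoin", "cid", "file", "geo", "magnet", "mailto",
	"matrix", "mid", "sms", "tel", "xmpp",
}

// anyOf is a hypothetical reimplementation of the package's unexported
// helper: it joins the quoted strings into one non-capturing alternation.
func anyOf(strs ...string) string {
	quoted := make([]string, len(strs))
	for i, s := range strs {
		quoted[i] = regexp.QuoteMeta(s)
	}
	return `(?:` + strings.Join(quoted, `|`) + `)`
}

// anyScheme follows the shape of the AnyScheme expression: either a generic
// scheme followed by "://", or a known no-authority scheme followed by ":".
var anyScheme = `(?:[a-zA-Z][a-zA-Z.\-+]*://|` + anyOf(schemesNoAuthority...) + `:)`

// schemeRx anchors the expression so it must match the whole prefix.
var schemeRx = regexp.MustCompile(`^` + anyScheme + `$`)

// isSchemePrefix reports whether prefix is accepted by the anchored expression.
func isSchemePrefix(prefix string) bool {
	return schemeRx.MatchString(prefix)
}

func main() {
	for _, p := range []string{"custom+scheme://", "mailto:", "unknown:"} {
		fmt.Printf("%-20q %v\n", p, isSchemePrefix(p))
	}
}
```

In this sketch an arbitrary scheme is accepted only when it is followed by "://", while the ":" form is limited to the listed no-authority schemes, which is why "unknown:" is rejected.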

var PseudoTLDs = []string{
	`bit`,
	`example`,
	`exit`,
	`gnu`,
	`i2p`,
	`invalid`,
	`local`,
	`localhost`,
	`test`,
	`zkey`,
}

PseudoTLDs is a sorted list of some widely used unofficial TLDs.

Sources:

var Schemes = []string{}/* 386 elements not displayed */

Schemes is a sorted list of all IANA assigned schemes.

Source: https://www.iana.org/assignments/uri-schemes/uri-schemes-1.csv

var SchemesNoAuthority = []string{
	`bitcoin`,
	`cid`,
	`file`,
	`geo`,
	`magnet`,
	`mailto`,
	`matrix`,
	`mid`,
	`sms`,
	`tel`,
	`xmpp`,
}

SchemesNoAuthority is a sorted list of some well-known URL schemes that are followed by ":" instead of "://". The list includes both officially registered and unofficial schemes.

var SchemesUnofficial = []string{
	`gemini`,
	`jdbc`,
	`moz-extension`,
	`postgres`,
	`postgresql`,
	`slack`,
	`zoommtg`,
	`zoomus`,
}

SchemesUnofficial is a sorted list of some well-known URL schemes that are not yet officially registered. They tend to correspond to software.

Mostly collected from https://en.wikipedia.org/wiki/List_of_URI_schemes#Unofficial_but_common_URI_schemes.
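Since the list is documented as sorted, a membership check can use binary search instead of a linear scan. The sketch below copies the list from this page into a local variable so that it runs with the standard library alone. The name isUnofficialScheme is hypothetical, and the approach assumes the list stays sorted in byte order as documented. With the real package, xurls.SchemesUnofficial could presumably be searched the same way.

```go
package main

import (
	"fmt"
	"slices"
)

// schemesUnofficial copies the SchemesUnofficial list from this page.
// Binary search is only valid while the slice stays sorted.
var schemesUnofficial = []string{
	"gemini",
	"jdbc",
	"moz-extension",
	"postgres",
	"postgresql",
	"slack",
	"zoommtg",
	"zoomus",
}

// isUnofficialScheme reports whether scheme appears in the sorted list.
func isUnofficialScheme(scheme string) bool {
	_, found := slices.BinarySearch(schemesUnofficial, scheme)
	return found
}

func main() {
	fmt.Println(isUnofficialScheme("slack")) // true
	fmt.Println(isUnofficialScheme("http"))  // false
}
```

The same check applies to the other sorted lists on this page, such as PseudoTLDs and SchemesNoAuthority.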

var TLDs = []string{}/* 1456 elements not displayed */

TLDs is a sorted list of all public top-level domains.

Sources:

Functions

func Relaxed

func Relaxed() *regexp.Regexp

Relaxed produces a regexp that matches any URL matched by Strict, plus any URL or email address with no scheme.

Email addresses without a scheme match the `relaxedEmail` subexpression, which can be used to filter them as needed.

func Strict

func Strict() *regexp.Regexp

Strict produces a regexp that matches any URL with a scheme in either the Schemes or SchemesNoAuthority lists.

func StrictMatchingScheme

func StrictMatchingScheme(exp string) (*regexp.Regexp, error)

StrictMatchingScheme produces a regexp similar to Strict, but requiring that the scheme match the given regular expression. See also AnyScheme.

Example
rx, err := xurls.StrictMatchingScheme(`https?://`)
if err != nil {
	panic(err)
}
fmt.Println(rx.FindAllString("Download binaries via https://foo.com/dl or ftps://foo.com/dl", -1))
Output:

[https://foo.com/dl]

Types

This section is empty.

Directories

Path Synopsis
cmd
generate
