uri

package module
v1.1.0
Warning

This package is not in the latest version of its module.

Published: Sep 23, 2023 License: MIT Imports: 7 Imported by: 23

README

uri

Package uri is meant to be an RFC 3986 compliant URI builder, parser and validator for Go.

It supports strict RFC validation for URI and URI relative references.

This allows for stricter conformance than the net/url package in the Go standard library, which provides a workable but loose implementation of the RFC for URLs.
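As a small illustration of that looseness, the sketch below uses only the standard library. RFC 3986 requires a space in a path to be percent-encoded, yet net/url should accept the raw form as well. The helper name stdlibAccepts is made up for this example.

```go
package main

import (
	"fmt"
	"net/url"
)

// stdlibAccepts reports whether net/url parses raw without returning an error.
// It is a hypothetical helper used only for this illustration.
func stdlibAccepts(raw string) bool {
	_, err := url.Parse(raw)
	return err == nil
}

func main() {
	// A raw space is not allowed by the RFC 3986 grammar.
	fmt.Println(stdlibAccepts("http://example.com/a b"))
	// The percent-encoded form is the RFC-compliant spelling.
	fmt.Println(stdlibAccepts("http://example.com/a%20b"))
}
```

A strict validator is expected to reject the first input and accept the second.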

What's new?

v1.1.0

Build

  • requires go1.19

Features

  • Typed errors: parsing and validation now return errors of type uri.Error, and the returned value pinpoints the cause of the failure more accurately. Errors support the go1.20 addition to standard errors with Join() and Cause(). For go1.19, backward compatibility is ensured (errors.Join() is emulated).
  • DNS schemes can be overridden at the package level

Performance

  • Significantly improved parsing speed by dropping regular expressions and reducing allocations (about 20x faster).

Fixes

  • stricter compliance regarding paths beginning with a double '/'
  • stricter compliance regarding the length of DNS names and their segments
  • stricter compliance regarding IPv6 addresses with an empty zone
  • stricter compliance regarding IPv6 vs IPv4 literals
  • an empty IPv6 literal [] is invalid
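The DNS length rule mentioned in the fixes above is commonly stated as at most 253 characters for a full name in text form and at most 63 octets per label. The following is a minimal sketch of such a check under those assumptions. validDNSLengths is a hypothetical helper, not the package's actual implementation, and it checks lengths only, not the allowed characters.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	maxDNSNameLen  = 253 // full name in text form, without the trailing dot
	maxDNSLabelLen = 63  // a single dot-separated label
)

// validDNSLengths checks only the length constraints of a DNS name.
func validDNSLengths(host string) bool {
	host = strings.TrimSuffix(host, ".") // tolerate a fully qualified name
	if host == "" || len(host) > maxDNSNameLen {
		return false
	}
	for _, label := range strings.Split(host, ".") {
		if label == "" || len(label) > maxDNSLabelLen {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(validDNSLengths("example.com"))
	fmt.Println(validDNSLengths(strings.Repeat("a", 64) + ".com"))
}
```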

Known open issues

  • IRI validation lacks strictness
  • IPv6 validation relies on the standard library and lacks strictness

Other

Major refactoring to enhance code readability, esp. for testing code.

  • Refactored validations
  • Refactored test suite
  • Added support for fuzzing, dependabots & codeQL scans

Usage

Parsing
	u, err := uri.Parse("https://example.com:8080/path")
	if err != nil {
		fmt.Println("Invalid URI:", err)
	} else {
		fmt.Println(u.Scheme())
	}
	// Output: https

	// Reuse u and err with "=": a second ":=" in the same scope would not compile.
	u, err = uri.ParseReference("//example.com/path")
	if err != nil {
		fmt.Println("Invalid URI reference:", err)
	} else {
		fmt.Println(u.Authority().Path())
	}
	// Output: /path
Validating
	isValid := uri.IsURI("urn://example.com?query=x#fragment/path") // true

	isValid = uri.IsURI("//example.com?query=x#fragment/path") // false

	isValid = uri.IsURIReference("//example.com?query=x#fragment/path") // true
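
The second input above is rejected as a URI because it has no scheme, which a URI requires and a relative reference does not. A rough way to see that distinction with the standard library alone is sketched below. hasScheme is a hypothetical helper, and net/url is used here only to split the string, not to validate it.

```go
package main

import (
	"fmt"
	"net/url"
)

// hasScheme reports whether raw parses with a non-empty scheme.
// This is a sketch of the URI vs relative reference distinction only.
func hasScheme(raw string) bool {
	u, err := url.Parse(raw)
	return err == nil && u.IsAbs()
}

func main() {
	fmt.Println(hasScheme("urn://example.com?query=x#fragment/path"))
	fmt.Println(hasScheme("//example.com?query=x#fragment/path"))
}
```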
Caveats
  • Registered name vs DNS name: RFC3986 defines a super-permissive "registered name" for hosts, for URIs not specifically related to an Internet name. Our validation performs a stricter host validation according to DNS rules whenever the scheme is a well-known IANA-registered scheme (the function UsesDNSHostValidation(string) bool is customizable).

Examples: ftp://host, http://host default to validating a proper DNS hostname.

  • IPv6 validation relies on IP parsing from the standard library. It is not super strict regarding the full-fledged IPv6 specification.

  • URI vs URL: every URL should be a URI, but the converse does not always hold. This module intends to perform stricter validation than the pragmatic standard library net/url, which currently remains about 30% faster.

  • URI vs IRI: at this moment, this module checks for URI, while supporting Unicode letters as ALPHA tokens. This is not strictly compliant with the IRI specification (see known issues).

Building

The exposed type URI can be transformed into a fluent Builder to set the parts of a URI.

	aURI, _ := uri.Parse("mailto://user@domain.com")
	// "newuser" is a placeholder value for the new userinfo.
	newURI := aURI.Builder().SetUserInfo("newuser").SetHost("newdomain.com").SetScheme("http").SetPort("443")
Canonicalization

Not supported for now (contemplated as a topic for V2).

For URL normalization, see PuerkitoBio/purell.

Reference specifications

The librarian's corner (still WIP).

Title                                          Reference   Notes
Uniform Resource Identifier (URI)              RFC3986     Deviations (1)
Uniform Resource Locator (URL)                 RFC1738
Relative URL                                   RFC1808
Internationalized Resource Identifier (IRI)    RFC3987     (1)
IPv6 addressing scheme reference and erratum               (2)
Representing IPv6 Zone Identifiers             RFC6874

https://tools.ietf.org/html/rfc6874
https://www.rfc-editor.org/rfc/rfc3513

(1) Deviations from the RFC:

  • Tokens: ALPHAs are tolerated to be Unicode Letter codepoints, DIGITs are tolerated to be Unicode Digit codepoints. Some improvements are needed to abide more strictly by the IRI's provisions for internationalization.

(2) IP addresses:

  • Validation is now stricter regarding [...] literals (which must be IPv6) and ddd.ddd.ddd.ddd literals (which must be IPv4).
  • RFC3986 requires the 6 parts of the IPv6 to be present. This module tolerates common syntax, such as [::]. Notice that [] is illegal, although the Go IP parser equates this to [::] (zero value IP).
  • IPv6 zones are supported, with the '%' escaped as '%25'
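
The rules above can be sketched with the standard library as follows. This is an illustration under assumptions, not the package's code: parseIPv6Literal is a hypothetical helper, and it relies on net/netip rather than on whichever parser the package uses internally. It rejects an empty literal, requires the zone delimiter to be written as %25, and refuses an IPv4 address inside brackets.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
	"strings"
)

// parseIPv6Literal parses a bracketed host such as "[fe80::1%25eth0]".
func parseIPv6Literal(lit string) (netip.Addr, error) {
	if len(lit) < 2 || lit[0] != '[' || lit[len(lit)-1] != ']' {
		return netip.Addr{}, errors.New("not a bracketed literal")
	}
	inner := lit[1 : len(lit)-1]
	if inner == "" {
		return netip.Addr{}, errors.New("empty IPv6 literal")
	}
	if i := strings.IndexByte(inner, '%'); i >= 0 {
		if !strings.HasPrefix(inner[i:], "%25") {
			return netip.Addr{}, errors.New("zone delimiter must be escaped as %25")
		}
		// Turn the escaped delimiter back into the "%" that netip expects.
		inner = inner[:i] + "%" + inner[i+3:]
	}
	addr, err := netip.ParseAddr(inner)
	if err != nil {
		return netip.Addr{}, err
	}
	if !addr.Is6() {
		return netip.Addr{}, errors.New("bracketed literal must be IPv6")
	}
	return addr, nil
}

func main() {
	for _, lit := range []string{"[fe80::1%25eth0]", "[::]", "[]", "[192.0.2.1]"} {
		addr, err := parseIPv6Literal(lit)
		fmt.Println(lit, addr, err)
	}
}
```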

FAQ

Benchmarks

Credits

  • Tests have been aggregated from the test suites of URI validators from other languages (Perl, Python, Scala, .NET) and from the Go net/url standard library.

  • This package was initially based on the work from ttacon/uri (credits: Trey Tacon).

Extra features like MySQL URIs present in the original repo have been removed.

  • A lot of improvements and suggestions have been brought by the incredible guys at fyne-io. Thanks all.

TODOs

  • [] Support IRI ucschar as unreserved characters
  • [] Support IRI iprivate in query
  • [] Prepare v2. See the proposal
  • [] Revisit URI vs IRI support & strictness, possibly with options (V2?)
  • [] Other investigations
Notes
ucschar        = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
                  / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
                  / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
                  / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
                  / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
                  / %xD0000-DFFFD / %xE1000-EFFFD
iprivate       = %xE000-F8FF / %xF0000-FFFFD / %x100000-10FFFD

		// TODO: RFC6874
		// A <zone_id> SHOULD contain only ASCII characters classified as
		// "unreserved" for use in URIs [RFC3986].  This excludes characters
		// such as "]" or even "%" that would complicate parsing.  However, the
		// syntax described below does allow such characters to be percent-
		// encoded, for compatibility with existing devices that use them.
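
The ucschar and iprivate ranges listed in these notes translate directly into rune range checks. The sketch below is one possible transcription. isUcschar and isIprivate are hypothetical names, since the TODO list above indicates that this support is not implemented yet.

```go
package main

import "fmt"

// isUcschar reports whether r falls in the ucschar ranges quoted above.
func isUcschar(r rune) bool {
	switch {
	case r >= 0xA0 && r <= 0xD7FF,
		r >= 0xF900 && r <= 0xFDCF,
		r >= 0xFDF0 && r <= 0xFFEF,
		r >= 0xE1000 && r <= 0xEFFFD:
		return true
	case r >= 0x10000 && r <= 0xDFFFD:
		// Planes 1 to 13: everything except the last two code points of each plane.
		return r&0xFFFF <= 0xFFFD
	}
	return false
}

// isIprivate reports whether r falls in the iprivate ranges quoted above.
func isIprivate(r rune) bool {
	return (r >= 0xE000 && r <= 0xF8FF) ||
		(r >= 0xF0000 && r <= 0xFFFFD) ||
		(r >= 0x100000 && r <= 0x10FFFD)
}

func main() {
	fmt.Println(isUcschar('é'), isUcschar('a'), isIprivate(0xE000))
}
```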

Documentation

Overview

Package uri is meant to be an RFC 3986 compliant URI builder and parser.

This is based on the work from ttacon/uri (credits: Trey Tacon).

This fork concentrates on RFC 3986 strictness for URI parsing and validation.

Reference: https://tools.ietf.org/html/rfc3986

Tests have been augmented with test suites of URI validators in other languages: Perl, Python, Scala, .NET.

Extra features like MySQL URIs present in the original repo have been removed.

Constants

This section is empty.

Variables

var (
	ErrNoSchemeFound         = Error(newErr("no scheme found in URI"))
	ErrInvalidURI            = Error(newErr("not a valid URI"))
	ErrInvalidCharacter      = Error(newErr("invalid character in URI"))
	ErrInvalidScheme         = Error(newErr("invalid scheme in URI"))
	ErrInvalidQuery          = Error(newErr("invalid query string in URI"))
	ErrInvalidFragment       = Error(newErr("invalid fragment in URI"))
	ErrInvalidPath           = Error(newErr("invalid path in URI"))
	ErrInvalidHost           = Error(newErr("invalid host in URI"))
	ErrInvalidPort           = Error(newErr("invalid port in URI"))
	ErrInvalidUserInfo       = Error(newErr("invalid userinfo in URI"))
	ErrMissingHost           = Error(newErr("missing host in URI"))
	ErrInvalidHostAddress    = Error(newErr("invalid address for host"))
	ErrInvalidRegisteredName = Error(newErr("invalid host (registered name)"))
	ErrInvalidDNSName        = Error(newErr("invalid host (DNS name)"))
)

Validation errors.
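
Sentinel values like these are usually matched with errors.Is. The sketch below shows that general pattern with locally defined stand-ins. Whether the package's own errors wrap these sentinels in exactly this way is an assumption. The names errInvalidPort, checkPort and isPortError are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidPort is a local stand-in for a sentinel such as ErrInvalidPort.
var errInvalidPort = errors.New("invalid port in URI")

// checkPort is a toy validator that wraps the sentinel with extra context.
func checkPort(port string) error {
	for _, c := range port {
		if c < '0' || c > '9' {
			return fmt.Errorf("%w: %q", errInvalidPort, port)
		}
	}
	return nil
}

// isPortError reports whether err is, or wraps, the port sentinel.
func isPortError(err error) bool {
	return errors.Is(err, errInvalidPort)
}

func main() {
	err := checkPort("80a")
	fmt.Println(err, isPortError(err))
}
```

Matching on the sentinel rather than on the message text keeps callers working when the wording of the context changes.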

var UsesDNSHostValidation = func(scheme string) bool {
	switch scheme {
	case "dns":
		return true
	case "dntp":
		return true
	case "finger":
		return true
	case "ftp":
		return true
	case "git":
		return true
	case "http":
		return true
	case "https":
		return true
	case "imap":
		return true
	case "irc":
		return true
	case "jms":
		return true
	case "mailto":
		return true
	case "nfs":
		return true
	case "nntp":
		return true
	case "ntp":
		return true
	case "postgres":
		return true
	case "redis":
		return true
	case "rmi":
		return true
	case "rtsp":
		return true
	case "rsync":
		return true
	case "sftp":
		return true
	case "skype":
		return true
	case "smtp":
		return true
	case "snmp":
		return true
	case "soap":
		return true
	case "ssh":
		return true
	case "steam":
		return true
	case "svn":
		return true
	case "tcp":
		return true
	case "telnet":
		return true
	case "udp":
		return true
	case "vnc":
		return true
	case "wais":
		return true
	case "ws":
		return true
	case "wss":
		return true
	}

	return false
}

UsesDNSHostValidation returns true if the provided scheme has host validation that does not follow RFC3986 (which is quite generic), and assumes a valid DNS hostname instead.

This function is declared as a global variable that may be overridden at the package level, in case you need specific schemes to validate the host as a DNS name.

See: https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml
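
Because the function is held in a variable, an override can wrap the previous value instead of replacing it outright. The sketch below shows the pattern on a local stand-in named usesDNSHostValidation, so that it runs without the package. The shortened scheme list, the extendDNSSchemes helper and the "mqtt" scheme are all hypothetical.

```go
package main

import "fmt"

// usesDNSHostValidation is a local stand-in for the package-level variable.
var usesDNSHostValidation = func(scheme string) bool {
	switch scheme {
	case "http", "https", "ftp":
		return true
	}
	return false
}

// extendDNSSchemes wraps the current function so that the extra schemes
// are also validated as DNS names, while keeping the existing behavior.
func extendDNSSchemes(extra ...string) {
	previous := usesDNSHostValidation
	set := make(map[string]struct{}, len(extra))
	for _, s := range extra {
		set[s] = struct{}{}
	}
	usesDNSHostValidation = func(scheme string) bool {
		if _, ok := set[scheme]; ok {
			return true
		}
		return previous(scheme)
	}
}

func main() {
	extendDNSSchemes("mqtt")
	fmt.Println(usesDNSHostValidation("mqtt"), usesDNSHostValidation("http"), usesDNSHostValidation("urn"))
}
```

Reassigning a package-level variable is not safe for concurrent use, so an override of this kind is best done once during program initialization.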

Functions

func IsURI

func IsURI(raw string) bool

IsURI tells if a URI is valid according to RFC3986/RFC3987.

Example
package main

import (
	"fmt"

	"github.com/fredbi/uri"
)

func main() {
	isValid := uri.IsURI("urn://example.com?query=x#fragment/path") // true
	fmt.Println(isValid)

	isValid = uri.IsURI("//example.com?query=x#fragment/path") // false
	fmt.Println(isValid)

}
Output:

true
false

func IsURIReference

func IsURIReference(raw string) bool

IsURIReference tells if a URI reference is valid according to RFC3986/RFC3987.

Reference: https://www.rfc-editor.org/rfc/rfc3986#section-4.1 and https://www.rfc-editor.org/rfc/rfc3986#section-4.2

Example
package main

import (
	"fmt"

	"github.com/fredbi/uri"
)

func main() {
	isValid := uri.IsURIReference("//example.com?query=x#fragment/path") // true
	fmt.Println(isValid)

}
Output:

true

Types

type Authority

type Authority interface {
	UserInfo() string
	Host() string
	Port() string
	Path() string
	String() string
	Validate(...string) error
}

Authority information that a URI contains as specified by RFC3986.

Username and password are given by UserInfo().

type Builder

type Builder interface {
	URI() URI
	SetScheme(scheme string) Builder
	SetUserInfo(userinfo string) Builder
	SetHost(host string) Builder
	SetPort(port string) Builder
	SetPath(path string) Builder
	SetQuery(query string) Builder
	SetFragment(fragment string) Builder

	// Returns the URI this Builder represents.
	String() string
}

Builder builds URIs.
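
Each setter returns the Builder, which is what allows calls to be chained. The sketch below shows that fluent pattern with a small standalone type built on net/url. uriBuilder and its methods are hypothetical and say nothing about how the package implements the interface.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// uriBuilder is a hypothetical fluent builder used only for illustration.
type uriBuilder struct {
	scheme, userinfo, host, port, path string
}

// Each setter stores one component and returns the receiver for chaining.
func (b *uriBuilder) SetScheme(s string) *uriBuilder   { b.scheme = s; return b }
func (b *uriBuilder) SetUserInfo(s string) *uriBuilder { b.userinfo = s; return b }
func (b *uriBuilder) SetHost(s string) *uriBuilder     { b.host = s; return b }
func (b *uriBuilder) SetPort(s string) *uriBuilder     { b.port = s; return b }
func (b *uriBuilder) SetPath(s string) *uriBuilder     { b.path = s; return b }

// String assembles the components with net/url.
func (b *uriBuilder) String() string {
	u := url.URL{Scheme: b.scheme, Host: b.host, Path: b.path}
	if b.userinfo != "" {
		u.User = url.User(b.userinfo)
	}
	if b.port != "" {
		u.Host = net.JoinHostPort(b.host, b.port)
	}
	return u.String()
}

func main() {
	b := &uriBuilder{}
	fmt.Println(b.SetScheme("http").SetUserInfo("user").SetHost("newdomain.com").SetPort("443").String())
}
```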

type Error added in v1.1.0

type Error interface {
	error
}

Error from the github.com/fredbi/uri module.

type URI

type URI interface {
	// Scheme the URI conforms to.
	Scheme() string

	// Authority information for the URI, including the "//" prefix.
	Authority() Authority

	// Query returns a map of key/value pairs of all parameters
	// in the query string of the URI.
	Query() url.Values

	// Fragment returns the fragment (component preceded by '#') in the
	// URI if there is one.
	Fragment() string

	// Builder returns a Builder that can be used to modify the URI.
	Builder() Builder

	// String representation of the URI
	String() string

	// Validate the different components of the URI
	Validate() error
}

URI represents a general RFC3986 URI.

func Parse

func Parse(raw string) (URI, error)

Parse attempts to parse a URI. It returns an error if the URI is not RFC3986-compliant.

Example
package main

import (
	"fmt"

	"github.com/fredbi/uri"
)

func main() {
	u, err := uri.Parse("https://example.com:8080/path")
	if err != nil {
		fmt.Println("Invalid URI:", err)
	} else {
		fmt.Println(u.String())
	}

}
Output:

https://example.com:8080/path

func ParseReference

func ParseReference(raw string) (URI, error)

ParseReference attempts to parse a URI relative reference.

It returns an error if the URI is not RFC3986-compliant.

Example
package main

import (
	"fmt"

	"github.com/fredbi/uri"
)

func main() {
	u, err := uri.ParseReference("//example.com/path?a=1#fragment")
	if err != nil {
		fmt.Println("Invalid URI reference:", err)
	} else {
		fmt.Println(u.Fragment())

		params := u.Query()
		fmt.Println(params.Get("a"))
	}
}
Output:

fragment
1
