urlutil

package
v0.0.0-...-742bdff Latest
Warning

This package is not in the latest version of its module.

Published: Jun 15, 2024 License: MIT Imports: 11 Imported by: 0

README

urlutil

The package contains various helpers for parsing and manipulating URLs.

URL Parsing Methods

Function                                           | Description                                 | Type                          | Behavior
Parse(inputURL string)                             | Standard URL Parsing (+ Some Edgecases)     | Both Relative & Absolute URLs | NA
ParseURL(inputURL string, unsafe bool)             | Standard + Unsafe URL Parsing (+ Edgecases) | Both Relative & Absolute URLs | NA
ParseRelativePath(inputURL string, unsafe bool)    | Standard + Unsafe URL Parsing (+ Edgecases) | Only Relative URLs            | error if absolute URL is given
ParseRawRelativePath(inputURL string, unsafe bool) | Standard + Unsafe URL Parsing               | Only Relative URLs            | error if absolute URL is given
ParseAbsoluteURL(inputURL string, unsafe bool)     | Standard + Unsafe URL Parsing (+ Edgecases) | Only Absolute URLs            | error if relative URL is given

Known Edgecases / Changes from url.URL
  • Query parameters are ordered
  • Invalid unicode characters and invalid URL encodings are allowed in unsafe mode
  • u.Path is always / prefixed if not empty (except with ParseRawRelativePath)
  • Invalid values / encodings are allowed in the URL path
  • Only reserved characters are encoded in query parameters; other characters are left as-is (see: Raw Params)
  • Almost proper parsing of a URL into parts (scheme, host, path, query, fragment). Known limitation: manually added hostnames without a . in the hostname, like mydomain

More details on each edgecase/behavior are given below.

Differences between net/url.URL and utils/url/URL

  • url.URL caters to a wide variety of URLs, and for that reason its parsing is not accurate under various conditions

  • utils/url/URL is a wrapper around url.URL that handles the edgecases below and is able to parse complex URLs (i.e. non-RFC-compliant URLs that are nevertheless required in infosec)

  • url.URL allows u.Path without a / prefix; utils/url/URL does not, and autocorrects the path if the / prefix is missing

  • Parsing URLs without scheme

// If the URLs below are parsed with url.Parse(), the URL parts (scheme, host, path, etc.) are not properly classified
scanme.sh
scanme.sh:443/port
scanme.sh/with/path
  • Encoding of parameters (url.Values)

    • url.URL encodes all reserved characters (as per the RFCs) in parameter key-value pairs (i.e. url.Values{})
    • If reserved/special characters are URL encoded, the integrity of specially crafted payloads (lfi, xss, sqli) is lost
    • utils/url/URL uses utils/url/Params to store/handle parameters, so the integrity of all such payloads is preserved
    • utils/url/URL also provides options to customize URL encoding using a global variable and function params
  • Parsing Unsafe/Invalid Paths

    • While parsing URLs, url.Parse() either discards or re-encodes some specially crafted payloads
    • If an invalid URL encoding is given in the URL (ex: scanme.sh/%invalid), url.Parse() returns an error and the URL is not parsed
    • Such cases are implicitly handled if unsafe is true
// Example URLs for the above conditions
scanme.sh/?some'param=`'+OR+ORDER+BY+1--
scanme.sh/?some[param]=<script>alert(1)</script>
scanme.sh/%invalid/path
  • utils/url/URL has some extra methods

    • .TrimPort()
    • .MergePath(newrelpath string, unsafe bool)
    • .UpdateRelPath(newrelpath string, unsafe bool)
    • .Clone() and more
  • Dealing with double URL encoding of chars like %0A when .Path is directly updated

    When url.Parse is used to parse a URL like https://127.0.0.1/%0A, it internally calls u.setPath, which decodes %0A to \n and saves it in u.Path. When the final URL is created at the time of writing to the connection in http.Request, Path is escaped again, so \n becomes %0A and the final URL becomes https://127.0.0.1/%0A, which is the expected/required behavior.

    If u.Path is changed/updated directly after url.Parse (ex: u.Path = "%0A"), then at the time of writing to the connection in http.Request, Path is escaped again, so %0A becomes %250A and the final URL becomes https://127.0.0.1/%250A, which is not the expected/required behavior. To avoid this, we manually unescape/decode u.Path and set u.Path = unescape(u.Path), which takes care of this edgecase.

    This is how utils/url/URL handles this edgecase when u.Path is directly updated.
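
The net/url behaviors described in the list above can be reproduced with the standard library alone. The following is a minimal sketch that uses only net/url, not this package; the helper names describe and afterPathUpdate are hypothetical and exist only for illustration. It shows how scheme-less inputs are classified, that an invalid escape is rejected, and how a directly updated Path gets double encoded unless it is unescaped first.

```go
package main

import (
	"fmt"
	"net/url"
)

// describe parses raw with net/url and reports how the parts were classified.
func describe(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return "error: " + err.Error()
	}
	return fmt.Sprintf("scheme=%q host=%q opaque=%q path=%q", u.Scheme, u.Host, u.Opaque, u.Path)
}

// afterPathUpdate parses raw, overwrites u.Path with newPath and returns the
// URL string that net/url produces. If unescape is true, newPath is decoded
// first, mirroring the u.Path = unescape(u.Path) step described above.
func afterPathUpdate(raw, newPath string, unescape bool) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if unescape {
		if newPath, err = url.PathUnescape(newPath); err != nil {
			return "", err
		}
	}
	u.Path = newPath
	u.RawPath = "" // only Path is set, as in the scenario above
	return u.String(), nil
}

func main() {
	inputs := []string{"scanme.sh", "scanme.sh:443/port", "scanme.sh/with/path", "scanme.sh/%invalid/path"}
	for _, raw := range inputs {
		fmt.Println(raw, "->", describe(raw))
	}

	double, _ := afterPathUpdate("https://127.0.0.1/", "/%0A", false)
	single, _ := afterPathUpdate("https://127.0.0.1/", "/%0A", true)
	fmt.Println(double) // https://127.0.0.1/%250A
	fmt.Println(single) // https://127.0.0.1/%0A
}
```

With net/url, scanme.sh ends up entirely in Path, and scanme.sh:443/port is read as scheme scanme.sh with an opaque part, which is the misclassification the list refers to.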

Note

utils/url/URL embeds url.URL and thus inherits and exposes all url.URL methods and variables. It is OK to use any method from url.URL (directly/indirectly) except url.URL.Query() and url.URL.String(), due to parameter encoding issues. If it is not possible to follow this rule (ex: when directly updating/referencing http.Request.URL), the .Update() method should be called before accessing them; it updates the url.URL instance for this edgecase. (Not required if the above rule is followed.)
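
To see why the note singles out url.URL.Query(), here is a small sketch of what the standard library does to a query string when it is round-tripped through url.Values. It uses only net/url; the helper name stdlibQuery is hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// stdlibQuery round-trips a raw query string through url.Values, which is
// what calling url.URL.Query() followed by Encode() does.
func stdlibQuery(rawQuery string) string {
	u := url.URL{RawQuery: rawQuery}
	return u.Query().Encode()
}

func main() {
	raw := "z=1&some[param]=<script>alert(1)</script>"
	fmt.Println(stdlibQuery(raw))
	// some%5Bparam%5D=%3Cscript%3Ealert%281%29%3C%2Fscript%3E&z=1
}
```

The payload is percent-encoded and the keys are sorted, so both the original order and the raw characters are lost.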

Documentation

Index

Constants

const (
	HTTP  = "http"
	HTTPS = "https"

	// Deny all protocols
	// Allow:
	// websocket + websocket over ssl
	WEBSOCKET     = "ws"
	WEBSOCKET_SSL = "wss"
	FTP           = "ftp"

	SchemeSeparator  = "://"
	DefaultHTTPPort  = "80"
	DefaultHTTPSPort = "443"
)

Variables

var AllowLegacySeperator bool = false

The legacy separator (i.e. `;`) was used as a separator for parameters; support for it was removed in Go >= 1.17.
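
For context, the standard library behavior that this variable relates to can be observed with net/url alone. The sketch below only shows what url.ParseQuery does with a semicolon in Go 1.17 or later; it does not show how this package reacts to AllowLegacySeperator, and the helper name semicolonRejected is hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// semicolonRejected reports whether net/url returns an error for rawQuery.
// Since Go 1.17, a ';' inside a query is reported as an invalid separator.
func semicolonRejected(rawQuery string) bool {
	_, err := url.ParseQuery(rawQuery)
	return err != nil
}

func main() {
	fmt.Println(semicolonRejected("a=1;b=2")) // true on Go >= 1.17
	fmt.Println(semicolonRejected("a=1&b=2")) // false
}
```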

var MustEscapeCharSet []rune = []rune{'?', '#', '@', ';', '&', ',', '[', ']', '^'}

MustEscapeCharSet contains special chars that are always escaped. It is based on the reserved chars from the RFC: some of the reserved chars were excluded and some others were added for various reasons. The goal here is to encode parameter keys and values only.

var RFCEscapeCharSet []rune = []rune{'!', '*', '\'', '(', ')', ';', ':', '@', '&', '=', '+', '$', ',', '/', '?', '%', '#', '[', ']'}

Reserved Chars from RFC ! * ' ( ) ; : @ & = + $ , / ? % # [ ]

Functions

func AutoMergeRelPaths

func AutoMergeRelPaths(path1 string, path2 string) (string, error)

AutoMergeRelPaths merges two relative paths including parameters and returns final string

func ParamEncode

func ParamEncode(data string) string

ParamEncode encodes key characters only. Key characters include whitespace, non-printable chars, and non-ASCII chars. It does not double encode characters that are already encoded.

func PercentEncoding

func PercentEncoding(data string) string

PercentEncoding encodes all characters to percent-encoded format, just like the Burp Suite decoder
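
As an illustration of what encoding all characters means, here is a minimal sketch that percent-encodes every byte of the input, including letters and digits. It is not the package's implementation: the function name percentEncodeAll is hypothetical, and the byte-wise handling and uppercase hex digits are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// percentEncodeAll is a hypothetical helper that writes every byte of data
// as %XX (uppercase hex is an assumption), regardless of whether the byte
// would normally need escaping.
func percentEncodeAll(data string) string {
	var b strings.Builder
	for i := 0; i < len(data); i++ {
		fmt.Fprintf(&b, "%%%02X", data[i])
	}
	return b.String()
}

func main() {
	fmt.Println(percentEncodeAll("a/b")) // %61%2F%62
}
```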

func URLEncodeWithEscapes

func URLEncodeWithEscapes(data string, charset ...rune) string

URLEncodeWithEscapes URL encodes data with the given special characters escaped (similar to Burp Suite Intruder). Note: `MustEscapeCharSet` is not included.

Types

type OrderedParams

type OrderedParams struct {

	// IncludeEquals is used to include = in encoded parameters, default is false
	IncludeEquals bool
	// contains filtered or unexported fields
}

OrderedParams is a map that preserves the order of elements
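
To make the idea of an order-preserving parameter store concrete, here is a simplified sketch. The type orderedQuery and its methods are hypothetical stand-ins written for illustration; they are not the package's OrderedParams implementation and do no escaping at all.

```go
package main

import (
	"fmt"
	"strings"
)

// orderedQuery is a hypothetical, simplified ordered parameter store.
type orderedQuery struct {
	keys   []string            // keys in first-insertion order
	values map[string][]string // values per key
}

func newOrderedQuery() *orderedQuery {
	return &orderedQuery{values: map[string][]string{}}
}

// Add appends vals to key and records the key's position on first use.
func (q *orderedQuery) Add(key string, vals ...string) {
	if _, seen := q.values[key]; !seen {
		q.keys = append(q.keys, key)
	}
	q.values[key] = append(q.values[key], vals...)
}

// Encode joins key=value pairs in insertion order without escaping anything.
func (q *orderedQuery) Encode() string {
	var parts []string
	for _, k := range q.keys {
		for _, v := range q.values[k] {
			parts = append(parts, k+"="+v)
		}
	}
	return strings.Join(parts, "&")
}

func main() {
	q := newOrderedQuery()
	q.Add("z", "1")
	q.Add("a", "2")
	q.Add("z", "3")
	fmt.Println(q.Encode()) // z=1&z=3&a=2
}
```

Keeping a separate slice of keys next to the map is one common way to get deterministic, insertion-ordered output, in contrast to url.Values.Encode, which sorts by key.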

func NewOrderedParams

func NewOrderedParams() *OrderedParams

NewOrderedParams creates a new ordered params

func (*OrderedParams) Add

func (o *OrderedParams) Add(key string, value ...string)

Add Parameters to store

func (*OrderedParams) Clone

func (o *OrderedParams) Clone() *OrderedParams

Clone returns a copy of the ordered params

func (*OrderedParams) Decode

func (o *OrderedParams) Decode(raw string)

Decode is the opposite of Encode(): it parses a raw query string such as "bar=baz&foo=quux". Parameters are loosely parsed to allow any scenario.

func (*OrderedParams) Del

func (o *OrderedParams) Del(key string)

Del deletes values associated with key

func (*OrderedParams) Encode

func (o *OrderedParams) Encode() string

Encode returns encoded parameters by preserving order

func (*OrderedParams) Get

func (o *OrderedParams) Get(key string) string

Get returns first value of given key

func (*OrderedParams) GetAll

func (o *OrderedParams) GetAll(key string) []string

GetAll returns all values of given key or returns empty slice if key doesn't exist

func (*OrderedParams) Has

func (o *OrderedParams) Has(key string) bool

Has returns if given key exists

func (*OrderedParams) IsEmpty

func (o *OrderedParams) IsEmpty() bool

IsEmpty checks if the OrderedParams is empty

func (*OrderedParams) Iterate

func (o *OrderedParams) Iterate(f func(key string, value []string) bool)

Iterate iterates over the OrderedParams

func (*OrderedParams) Merge

func (o *OrderedParams) Merge(raw string)

Merges given paramset into existing one with base as priority

func (*OrderedParams) Set

func (o *OrderedParams) Set(key string, value string)

Set sets the key to value and replaces if already exists

func (*OrderedParams) Update

func (o *OrderedParams) Update(key string, value []string)

Update is similar to Set but it takes value as slice (similar to internal implementation of url.Values)

type Params

type Params map[string][]string

func GetParams

func GetParams(query url.Values) Params

GetParams returns a Params value built from the given url.Values

func NewParams

func NewParams() Params

func (Params) Add

func (p Params) Add(key string, value ...string)

Add Parameters to store

func (Params) Decode

func (p Params) Decode(raw string)

Decode is the opposite of Encode(): it parses a raw query string such as "bar=baz&foo=quux". Parameters are loosely parsed to allow any scenario.

func (Params) Del

func (p Params) Del(key string)

Del deletes values associated with key

func (Params) Encode

func (p Params) Encode() string

Encode URL encodes and returns values ("bar=baz&foo=quux") sorted by key.

func (Params) Get

func (p Params) Get(key string) string

Get returns first value of given key

func (Params) Has

func (p Params) Has(key string) bool

Has returns if given key exists

func (Params) Merge

func (p Params) Merge(x Params)

Merges given paramset into existing one with base as priority

func (Params) Set

func (p Params) Set(key string, value string)

Set sets the key to value and replaces if already exists

type URL

type URL struct {
	*url.URL

	Original   string         // original or given url(without params if any)
	Unsafe     bool           // If request is unsafe (skip validation)
	IsRelative bool           // If URL is relative
	Params     *OrderedParams // Query Parameters
	// contains filtered or unexported fields
}

URL a wrapper around net/url.URL

func Parse

func Parse(inputURL string) (*URL, error)

Parse parses and returns a URL (can be relative or absolute)

func ParseAbsoluteURL

func ParseAbsoluteURL(inputURL string, unsafe bool) (*URL, error)

ParseAbsoluteURL parses and returns an absolute URL. It should be preferred over the other parsers when the input is known to be an absolute URL, since this reduces any normalization and autocorrection related to relative paths. It returns an error if the input is a relative path.

func ParseRawRelativePath

func ParseRawRelativePath(inputURL string, unsafe bool) (*URL, error)

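ParseRawRelativePath parses a relative path like ParseRelativePath, but does not enforce the / prefix on u.Path (see Known Edgecases)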

func ParseRelativePath

func ParseRelativePath(inputURL string, unsafe bool) (*URL, error)

ParseRelativePath parses and returns a relative path. It should be preferred over the other parsers when the input is known to be a relative path, since this reduces any normalization and autocorrection related to absolute paths. It returns an error if the input is an absolute path.

func ParseURL

func ParseURL(inputURL string, unsafe bool) (*URL, error)

ParseURL parses and returns a URL (can be relative or absolute)

func (*URL) Clone

func (u *URL) Clone() *URL

Clone

func (*URL) EscapedString

func (u *URL) EscapedString() string

EscapedString returns a string that can be used as filename (i.e stripped of / and params etc)

func (*URL) GetRelativePath

func (u *URL) GetRelativePath() string

GetRelativePath ex: /some/path?param=true#fragment

func (*URL) MergePath

func (u *URL) MergePath(newrelpath string, unsafe bool) error

MergePath merges the given relative path into the URL

func (*URL) Query

func (u *URL) Query() *OrderedParams

Query returns Query Params

func (*URL) String

func (u *URL) String() string

String

func (*URL) TrimPort

func (u *URL) TrimPort()

TrimPort if any

func (*URL) Update

func (u *URL) Update()

Update updates the internal wrapped url.URL with any changes made to the query parameters

func (*URL) UpdatePort

func (u *URL) UpdatePort(newport string)

UpdatePort updates the port

func (*URL) UpdateRelPath

func (u *URL) UpdateRelPath(newrelpath string, unsafe bool) error

UpdateRelPath updates relative path with new path (existing params are not removed)
