urlshaper

v0.0.0-...-6ba2583
Warning

This package is not in the latest version of its module.

Published: Jul 15, 2024 License: Apache-2.0 Imports: 5 Imported by: 17

README

urlshaper

URL Shaper is a Go library that takes a URL path and query, breaks them into their components, and creates a shape.

URL Shaper generates a URL shape by applying several transformations to a URL that make it easier to analyze URL patterns:

  • query parameter values are replaced with a question mark
  • query parameters in the URL shape are sorted alphabetically
  • variables in the URL path, identified by matching patterns, are replaced by their names from the pattern (for example, /about/en becomes /about/:lang)
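The two query rules above can be sketched with the standard library alone. This is a minimal illustration, not urlshaper's implementation: the helper name `queryShape` is hypothetical, and it assumes that a parameter repeated in the URL appears only once in the shape, as the REST & Query parameters example shows.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// queryShape is a hypothetical helper: each distinct parameter name appears
// once, its value is replaced by "?", and names are sorted alphabetically.
func queryShape(rawQuery string) (string, error) {
	values, err := url.ParseQuery(rawQuery)
	if err != nil {
		return "", err
	}
	keys := make([]string, 0, len(values))
	for k := range values {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, len(keys))
	for i, k := range keys {
		parts[i] = k + "=?"
	}
	return strings.Join(parts, "&"), nil
}

func main() {
	shape, err := queryShape("isbn=123456&author=Alice&isbn=987654")
	if err != nil {
		panic(err)
	}
	fmt.Println(shape) // author=?&isbn=?
}
```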

In addition to the URL shape, the Result object returned by URL Shaper contains:

  • the path, with any query parameters removed
  • the query, with the path removed
  • a url.Values object containing all the query parameters
  • a url.Values object containing all the path parameters

Parameters in the path portion of the URL are identified by matching the path against a list of provided patterns. Patterns are matched in the order provided, and the first match wins. Each pattern should describe the entire path portion of the URL; end it with "*" to match any number of additional segments.
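The first-match-wins rule and the extraction of path fields can be illustrated with a segment-by-segment matcher. This is a hypothetical sketch, not urlshaper's code (its Compile method builds a regular expression instead); `matchPattern` and `firstMatch` are illustrative names, and the sketch assumes a trailing "*" requires at least one further segment.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// matchPattern is a hypothetical matcher: a ":name" segment captures one path
// segment, a trailing "*" accepts any remaining segments, and every other
// segment must match literally.
func matchPattern(pattern, path string) (url.Values, bool) {
	patSegs := strings.Split(pattern, "/")
	pathSegs := strings.Split(path, "/")
	fields := url.Values{}
	for i, seg := range patSegs {
		if seg == "*" && i == len(patSegs)-1 {
			// Assumption of this sketch: "/about/*" does not match "/about".
			return fields, len(pathSegs) > i
		}
		if i >= len(pathSegs) {
			return nil, false
		}
		switch {
		case strings.HasPrefix(seg, ":"):
			fields.Add(seg[1:], pathSegs[i])
		case seg != pathSegs[i]:
			return nil, false
		}
	}
	// Without a wildcard, the path must not have extra segments.
	return fields, len(pathSegs) == len(patSegs)
}

// firstMatch tries the patterns in the order given; the first match wins.
func firstMatch(path string, patterns []string) (string, url.Values, bool) {
	for _, p := range patterns {
		if fields, ok := matchPattern(p, path); ok {
			return p, fields, true
		}
	}
	return "", url.Values{}, false
}

func main() {
	patterns := []string{"/about/:lang/books/:isbn", "/docs/:section/*"}
	pat, fields, ok := firstMatch("/docs/quickstart/linux", patterns)
	fmt.Println(pat, fields, ok) // /docs/:section/* map[section:[quickstart]] true
}
```

An unmatched path returns an empty set of fields, in which case the shape is simply the original path, as in the Unmatched example.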

Examples

Query parameters:

input:

  • path: /about/en/books?isbn=123456&author=Alice

output:

  • uri: /about/en/books?isbn=123456&author=Alice
  • path: /about/en/books
  • query: isbn=123456&author=Alice
  • query_fields: {"isbn":["123456"],"author":["Alice"]}
  • path_fields: {}
  • shape: /about/en/books?author=?&isbn=?

REST:

input:

  • path: /about/en/books/123456
  • pattern: /about/:lang/books/:isbn

output:

  • uri: /about/en/books/123456
  • path: /about/en/books/123456
  • query: ""
  • query_fields: {}
  • path_fields: {"lang":["en"],"isbn":["123456"]}
  • shape: /about/:lang/books/:isbn

REST & Query parameters:

input:

  • path: /about/en/books?isbn=123456&author=Alice&isbn=987654
  • pattern: /about/:lang/books

output:

  • uri: /about/en/books?isbn=123456&author=Alice&isbn=987654
  • path: /about/en/books
  • query: isbn=123456&author=Alice&isbn=987654
  • query_fields: {"isbn":["123456", "987654"],"author":["Alice"]}
  • path_fields: {"lang":["en"]}
  • shape: /about/:lang/books?author=?&isbn=?

Unmatched:

input:

  • path: /other/path
  • patterns: /about/:lang/books, /docs/:section

output:

  • uri: /other/path
  • path: /other/path
  • query: ""
  • query_fields: {}
  • path_fields: {}
  • shape: /other/path

Wildcard:

input:

  • path: /docs/quickstart/linux
  • pattern: /docs/:section/*

output:

  • uri: /docs/quickstart/linux
  • path: /docs/quickstart/linux
  • query: ""
  • query_fields: {}
  • path_fields: {"section":["quickstart"]}
  • shape: /docs/:section/*

Documentation

Overview

Package urlshaper creates a normalized shape (and other fields) from a URL.

Summary

Given a URL or a URL path (e.g. http://example.com/foo/bar?q=baz or just /foo/bar?q=baz) and an optional list of URL patterns, urlshaper returns an object that breaks the URL into its components and provides a normalized shape of the URL.

Inputs

URL inputs to urlshaper should be strings. They can be either fully qualified URLs or just the path, optionally followed by a query. (Anything that the net/url parser can parse should work.)

Valid URL inputs:

http://example.com/foo/bar
https://example.com:8080/foo?q=bar
/foo/bar/baz
/foo?bar=baz
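Because inputs go through the net/url parser, the standard library shows how each valid input above splits into a path, a raw query, and decoded query parameters, which are the pieces a Result reports as Path, Query, and QueryFields. The `splitURL` helper is hypothetical and only demonstrates net/url; urlshaper's own handling may differ in details.

```go
package main

import (
	"fmt"
	"net/url"
)

// splitURL is a hypothetical helper that returns the path, the raw query,
// and the decoded query parameters of a URL or path-only string.
func splitURL(raw string) (string, string, url.Values, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", nil, err
	}
	return u.Path, u.RawQuery, u.Query(), nil
}

func main() {
	// Fully qualified URLs and path-only inputs are both accepted.
	for _, in := range []string{
		"http://example.com/foo/bar",
		"https://example.com:8080/foo?q=bar",
		"/foo/bar/baz",
		"/foo?bar=baz",
	} {
		path, query, fields, err := splitURL(in)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("path=%q query=%q fields=%v\n", path, query, fields)
	}
}
```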

Patterns should describe only the path section of the URL. Mark variable portions of the path by preceding the segment name with a colon (":"). To match additional segments after the pattern, end it with an asterisk ("*").

Valid patterns:

/about               matches /about and /about?q=bar
/about/:lang         matches /about/en and /about/1234?q=bar
/about/:lang/page    matches /about/en/page and /about/1234/page?q=bar
/about/*             matches /about/foo/bar/baz and /about/a/b/c?q=bar
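Compile is documented as turning a pattern into a regular expression. One plausible translation is sketched below: each ":name" segment becomes a single-segment capture group, a trailing "*" becomes ".*", and everything else is quoted literally. The function `patternToRegexp` is hypothetical, and the regex urlshaper actually builds may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// patternToRegexp is a hypothetical translation of a pattern into an anchored
// regular expression. It also returns the variable names in order.
func patternToRegexp(pat string) (*regexp.Regexp, []string, error) {
	segs := strings.Split(pat, "/")
	var names []string
	for i, s := range segs {
		switch {
		case strings.HasPrefix(s, ":"):
			names = append(names, s[1:])
			segs[i] = "([^/]+)" // exactly one path segment
		case s == "*" && i == len(segs)-1:
			segs[i] = ".*" // any remaining segments
		default:
			segs[i] = regexp.QuoteMeta(s)
		}
	}
	re, err := regexp.Compile("^" + strings.Join(segs, "/") + "$")
	if err != nil {
		return nil, nil, err
	}
	return re, names, nil
}

func main() {
	re, names, err := patternToRegexp("/about/:lang/page")
	if err != nil {
		panic(err)
	}
	fmt.Println(re)                                      // ^/about/([^/]+)/page$
	fmt.Println(names)                                   // [lang]
	fmt.Println(re.FindStringSubmatch("/about/en/page")) // [/about/en/page en]
}
```

Only the path is tested, so the "?q=bar" parts of the sample matches above play no role in matching.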

Output

If there is no error, the returned Result object always has URI, Path, and Shape filled in. The remaining fields have zero values if the corresponding sections of the URL are missing.

Example
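package main

import (
	"fmt"
	"log"

	// Import path assumed; it is not stated on this page.
	"github.com/honeycombio/urlshaper"
)

func main() {
	prs := urlshaper.Parser{}

	// Add three sample patterns to our parser.
	// Patterns are always matched in list order; the first match wins.
	for _, pat := range []string{
		"/about",
		"/about/:lang",
		"/about/:lang/page",
	} {
		p := &urlshaper.Pattern{Pat: pat}
		// Compile explicitly so that a bad pattern is reported
		// instead of being silently ignored.
		if err := p.Compile(); err != nil {
			log.Fatalf("compiling pattern %q: %v", pat, err)
		}
		prs.Patterns = append(prs.Patterns, p)
	}

	// Parse and generate the shape for a complex URL.
	urlStr := "http://example.com:8080/about/english?title=Paradise&state=California"
	result, err := prs.Parse(urlStr)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf(`Original URL: %s
URI: %s
Path: %s
Query: %s
QueryFields: %+v
PathFields: %+v
Shape: %s
PathShape: %s
QueryShape: %s
`, urlStr, result.URI, result.Path, result.Query, result.QueryFields,
		result.PathFields, result.Shape, result.PathShape, result.QueryShape)
}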
Output:

Original URL: http://example.com:8080/about/english?title=Paradise&state=California
URI: http://example.com:8080/about/english?title=Paradise&state=California
Path: /about/english
Query: title=Paradise&state=California
QueryFields: map[state:[California] title:[Paradise]]
PathFields: map[lang:[english]]
Shape: /about/:lang?state=?&title=?
PathShape: /about/:lang
QueryShape: state=?&title=?

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Parser

type Parser struct {
	Patterns []*Pattern
}

Parser contains the list of Patterns used to parse URLs and exposes Parse to generate a Result.

func (*Parser) Parse

func (p *Parser) Parse(rawURL string) (*Result, error)

Parse parses rawURL, matches its path against the Parser's Patterns in order, and returns a Result and an error.

type Pattern

type Pattern struct {
	Pat string
	// contains filtered or unexported fields
}

Pattern represents a URL path pattern you wish to use when creating the shape. After setting Pat, you should call Compile so the pattern is ready for matching.

If you don't call Compile, the pattern is compiled automatically, but compilation errors are not reported: a pattern that fails to compile is silently ignored.

Patterns should not include any query parameters: only the path portion of the URL is tested against the pattern.

func (*Pattern) Compile

func (p *Pattern) Compile() error

Compile turns the Pattern into a compiled regular expression that is used to match URL paths.

type Result

type Result struct {
	// URI is the original string that is parsed
	URI string
	// Path is the unmodified path portion of the URL
	Path string
	// Query is the unmodified query portion of the URL
	Query string
	// QueryFields is a map of the query parameters to values
	QueryFields url.Values
	// PathFields is a map of the keys in the provided pattern to the values
	// extracted from the Path
	PathFields url.Values
	// Shape is the normalized URL: path variables are replaced by their
	// pattern names, query values by '?', and query parameters are sorted
	// alphabetically
	Shape string
	// PathShape is the path portion of the normalized URL
	PathShape string
	// QueryShape is the query portion of the normalized URL
	QueryShape string
}

Result contains the parsed portions of the input URL.
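In every example on this page, Shape is PathShape followed by "?" and QueryShape when the URL has a query, and PathShape alone otherwise. The hypothetical helper below only restates that observed relationship; it is not part of the urlshaper API.

```go
package main

import "fmt"

// joinShape is a hypothetical helper: it appends "?" and the query shape to
// the path shape only when a query shape is present.
func joinShape(pathShape, queryShape string) string {
	if queryShape == "" {
		return pathShape
	}
	return pathShape + "?" + queryShape
}

func main() {
	fmt.Println(joinShape("/about/:lang", "state=?&title=?")) // /about/:lang?state=?&title=?
	fmt.Println(joinShape("/other/path", ""))                 // /other/path
}
```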
