url

package module

v0.0.0-...-8f3fcd1 Published: Nov 30, 2024 License: MIT Imports: 9 Imported by: 8

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/hueristiq/hq-go-url

README

hq-go-url

hq-go-url is a Go (Golang) package for extracting, parsing, and manipulating URLs with ease. This library is useful for developers who need to work with URLs in a structured way.

Resources

Features
Installation
Usage
- Extraction
  - Domains
    - Customizing Domain Extractor
  - URLs
    - Customizing URL Extractor
- Parsing
  - Domains
  - URLs
Contributing
Licensing
Credits
- Contributors
- Alternatives

Features

Flexible Domain Extraction: Extract domains from text using regular expressions.
Flexible URL Extraction: Extract URLs from text using regular expressions.
Domain Parsing: Parse domains into subdomains, second-level domains, and top-level domains.
Extended URL Parsing: Extend the standard net/url package in Go with additional fields and capabilities.

Installation

To install the package, run the following command in your terminal:

go get -v -u github.com/hueristiq/hq-go-url

This command downloads hq-go-url and records it as a dependency in your module's go.mod file, making it available for import in your projects.

Usage

Below are examples demonstrating how to use the different features of the hq-go-url package.

Extraction

Domains

package main

import (
	"fmt"

	hqgourl "github.com/hueristiq/hq-go-url"
)

func main() {
	extractor := hqgourl.NewDomainExtractor()
	text := "Check out this website: https://example.com and send an email to info@example.com."

	// CompileRegex returns a *regexp.Regexp; -1 asks for all matches.
	regex := extractor.CompileRegex()
	matches := regex.FindAllString(text, -1)

	fmt.Println("Found domains:", matches)
}

Customizing Domain Extractor

You can customize domain extraction by supplying custom regular expression patterns for the root domain and for the top-level domain (TLD).

Extract domains with TLD Pattern:

extractor := hqgourl.NewDomainExtractor(
	hqgourl.DomainExtractorWithTLDPattern(`(?:com|net|org)`),
)

This configuration will extract only domains with com, net, or org TLDs.

Extract domains with Root Domain Pattern:

extractor := hqgourl.NewDomainExtractor(
	hqgourl.DomainExtractorWithRootDomainPattern(`(?:example|rootdomain)`), // Custom root domain pattern
)

This configuration will extract only domains whose root domain is example or rootdomain.

URLs

package main

import (
	"fmt"

	hqgourl "github.com/hueristiq/hq-go-url"
)

func main() {
	extractor := hqgourl.NewExtractor()
	text := "Check out this website: https://example.com and send an email to info@example.com."

	regex := extractor.CompileRegex()
	matches := regex.FindAllString(text, -1)

	fmt.Println("Found URLs:", matches)
}

Customizing URL Extractor

You can customize URL extraction by requiring a scheme or host in every match, or by providing custom regular expression patterns for the scheme and host.

Extract URLs with Scheme Pattern:
```
extractor := hqgourl.NewExtractor(
	hqgourl.ExtractorWithSchemePattern(`(?:https?|ftp)://`),
)
```
This configuration will extract URLs with http, https, or ftp schemes.

Extract URLs with Host Pattern:
```
extractor := hqgourl.NewExtractor(
	hqgourl.ExtractorWithHostPattern(`(?:www\.)?example\.com`),
)
```
This configuration will extract URLs that have hosts matching www.example.com or example.com.

Parsing

Domains

package main

import (
	"fmt"

	hqgourl "github.com/hueristiq/hq-go-url"
)

func main() {
	parser := hqgourl.NewDomainParser()

	parsed := parser.Parse("subdomain.example.com")

	fmt.Printf("Subdomain: %s, SLD: %s, TLD: %s\n", parsed.Subdomain, parsed.SLD, parsed.TLD)
}

URLs

package main

import (
	"fmt"

	hqgourl "github.com/hueristiq/hq-go-url"
)

func main() {
	parser := hqgourl.NewParser()

	parsed, err := parser.Parse("https://subdomain.example.com:8080/path/file.txt")
	if err != nil {
		fmt.Println("Error parsing URL:", err)

		return
	}

	fmt.Printf("Scheme: %s\n", parsed.Scheme)
	fmt.Printf("Host: %s\n", parsed.Host)
	fmt.Printf("Hostname: %s\n", parsed.Hostname())
	fmt.Printf("Subdomain: %s\n", parsed.Domain.Subdomain)
	fmt.Printf("SLD: %s\n", parsed.Domain.SLD)
	fmt.Printf("TLD: %s\n", parsed.Domain.TLD)
	fmt.Printf("Port: %s\n", parsed.Port())
	fmt.Printf("Path: %s\n", parsed.Path)
}

Set a default scheme:

parser := hqgourl.NewParser(hqgourl.ParserWithDefaultScheme("https"))

Contributing

We welcome contributions! Feel free to submit Pull Requests or report Issues. For more details, check out the contribution guidelines.

Licensing

This package is licensed under the MIT license. You are free to use, modify, and distribute it, as long as you follow the terms of the license. You can find the full license text in the repository - Full MIT license text.

Credits

Contributors

A huge thanks to all the contributors who have helped make hq-go-url what it is today!

Alternatives

If you're interested in more packages like this, check out:

urlx ◇ xurls

Documentation

Index

Variables
type Domain
- func (d *Domain) String() (domain string)
type DomainExtractor
- func NewDomainExtractor(opts ...DomainExtractorOptionFunc) (extractor *DomainExtractor)
- func (e *DomainExtractor) CompileRegex() (regex *regexp.Regexp)
type DomainExtractorInterface
type DomainExtractorOptionFunc
- func DomainExtractorWithRootDomainPattern(pattern string) DomainExtractorOptionFunc
- func DomainExtractorWithTLDPattern(pattern string) DomainExtractorOptionFunc
type DomainInterface
type DomainParser
- func NewDomainParser(opts ...DomainParserOptionFunc) (parser *DomainParser)
- func (p *DomainParser) Parse(domain string) (parsed *Domain)
type DomainParserInterface
type DomainParserOptionFunc
- func DomainParserWithTLDs(TLDs ...string) DomainParserOptionFunc
type Extractor
- func NewExtractor(opts ...ExtractorOptionFunc) (extractor *Extractor)
- func (e *Extractor) CompileRegex() (regex *regexp.Regexp)
type ExtractorInterface
type ExtractorOptionFunc
- func ExtractorWithHost() ExtractorOptionFunc
- func ExtractorWithHostPattern(pattern string) ExtractorOptionFunc
- func ExtractorWithScheme() ExtractorOptionFunc
- func ExtractorWithSchemePattern(pattern string) ExtractorOptionFunc
type Parser
- func NewParser(opts ...ParserOptionFunc) (parser *Parser)
- func (p *Parser) Parse(unparsed string) (parsed *URL, err error)
type ParserInterface
type ParserOptionFunc
- func ParserWithDefaultScheme(scheme string) ParserOptionFunc
type URL

Constants

This section is empty.

Variables

var (
	// ExtractorSchemePattern defines a general pattern for matching URL schemes.
	// It matches any URL scheme that starts with a letter (a-z, A-Z), followed by
	// any combination of letters, dots (.), hyphens (-), or plus signs (+), and ends with "://".
	// Additionally, it matches schemes from a predefined list that do not require an authority (host),
	// ending with just a colon (":"). These are known as "no-authority" schemes (e.g., "mailto:").
	//
	// This pattern covers a broad range of schemes, making it versatile for extracting different types
	// of URLs, whether they require an authority component or not.
	ExtractorSchemePattern = `(?:[a-zA-Z][a-zA-Z.\-+]*://|` + anyOf(schemes.NoAuthority...) + `:)`

	// ExtractorKnownOfficialSchemePattern defines a pattern for matching officially recognized
	// URL schemes. These include well-known schemes such as "http", "https", "ftp", etc., as registered
	// with IANA. The pattern ensures that the scheme is followed by "://".
	//
	// This pattern ensures that only officially recognized schemes are matched.
	ExtractorKnownOfficialSchemePattern = `(?:` + anyOf(schemes.Official...) + `://)`

	// ExtractorKnownUnofficialSchemePattern defines a pattern for matching unofficial or less commonly
	// used URL schemes. These schemes may not be registered with IANA but are still valid in specific contexts,
	// such as application-specific schemes (e.g., "slack://", "zoommtg://").
	// The pattern ensures that the scheme is followed by "://".
	//
	// This pattern is useful for applications that work with unofficial or niche schemes.
	ExtractorKnownUnofficialSchemePattern = `(?:` + anyOf(schemes.Unofficial...) + `://)`

	// ExtractorKnownNoAuthoritySchemePattern defines a pattern for matching URL schemes that
	// do not require an authority component (host). These schemes are followed by a colon (":") rather than "://".
	// Examples include "mailto:", "tel:", and "sms:".
	//
	// This pattern is used for schemes where a host is not applicable, making it suitable for schemes
	// that involve direct communication (e.g., email or telephone).
	ExtractorKnownNoAuthoritySchemePattern = `(?:` + anyOf(schemes.NoAuthority...) + `:)`

	// ExtractorKnownSchemePattern combines the patterns for officially recognized, unofficial,
	// and no-authority-required schemes into a single comprehensive pattern.
	// It is case-insensitive (denoted by "(?i)") and matches the broadest possible range of URLs.
	//
	// This pattern is suitable for extracting any known scheme, regardless of its official status
	// or whether it requires an authority component.
	ExtractorKnownSchemePattern = `(?:(?i)(?:` + anyOf(schemes.Official...) + `|` + anyOf(schemes.Unofficial...) + `)://|` + anyOf(schemes.NoAuthority...) + `:)`

	// ExtractorIPv4Pattern defines a pattern for matching valid IPv4 addresses.
	// It matches four decimal octets (each 0-255) separated by periods (e.g., "192.168.0.1").
	//
	// This pattern is essential for extracting or validating IPv4 addresses in URLs or hostnames.
	ExtractorIPv4Pattern = `` /* 206-byte string literal not displayed */

	// ExtractorNonEmptyIPv6Pattern defines a detailed pattern for matching valid, non-empty IPv6 addresses.
	// It accounts for various valid formats of IPv6 addresses, including those with elisions ("::") and IPv4
	// address representations.
	//
	// This pattern supports matching fully expanded IPv6 addresses, elided sections, and IPv4-mapped IPv6 addresses.
	ExtractorNonEmptyIPv6Pattern = `(?:` +

		`(?:[0-9a-fA-F]{1,4}:){7}(?:[0-9a-fA-F]{1,4}|:)|` +

		`(?:[0-9a-fA-F]{1,4}:){6}(?:` + ExtractorIPv4Pattern + `|:[0-9a-fA-F]{1,4}|:)|` +

		`(?:[0-9a-fA-F]{1,4}:){5}(?::` + ExtractorIPv4Pattern + `|(?::[0-9a-fA-F]{1,4}){1,2}|:)|` +

		`(?:[0-9a-fA-F]{1,4}:){4}(?:(?::[0-9a-fA-F]{1,4}){0,1}:` + ExtractorIPv4Pattern + `|(?::[0-9a-fA-F]{1,4}){1,3}|:)|` +

		`(?:[0-9a-fA-F]{1,4}:){3}(?:(?::[0-9a-fA-F]{1,4}){0,2}:` + ExtractorIPv4Pattern + `|(?::[0-9a-fA-F]{1,4}){1,4}|:)|` +

		`(?:[0-9a-fA-F]{1,4}:){2}(?:(?::[0-9a-fA-F]{1,4}){0,3}:` + ExtractorIPv4Pattern + `|(?::[0-9a-fA-F]{1,4}){1,5}|:)|` +

		`(?:[0-9a-fA-F]{1,4}:){1}(?:(?::[0-9a-fA-F]{1,4}){0,4}:` + ExtractorIPv4Pattern + `|(?::[0-9a-fA-F]{1,4}){1,6}|:)|` +

		`:(?:(?::[0-9a-fA-F]{1,4}){0,5}:` + ExtractorIPv4Pattern + `|(?::[0-9a-fA-F]{1,4}){1,7})` +
		`)`

	// ExtractorIPv6Pattern is a comprehensive pattern that matches both fully expanded and compressed IPv6 addresses.
	// It also handles "::" elision and optional IPv4-mapped sections.
	ExtractorIPv6Pattern = `(?:` + ExtractorNonEmptyIPv6Pattern + `|::)`

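	// ExtractorPortPattern defines a pattern for matching port numbers in URLs.
	// Its documented intent is to match port numbers (1 to 65535) preceded by a colon (":").
	// Note that, as written, the colon belongs only to the first alternative
	// (":" plus 1 to 4 digits, which also admits ":0"), the five-digit alternatives
	// carry no colon, and "6[0-5][0-9]{3}" admits values up to 65999.
	ExtractorPortPattern = `(?::[0-9]{1,4}|[1-5][0-9]{4}|6[0-5][0-9]{3}\b)`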

	// ExtractorPortOptionalPattern is similar to ExtractorPortPattern but makes the port number optional.
	// This is useful for matching URLs where the port may or may not be specified.
	ExtractorPortOptionalPattern = ExtractorPortPattern + `?`
)
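The sketch below contrasts the published ExtractorPortPattern with a stricter alternative that requires the colon and accepts only 1 to 65535 without leading zeros. The `strictPortPattern` constant and `fullMatch` helper are hypothetical and not part of the package; both patterns are tested as whole-string matches here, so for extraction from running text you would add your own boundary handling (for example a trailing `\b`).

```go
package main

import (
	"fmt"
	"regexp"
)

// originalPortPattern is ExtractorPortPattern as published above.
const originalPortPattern = `(?::[0-9]{1,4}|[1-5][0-9]{4}|6[0-5][0-9]{3}\b)`

// strictPortPattern is a hypothetical alternative: a colon followed by a
// decimal port in the range 1-65535, with no leading zeros.
const strictPortPattern = `:(?:6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|[1-9][0-9]{0,3})`

// fullMatch reports whether pattern matches the whole of s.
func fullMatch(pattern, s string) bool {
	return regexp.MustCompile(`^(?:` + pattern + `)$`).MatchString(s)
}

func main() {
	for _, s := range []string{":8080", ":65535", ":65536", ":0", "65999"} {
		fmt.Printf("%-7s original=%-5v strict=%v\n",
			s, fullMatch(originalPortPattern, s), fullMatch(strictPortPattern, s))
	}
}
```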

Functions

This section is empty.

Types

type Domain

type Domain struct {
	Subdomain string
	SLD       string
	TLD       string
}

Domain represents a parsed domain name, broken down into three main components:

Subdomain: The subdomain part of the domain (e.g., "www" in "www.example.com").
SLD: The root domain, also known as the second-level domain (SLD), which is the core part of the domain (e.g., "example" in "www.example.com").
TLD: The top-level domain (TLD), which is the domain suffix or extension (e.g., "com" in "www.example.com").

This struct is useful in scenarios where you need to manipulate and analyze domain names. It can be applied in tasks such as:

Domain validation (e.g., ensuring that domains conform to expected formats).
URL parsing (e.g., breaking down a URL into its domain components).
Domain classification (e.g., identifying and grouping URLs by subdomain, root domain, or TLD).

By splitting a domain into its components, you can easily identify domain hierarchies, manipulate specific parts of a domain, or analyze domain names for SEO, security, or categorization purposes.

Example:

domain := Domain{
	Subdomain: "www",     // Subdomain part
	SLD:       "example", // Second-level domain part
	TLD:       "com",     // Top-level domain part
}

// Output: "www.example.com"
fmt.Println(domain.String())

func (*Domain) String

func (d *Domain) String() (domain string)

String reassembles the components of the domain (Subdomain, SLD, and TLD) back into a complete domain name string. Non-empty components are joined with a dot ("."). If any component is missing, it is omitted from the final output. This method is useful for reconstructing domain names after parsing.

Example:

If Subdomain = "www", SLD = "example", and TLD = "com", the output will be "www.example.com".
If Subdomain is empty, the output will be "example.com".
If both Subdomain and TLD are empty, the output will be just the SLD "example".
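The joining rule above can be sketched with the standard library. `joinDomain` is a hypothetical stand-alone function that follows the documented behaviour (non-empty parts joined with ".", empty parts skipped); it is not the package's source.

```go
package main

import (
	"fmt"
	"strings"
)

// joinDomain mirrors the documented behaviour of Domain.String: non-empty
// components are joined with "." and empty ones are omitted.
func joinDomain(subdomain, sld, tld string) string {
	parts := make([]string, 0, 3)
	for _, p := range []string{subdomain, sld, tld} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, ".")
}

func main() {
	fmt.Println(joinDomain("www", "example", "com")) // www.example.com
	fmt.Println(joinDomain("", "example", "com"))    // example.com
	fmt.Println(joinDomain("", "example", ""))       // example
}
```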

Returns:

domain (string): The reconstructed domain name string.

type DomainExtractor

type DomainExtractor struct {
	RootDomainPattern     string // Custom regex pattern for matching the root domain (e.g., "example").
	TopLevelDomainPattern string // Custom regex pattern for matching the TLD (e.g., "com").
}

DomainExtractor is responsible for extracting domain names, including both root domains and top-level domains (TLDs), using regular expressions. It provides flexibility in the domain extraction process by allowing custom patterns for both root domains and TLDs.

func NewDomainExtractor

func NewDomainExtractor(opts ...DomainExtractorOptionFunc) (extractor *DomainExtractor)

NewDomainExtractor creates and initializes a DomainExtractor with optional configurations. By default, it uses pre-defined patterns for extracting root domains and TLDs, but custom patterns can be applied using the provided options.

Returns:

extractor: A pointer to the initialized DomainExtractor.

func (*DomainExtractor) CompileRegex

func (e *DomainExtractor) CompileRegex() (regex *regexp.Regexp)

CompileRegex compiles a regular expression based on the configured DomainExtractor. It builds a regex that can match domains, combining the root domain pattern with the top-level domain (TLD) pattern. The method separates ASCII and Unicode TLDs and includes a punycode pattern to handle internationalized domain names (IDNs). It also ensures that the regex captures the longest possible domain match.

Returns:

regex: The compiled regular expression for matching domain names.

type DomainExtractorInterface

type DomainExtractorInterface interface {
	CompileRegex() (regex *regexp.Regexp)
}

DomainExtractorInterface defines the interface for domain extraction functionality. It ensures that any domain extractor can compile regular expressions to match domain names.

type DomainExtractorOptionFunc

type DomainExtractorOptionFunc func(*DomainExtractor)

DomainExtractorOptionFunc defines a function type for configuring a DomainExtractor. It allows setting options like custom patterns for root domains and TLDs.

func DomainExtractorWithRootDomainPattern

func DomainExtractorWithRootDomainPattern(pattern string) DomainExtractorOptionFunc

DomainExtractorWithRootDomainPattern returns an option function to configure the DomainExtractor with a custom regex pattern for matching root domains (e.g., "example" in "example.com").

Parameters:

pattern: The custom root domain regex pattern.

Returns:

A function that applies the custom root domain pattern to the DomainExtractor.

func DomainExtractorWithTLDPattern

func DomainExtractorWithTLDPattern(pattern string) DomainExtractorOptionFunc

DomainExtractorWithTLDPattern returns an option function to configure the DomainExtractor with a custom regex pattern for matching top-level domains (TLDs) (e.g., "com" in "example.com").

Parameters:

pattern: The custom TLD regex pattern.

Returns:

A function that applies the custom TLD pattern to the DomainExtractor.

type DomainInterface

type DomainInterface interface {
	String() (domain string)
}

DomainInterface defines an interface for domain representations.

type DomainParser

type DomainParser struct {
	// contains filtered or unexported fields
}

DomainParser is responsible for parsing domain names into their constituent parts: subdomain, root domain (SLD), and top-level domain (TLD). It utilizes a suffix array to efficiently identify TLDs from a comprehensive list of known TLDs (both standard and pseudo-TLDs). This allows the parser to split the domain into subdomain, root domain, and TLD components quickly and accurately.

The suffix array helps in handling a large number of known TLDs and enables fast lookups, even for complex domain structures where subdomains might be mistaken for TLDs.

Fields:

sa (*suffixarray.Index):
The suffix array index used for efficiently searching through known TLDs.
This allows for rapid identification of the TLD in the domain string.

Example Usage:

parser := NewDomainParser()
domain := "www.example.com"
parsedDomain := parser.Parse(domain)
fmt.Println(parsedDomain.Subdomain)  // Output: "www"
fmt.Println(parsedDomain.SLD)        // Output: "example"
fmt.Println(parsedDomain.TLD)        // Output: "com"
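A suffix array can answer "is this string a known TLD?" with the standard `index/suffixarray` package. In the sketch below, the TLD list is concatenated with "\x00" separators so that a lookup for "\x00"+tld+"\x00" only hits whole entries. The `tldIndex` type and the separator scheme are assumptions for illustration; the package's actual index layout is not documented here.

```go
package main

import (
	"fmt"
	"index/suffixarray"
	"strings"
)

// tldIndex is a hypothetical wrapper around a suffix array built over
// "\x00"-separated TLD entries.
type tldIndex struct {
	sa *suffixarray.Index
}

func newTLDIndex(tlds []string) *tldIndex {
	data := "\x00" + strings.Join(tlds, "\x00") + "\x00"
	return &tldIndex{sa: suffixarray.New([]byte(data))}
}

// contains reports whether tld is one of the indexed entries. The
// surrounding separators prevent partial hits such as "co" in "co.uk".
func (t *tldIndex) contains(tld string) bool {
	return len(t.sa.Lookup([]byte("\x00"+tld+"\x00"), 1)) > 0
}

func main() {
	idx := newTLDIndex([]string{"com", "co.uk", "uk", "org"})
	for _, s := range []string{"com", "co.uk", "co", "om"} {
		fmt.Printf("%-6s %v\n", s, idx.contains(s))
	}
}
```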

func NewDomainParser

func NewDomainParser(opts ...DomainParserOptionFunc) (parser *DomainParser)

NewDomainParser creates a new DomainParser instance and initializes it with a comprehensive list of TLDs, including both standard TLDs and pseudo-TLDs. Additional options can be passed to customize the parser, such as using a custom set of TLDs.

Parameters:

opts (variadic DomainParserOptionFunc): Optional configuration options.

Returns:

parser (*DomainParser): A pointer to the initialized DomainParser.

func (*DomainParser) Parse

func (p *DomainParser) Parse(domain string) (parsed *Domain)

Parse takes a full domain string (e.g., "www.example.com") and splits it into three main components: subdomain, root domain (SLD), and TLD. The method uses the suffix array to identify the TLD and then extracts the subdomain and root domain from the rest of the domain string.

Parameters:

domain (string): The full domain string to be parsed.

Returns:

parsed (*Domain): A pointer to a Domain struct containing the subdomain, root domain (SLD), and TLD.

type DomainParserInterface

type DomainParserInterface interface {
	Parse(domain string) (parsed *Domain)
	// contains filtered or unexported methods
}

DomainParserInterface defines the interface for domain parsing functionality.

type DomainParserOptionFunc

type DomainParserOptionFunc func(*DomainParser)

DomainParserOptionFunc defines a function type for configuring a DomainParser instance. This allows customization options like specifying custom TLDs.

Example:

parser := NewDomainParser(DomainParserWithTLDs("custom", "tld"))

func DomainParserWithTLDs

func DomainParserWithTLDs(TLDs ...string) DomainParserOptionFunc

DomainParserWithTLDs allows the DomainParser to be initialized with a custom set of TLDs. This option is useful for handling non-standard or niche TLDs that may not be included in the default set.

Parameters:

TLDs (...string): One or more custom TLDs to be used by the DomainParser.

Returns:

A DomainParserOptionFunc that applies the custom TLDs to the parser.

type Extractor

type Extractor struct {
	// contains filtered or unexported fields
}

Extractor is a struct that configures the URL extraction process. It provides options for controlling whether URL schemes and hosts are mandatory, and allows custom regular expression patterns to be specified for these components. This allows fine-grained control over the types of URLs that are extracted from text.

func NewExtractor

func NewExtractor(opts ...ExtractorOptionFunc) (extractor *Extractor)

NewExtractor creates a new Extractor instance with optional configuration. The options can be used to customize how URLs are extracted, such as whether URL schemes or hosts are required in a match.

func (*Extractor) CompileRegex

func (e *Extractor) CompileRegex() (regex *regexp.Regexp)

CompileRegex constructs and compiles a regular expression based on the Extractor configuration. It builds a regex pattern that can capture various forms of URLs, including those with or without schemes and hosts. The method also supports custom patterns provided by the user, ensuring that the longest possible match for a URL is found, improving accuracy in URL extraction.

type ExtractorInterface

type ExtractorInterface interface {
	CompileRegex() (regex *regexp.Regexp)
}

ExtractorInterface defines the interface that Extractor should implement. It ensures that Extractor has the ability to compile regex patterns for URL extraction.

type ExtractorOptionFunc

type ExtractorOptionFunc func(*Extractor)

ExtractorOptionFunc defines a function type for configuring Extractor instances. It allows users to pass options that modify the behavior of the Extractor, such as whether schemes or hosts are required in URL extraction.

func ExtractorWithHost

func ExtractorWithHost() ExtractorOptionFunc

ExtractorWithHost returns an option function that configures the Extractor to require URL hosts in the extraction process.

func ExtractorWithHostPattern

func ExtractorWithHostPattern(pattern string) ExtractorOptionFunc

ExtractorWithHostPattern returns an option function that allows specifying a custom regex pattern for matching URL hosts.

func ExtractorWithScheme

func ExtractorWithScheme() ExtractorOptionFunc

ExtractorWithScheme returns an option function that configures the Extractor to require URL schemes in the extraction process.

func ExtractorWithSchemePattern

func ExtractorWithSchemePattern(pattern string) ExtractorOptionFunc

ExtractorWithSchemePattern returns an option function that allows specifying a custom regex pattern for matching URL schemes.

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

Parser is responsible for parsing URLs while also handling domain-related parsing through the use of a DomainParser. It extends basic URL parsing functionality by providing support for handling custom schemes and extracting domain components such as subdomains, root domains, and TLDs.

Fields:

dp (*DomainParser):
A reference to a `DomainParser` used for extracting subdomain, root domain, and TLD information from the host part of the URL.
scheme (string):
The default scheme to use when parsing URLs without a specified scheme. For example, if a URL is missing a scheme (e.g., "www.example.com") and the default scheme is "https", the parser prepends it, producing "https://www.example.com".

Methods:

Parse(unparsed string) (parsed *URL, err error):
Takes a raw URL string and parses it into a custom `URL` struct that includes both the standard URL components (via the embedded `net/url.URL`) and domain-specific details.
If the URL does not include a scheme, the default scheme is added (if specified).
Additionally, the method uses the DomainParser to break down the domain into subdomain, root domain, and TLD components.

Example Usage:

parser := NewParser(ParserWithDefaultScheme("https"))
parsedURL, err := parser.Parse("example.com/path")
if err != nil {
    log.Fatal(err)
}
fmt.Println(parsedURL.Scheme)     // Output: https
fmt.Println(parsedURL.Hostname()) // Output: example.com
fmt.Println(parsedURL.Domain.SLD)  // Output: example

func NewParser

func NewParser(opts ...ParserOptionFunc) (parser *Parser)

NewParser creates and initializes a new Parser with the given options. The Parser is also initialized with a DomainParser for extracting domain-specific details such as subdomain, root domain, and TLD. Additional configuration options can be applied using the variadic `opts` parameter.

Parameters:

opts: A variadic list of `ParserOptionFunc` functions that can configure the Parser.

Returns:

parser (*Parser): A pointer to the initialized Parser instance.

func (*Parser) Parse

func (p *Parser) Parse(unparsed string) (parsed *URL, err error)

Parse takes a raw URL string and parses it into a custom URL struct that includes:

Standard URL components from `net/url` (scheme, host, path, etc.)
Domain-specific details such as subdomain, root domain, and TLD.

If the URL does not specify a scheme, the default scheme (if any) is added. The method also validates and parses the host and port (if specified).

Parameters:

unparsed (string): The raw URL string to parse.

Returns:

parsed (*URL): A pointer to the parsed URL struct containing both standard URL components and domain-specific details.
err (error): An error if the URL cannot be parsed.

type ParserInterface

type ParserInterface interface {
	Parse(unparsed string) (parsed *URL, err error)
}

ParserInterface defines the interface that all Parser implementations must adhere to.

type ParserOptionFunc

type ParserOptionFunc func(*Parser)

ParserOptionFunc defines a function type for configuring a Parser instance. It is used to apply various options such as setting the default scheme.

Example:

parser := NewParser(ParserWithDefaultScheme("https"))

func ParserWithDefaultScheme

func ParserWithDefaultScheme(scheme string) ParserOptionFunc

ParserWithDefaultScheme returns a `ParserOptionFunc` that sets the default scheme for the Parser. This function allows you to specify a default scheme (e.g., "http" or "https") that will be added to URLs that don't provide one.

Parameters:

scheme (string): The default scheme to set (e.g., "http" or "https").

Returns:

A `ParserOptionFunc` that applies the default scheme to the Parser.

type URL

type URL struct {
	*url.URL

	Domain *Domain
}

URL extends the standard net/url URL struct by embedding it and adding additional fields for handling domain-related information. This extension provides a more detailed representation of the URL by including a separate `Domain` struct that breaks down the domain into Subdomain, second-level domain (SLD), and top-level domain (TLD).

Fields:

URL (*url.URL):
Embeds the standard `net/url.URL` struct, which provides the base URL parsing functionality, such as handling the scheme, host, path, query parameters, and fragment.
Methods and functions from the embedded `net/url.URL` can be used transparently.
Domain (*Domain):
A pointer to the `Domain` struct that contains parsed domain information, including:
Subdomain (string): The subdomain of the URL (e.g., "www" in "www.example.com").
Second-level domain (SLD) (string): The main domain (e.g., "example").
Top-level domain (TLD) (string): The domain suffix (e.g., "com" in "www.example.com").
This allows for better handling of domain components, which is useful in cases like:
URL classification and domain analysis.
Security or SEO applications where separating domain components is important.

Example Usage:

// Parse a URL using the standard url.Parse method.
parsedURL, _ := url.Parse("https://www.example.com")

// Create an extended URL object and manually add domain information.
extendedURL := &URL{
    URL: parsedURL, // Embeds the parsed URL from the standard library.

    // Domain can be parsed separately or manually assigned.
    Domain: &Domain{
        Subdomain: "www",     // Subdomain part.
        SLD:       "example", // Root (second-level) domain part.
        TLD:       "com",     // Top-level domain part.
    },
}

// Access standard URL components.
fmt.Println(extendedURL.Scheme) // Output: https
fmt.Println(extendedURL.Host)   // Output: www.example.com
fmt.Println(extendedURL.Path)   // Output: (empty line; the URL has no path)

// Access domain-specific information.
fmt.Println(extendedURL.Domain.Subdomain) // Output: www
fmt.Println(extendedURL.Domain.SLD)       // Output: example
fmt.Println(extendedURL.Domain.TLD)       // Output: com

Purpose:

This `URL` struct provides a more detailed breakdown of a URL's domain components,
making it particularly useful for tasks involving domain analysis, URL classification,
or scenarios where understanding subdomains, root domains, and TLDs is important.

Directories

Path	Synopsis
gen
TLDs
schemes
unicodes
schemes	Package schemes provides a collection of constants and lists representing URL schemes.
tlds	Package tlds provides a collection of constants and lists representing official top-level domains (TLDs) and pseudo or special-use TLDs.
unicodes	Package unicodes provides constants for defining sets of allowed Unicode characters.
