katsuragi

v0.3.0
Published: Jul 26, 2024 License: GPL-3.0 Imports: 9 Imported by: 0

README

katsuragi

A Go toolkit for web content processing, analysis, and SEO optimization, offering utilities to efficiently extract favicons, links, descriptions and titles.

Note: Each method is thoroughly tested and optimized for performance, but the package is still in development and may contain unseen bugs. Please don't hesitate to report any issues you encounter!

Features

  • LRU Caching
  • Timeout
  • User-Agent

Installation

go get github.com/devnyxie/katsuragi

Usage

Title

The GetTitle() function currently supports the following title meta tags:

  • <title>Title</title>
  • <meta name="twitter:title" content="Title">
  • <meta property="og:title" content="Title">

package main

import (
	"fmt"

	. "github.com/devnyxie/katsuragi"
)

func main() {
	// Create a new fetcher with a timeout of 3 seconds and a cache capacity of 10
	fetcher := NewFetcher(
		&FetcherProps{
			Timeout:  3000, // 3 seconds (value given in milliseconds)
			CacheCap: 10,   // up to 10 network requests will be cached
		},
	)

	defer fetcher.ClearCache()

	// Get the website's title
	title, err := fetcher.GetTitle("https://www.example.com")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(title)
}
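
To illustrate what a cache capacity such as CacheCap: 10 means in an LRU cache, here is a minimal standard-library sketch. The lruCache type and its methods are hypothetical and written only for this example; they are not katsuragi's implementation, which may differ in every detail.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is one cached item: a key (for example a URL) and its stored value.
type entry struct {
	key   string
	value string
}

// lruCache is a hypothetical, minimal LRU cache (not katsuragi's own code).
type lruCache struct {
	cap   int
	order *list.List               // front = most recently used
	items map[string]*list.Element // key -> element in order
}

func newLRU(capacity int) *lruCache {
	return &lruCache{cap: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

// Get returns the cached value and marks the key as most recently used.
func (c *lruCache) Get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Put stores a value and evicts the least recently used key when over capacity.
func (c *lruCache) Put(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	if c.order.Len() > c.cap {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	cache := newLRU(2)
	cache.Put("https://a.example", "page A")
	cache.Put("https://b.example", "page B")
	cache.Get("https://a.example")           // A is now the most recently used
	cache.Put("https://c.example", "page C") // over capacity: evicts B
	_, ok := cache.Get("https://b.example")
	fmt.Println(ok) // false
}
```

With a capacity of 2, the third insert evicts whichever entry was touched least recently, so repeated requests for the same few URLs stay cached.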

Description

The GetDescription() function currently supports the following description meta tags:

  • <meta name="description" content="Description">
  • <meta name="twitter:description" content="Description">
  • <meta property="og:description" content="Description">

...
  // Get website's description
  description, err := fetcher.GetDescription("https://www.example.com")
...

Favicons

The GetFavicons() function currently supports the following favicon link and meta tags:

  • <link rel="icon" href="favicon.ico">
  • <link rel="apple-touch-icon" href="favicon.png">
  • <meta property="og:image" content="favicon.png">

    Open Graph image (og:image) will be used only if both og:image:width and og:image:height are present and equal, forming a square image.
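
The square-image rule above can be sketched as a small check. The isSquareImage helper is hypothetical and not part of katsuragi's API; it assumes the og:image:width and og:image:height values arrive as strings and treats missing or non-numeric values as "not square".

```go
package main

import (
	"fmt"
	"strconv"
)

// isSquareImage reports whether both dimensions are present, are valid
// positive integers, and are equal. Hypothetical helper for illustration.
func isSquareImage(width, height string) bool {
	w, err := strconv.Atoi(width)
	if err != nil || w <= 0 {
		return false
	}
	h, err := strconv.Atoi(height)
	if err != nil || h <= 0 {
		return false
	}
	return w == h
}

func main() {
	fmt.Println(isSquareImage("512", "512"))  // true
	fmt.Println(isSquareImage("1200", "630")) // false
	fmt.Println(isSquareImage("512", ""))     // false: height is missing
}
```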

...
  // Get website's favicons
  favicons, err := fetcher.GetFavicons("https://www.example.com")
  // [https://www.example.com/favicon.ico, https://www.example.com/favicon.png]
...
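
The sample output shows absolute URLs even though the tags use relative href values. One way to get from one to the other, using only the standard library, is sketched below. The resolveHref helper is hypothetical; whether katsuragi resolves paths this way is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveHref turns an href found in a page into an absolute URL,
// using the page URL as the base. Hypothetical helper for illustration.
func resolveHref(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	hrefs := []string{"favicon.ico", "/favicon.png", "https://cdn.example.net/icon.png"}
	for _, href := range hrefs {
		abs, err := resolveHref("https://www.example.com/", href)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(abs)
	}
}
```

Relative and root-relative values are resolved against the page URL, while an href that is already absolute is returned unchanged.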

Links

The GetLinks() function searches for all <a> tags in the HTML document and returns a slice of links.

Options:

  • Url (required): The URL of the website to fetch.
  • Category (optional): The category of links to fetch. Possible values are internal, external, and all. Default is all.

  // Get website's links
  links, err := fetcher.GetLinks(GetLinksProps{
    Url: "https://www.example.com",
    Category: "external",
  })
  // [https://www.youtube.com/example, https://www.facebook.com/example]
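
To make the internal/external distinction concrete, here is a simplified sketch built on net/url. The categorize function is hypothetical, and it assumes a link is internal only when its hostname matches the page's hostname exactly. katsuragi's own rule may be different, for example it might compare root domains instead.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// categorize labels link as "internal" or "external" relative to pageURL.
// Assumption: internal means the same hostname (case-insensitive).
// Relative links resolve against the page and therefore count as internal.
func categorize(pageURL, link string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(link)
	if err != nil {
		return "", err
	}
	abs := base.ResolveReference(ref)
	if strings.EqualFold(abs.Hostname(), base.Hostname()) {
		return "internal", nil
	}
	return "external", nil
}

func main() {
	page := "https://www.example.com/"
	links := []string{"/about", "https://www.example.com/blog", "https://www.youtube.com/example"}
	for _, link := range links {
		category, err := categorize(page, link)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(category, link)
	}
}
```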

Local Development

Testing

go test -v

Code Coverage

# Generate coverage.out report, generate HTML report from coverage.out, and open the HTML report in the browser:
go test -coverprofile=coverage.out && go tool cover -html=coverage.out -o coverage.html && open coverage.html

License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0). The full text of the license is available in the project repository.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type DomainParts added in v0.3.0

type DomainParts struct {
	Subdomain string
	Root      string
	TLD       string
}
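
DomainParts is not documented here, so the following is only a guess at what the three fields hold. The sketch uses a hypothetical splitHost function and a local domainParts type with the same field names. It is deliberately naive: the last label is treated as the TLD, so multi-label suffixes such as "co.uk" are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// domainParts has the same field names as the DomainParts struct above.
type domainParts struct {
	Subdomain string
	Root      string
	TLD       string
}

// splitHost is a hypothetical, naive splitter: the last label is the TLD,
// the one before it is the root, and anything earlier is the subdomain.
func splitHost(host string) (domainParts, error) {
	labels := strings.Split(strings.ToLower(host), ".")
	n := len(labels)
	if n < 2 {
		return domainParts{}, fmt.Errorf("host %q has no TLD", host)
	}
	return domainParts{
		Subdomain: strings.Join(labels[:n-2], "."),
		Root:      labels[n-2],
		TLD:       labels[n-1],
	}, nil
}

func main() {
	parts, err := splitHost("www.example.com")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", parts) // {Subdomain:www Root:example TLD:com}
}
```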

type Fetcher

type Fetcher struct {
	// contains filtered or unexported fields
}

func NewFetcher

func NewFetcher(props *FetcherProps) *Fetcher

func (*Fetcher) ClearCache

func (f *Fetcher) ClearCache()

func (*Fetcher) GetDescription

func (f *Fetcher) GetDescription(url string) (string, error)

func (*Fetcher) GetFavicons

func (f *Fetcher) GetFavicons(url string) ([]string, error)

func (*Fetcher) GetFromCache

func (f *Fetcher) GetFromCache(url string) (*html.Node, bool, error)

func (*Fetcher) GetLinks

func (f *Fetcher) GetLinks(props GetLinksProps) ([]string, error)

GetLinks fetches links from the given URL based on the category ("all", "internal", "external")

func (*Fetcher) GetTitle

func (f *Fetcher) GetTitle(url string) (string, error)

type FetcherProps

type FetcherProps struct {
	UserAgent string
	Timeout   time.Duration //ms
	CacheCap  int
}
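
The //ms comment suggests that Timeout is given as a number of milliseconds, as in the README's Timeout: 3000. How the library scales that value internally is not shown here, so treat that as an assumption. The sketch below uses only the standard library, with a hypothetical msToDuration helper, to show why the scaling matters: a bare 3000 stored in a time.Duration is 3000 nanoseconds.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// msToDuration converts a millisecond count into a time.Duration.
// Hypothetical helper for illustration.
func msToDuration(ms int64) time.Duration {
	return time.Duration(ms) * time.Millisecond
}

func main() {
	raw := time.Duration(3000)   // a bare 3000 means 3000 nanoseconds
	scaled := msToDuration(3000) // 3000 milliseconds
	fmt.Println(raw, scaled)     // 3µs 3s

	// A standard-library client that gives up after the scaled timeout.
	client := &http.Client{Timeout: scaled}
	fmt.Println(client.Timeout) // 3s
}
```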

type GetLinksProps added in v0.3.0

type GetLinksProps struct {
	Url      string
	Category string
}

type UserAgentTransport added in v0.2.0

type UserAgentTransport struct {
	UserAgent string
	Transport http.RoundTripper
}

HTTP Client

func (*UserAgentTransport) RoundTrip added in v0.2.0

func (t *UserAgentTransport) RoundTrip(req *http.Request) (*http.Response, error)
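
For readers unfamiliar with the http.RoundTripper pattern, here is a self-contained sketch of how a transport with these two fields could set the User-Agent header. The uaTransport type and seenUserAgent function are hypothetical stand-ins written for this example; the actual body of UserAgentTransport.RoundTrip is not shown in this documentation.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// uaTransport is a hypothetical stand-in with the same fields as UserAgentTransport.
type uaTransport struct {
	UserAgent string
	Transport http.RoundTripper
}

// RoundTrip sets the User-Agent header on a copy of the request and
// delegates to the wrapped transport.
func (t *uaTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	clone := req.Clone(req.Context()) // a RoundTripper must not modify the caller's request
	clone.Header.Set("User-Agent", t.UserAgent)
	next := t.Transport
	if next == nil {
		next = http.DefaultTransport
	}
	return next.RoundTrip(clone)
}

// seenUserAgent starts a local test server, sends one request through
// uaTransport, and returns the User-Agent header the server received.
func seenUserAgent(ua string) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.UserAgent())
	}))
	defer srv.Close()

	client := &http.Client{Transport: &uaTransport{UserAgent: ua}}
	resp, err := client.Get(srv.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	got, err := seenUserAgent("katsuragi-example/1.0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got) // katsuragi-example/1.0
}
```

Wrapping the transport rather than setting the header on each request keeps the User-Agent consistent for every call made through the client.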
