urlinfo

Version: v0.0.0-...-d5b1986
Published: Jun 27, 2019 License: MIT Imports: 4 Imported by: 0

README

urlinfo

A simple service that keeps an in-memory list of URLs and responds to queries asking whether a given URL is present. The service ignores the protocol (http/https); http is always assumed internally. URLs in data files must start with http://.

urlinfo is naive and assumes there is enough memory to load the given data file. While loading, URLs are normalized and hashed to 16 bytes, to save on memory use.
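As a rough illustration of the normalize-then-hash idea, the sketch below reduces a URL to a fixed 16-byte key. The README does not say which hash or which normalization rules the service uses, so MD5 (which happens to produce 16 bytes), the `normalize` helper, and the `Sum16` type are all assumptions made for this example.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"strings"
)

// Sum16 is a hypothetical 16-byte digest type, similar in shape to ByteSum.
type Sum16 [16]byte

// normalize is an assumed normalization step: it trims whitespace and forces
// the http scheme. The real service may normalize differently.
func normalize(raw string) string {
	s := strings.TrimSpace(raw)
	s = strings.TrimPrefix(s, "https://")
	s = strings.TrimPrefix(s, "http://")
	return "http://" + s
}

// hashURL normalizes the URL and reduces it to 16 bytes.
// MD5 is an assumption; the README only states the digest size.
func hashURL(raw string) Sum16 {
	return Sum16(md5.Sum([]byte(normalize(raw))))
}

func main() {
	a := hashURL("https://domain.tlc/path")
	b := hashURL("http://domain.tlc/path")
	// Both spellings normalize to the same string, so the keys are equal.
	fmt.Println(a == b)
	fmt.Printf("%x\n", a)
}
```

A fixed-size array is comparable in Go, so a value like this can be used directly as a map key, whatever the length of the original URL.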

The service first loads the file into memory, and then starts serving requests.

Installing

Install the service with go get github.com/mayo/urlinfo/.... This will download the package and install the urlinfo binary in your Go bin directory.

Start

To read URLs from the file urls.txt, execute: urlinfo -datafile urls.txt

Checking URLs

To check whether a URL is in the list, query the service like so: http://localhost:8080/urlinfo/1/domain.tlc/path. This asks the service whether http://domain.tlc/path exists in the URL list:

  • If the URL exists in the list, the service will respond with {"match": true}.
  • If the URL does not exist in the list, the service will respond with {"match": false}.
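A client could build the query URL and decode the response along these lines. This is only a sketch: the helper names `queryURL` and `parseMatch` are hypothetical, and the base address is the localhost example from above. The HTTP request itself is left out so the example runs without a live service; the standard library's `http.Get` could fetch the body.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// matchResponse mirrors the JSON body described above.
type matchResponse struct {
	Match bool `json:"match"`
}

// queryURL (hypothetical helper) drops the scheme from target and appends the
// remainder to the /urlinfo/1/ endpoint of the service at base.
func queryURL(base, target string) string {
	t := strings.TrimPrefix(strings.TrimPrefix(target, "https://"), "http://")
	return strings.TrimRight(base, "/") + "/urlinfo/1/" + t
}

// parseMatch (hypothetical helper) decodes the service response.
func parseMatch(body []byte) (bool, error) {
	var r matchResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return false, err
	}
	return r.Match, nil
}

func main() {
	fmt.Println(queryURL("http://localhost:8080", "http://domain.tlc/path"))

	// Stand-in for a response body read from the service.
	match, err := parseMatch([]byte(`{"match": true}`))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("match:", match)
}
```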

Running in Docker

The included script build-docker-image.sh cross-compiles the Go binary for Linux and builds a minimal Docker image named urlinfo. The image exposes port 8080 and expects the data file at /data/dataset.txt (/data is marked as a volume).

An instance can then be started with: docker run -d -p 8080:8080 -v /path/to/datafile.txt:/data/dataset.txt urlinfo.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ByteMapURLDB

type ByteMapURLDB struct {
	// contains filtered or unexported fields
}

ByteMapURLDB is a map-based URL database that stores each URL as a hash.

func NewByteMapURLDB

func NewByteMapURLDB() *ByteMapURLDB

NewByteMapURLDB initializes a new ByteMapURLDB with an empty map.

func (*ByteMapURLDB) Add

func (hmdb *ByteMapURLDB) Add(url string)

Add adds a new entry to the DB.

func (*ByteMapURLDB) Hash

func (hmdb *ByteMapURLDB) Hash(data string) (out ByteSum)

Hash hashes the given string (a URL) and returns the result as a ByteSum.

func (*ByteMapURLDB) Load

func (hmdb *ByteMapURLDB) Load(filename string) (err error)

Load loads data from the named file into the internal map. The file is expected to contain one normalized URL per line, each starting with http://.

func (*ByteMapURLDB) Lookup

func (hmdb *ByteMapURLDB) Lookup(url string) bool

Lookup looks up the given URL in the data store and returns true if the URL is present.

type ByteSum

type ByteSum [16]byte

ByteSum is a 16-byte array.

type ByteSumBoolMap

type ByteSumBoolMap map[ByteSum]bool

ByteSumBoolMap maps a ByteSum to a boolean.

type StringMapURLDB

type StringMapURLDB struct {
	// contains filtered or unexported fields
}

StringMapURLDB is a map-based URL database that stores each URL (the key) as a string.

func NewStringMapURLDB

func NewStringMapURLDB() *StringMapURLDB

NewStringMapURLDB creates a new instance of StringMapURLDB with an empty map.

func (*StringMapURLDB) Add

func (mdb *StringMapURLDB) Add(url string)

Add adds a new entry to the DB.

func (*StringMapURLDB) Load

func (mdb *StringMapURLDB) Load(filename string) error

Load loads data from the named file into the internal map. The file is expected to contain one normalized URL per line, each starting with http://.

func (*StringMapURLDB) Lookup

func (mdb *StringMapURLDB) Lookup(url string) bool

Lookup looks up the given URL in the data store and returns true if the URL is present.

type URLDB

type URLDB interface {
	Lookup(url string) bool
	Load(filename string) error
	Add(url string)
}

URLDB is a generic interface for loading a URL database and looking up URLs in it.
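To illustrate how a type can satisfy this interface, here is a minimal self-contained sketch. It redeclares the interface locally so that it compiles on its own, and `setDB`, `newSetDB`, and `writeTemp` are hypothetical names invented for this example. They are not part of the package, and the sketch does not reflect how ByteMapURLDB or StringMapURLDB are implemented internally.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// URLDB is a local copy of the interface documented above.
type URLDB interface {
	Lookup(url string) bool
	Load(filename string) error
	Add(url string)
}

// setDB is a hypothetical implementation backed by a set of strings.
type setDB struct {
	urls map[string]struct{}
}

func newSetDB() *setDB { return &setDB{urls: make(map[string]struct{})} }

// Add stores the URL as given; no normalization is attempted here.
func (s *setDB) Add(url string) { s.urls[url] = struct{}{} }

// Lookup reports whether the exact URL string was added before.
func (s *setDB) Lookup(url string) bool {
	_, ok := s.urls[url]
	return ok
}

// Load reads one URL per line from filename, skipping blank lines.
func (s *setDB) Load(filename string) error {
	f, err := os.Open(filename)
	if err != nil {
		return err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if line := strings.TrimSpace(sc.Text()); line != "" {
			s.Add(line)
		}
	}
	return sc.Err()
}

// writeTemp (hypothetical helper) writes lines to a temporary data file.
func writeTemp(lines ...string) (string, error) {
	f, err := os.CreateTemp("", "urls-*.txt")
	if err != nil {
		return "", err
	}
	defer f.Close()
	_, err = f.WriteString(strings.Join(lines, "\n") + "\n")
	return f.Name(), err
}

func main() {
	var db URLDB = newSetDB()
	name, err := writeTemp("http://domain.tlc/path")
	if err != nil {
		fmt.Println("temp file:", err)
		return
	}
	defer os.Remove(name)
	if err := db.Load(name); err != nil {
		fmt.Println("load:", err)
		return
	}
	fmt.Println(db.Lookup("http://domain.tlc/path"), db.Lookup("http://missing.tlc/"))
}
```

Code that depends only on the interface can swap between a string-keyed and a hash-keyed store without changes.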

Directories

Path Synopsis
cmd
