dor

package module
v2.6.0
Published: Sep 7, 2019 License: MIT Imports: 17 Imported by: 0

README

DOR - Domain Ranker

Fast HTTP service which shows a specified domain's rank from the following providers:

  • Alexa
  • Majestic
  • Umbrella OpenDNS
  • Open PageRank
  • Tranco
  • Quantcast
  • Yandex Radar

It can be used as a base for domain categorization, network filters, or suspicious domain detection.

Data is updated once a day automatically.

Supported types of storages:

  • Clickhouse (recommended)
  • MongoDB
  • In-Memory

You can add your own storage by implementing the Storage interface.

Installation

Check out the releases page.

HTTP service usage

Use Clickhouse storage located at clickhouse and bind to port 8080:

go run service/dor-http/dor.go \
    -storage=clickhouse \
    -storage-url=tcp://clickhouse:9000 \
    -listen-addr=:8080

Fill the database with data:

go run cmd/dor-insert/dor-insert.go \
    -storage=clickhouse \
    -storage-url=tcp://clickhouse:9000

Docker usage

The project includes a docker-compose file that uses Clickhouse as storage. Adjust it as needed (data persistence folder, ports, etc.).

docker-compose up -d

Client usage

$ curl 127.0.0.1:8080/rank/github.com

{
  "data": "github.com",
  "ranks": [
    {
      "domain": "github.com",
      "rank": 2698,
      "date": "2019-09-07T00:00:00Z",
      "source": "umbrella",
      "raw": ""
    },
    {
      "domain": "github.com",
      "rank": 29,
      "date": "2019-09-07T00:00:00Z",
      "source": "majestic",
      "raw": "29,24,github.com,com,176946,489686,github.com,com,29,24,176096,487221"
    },
    {
      "domain": "github.com",
      "rank": 26,
      "date": "2019-09-07T00:00:00Z",
      "source": "pagerank",
      "raw": ""
    },
    {
      "domain": "github.com",
      "rank": 32,
      "date": "2019-09-07T00:00:00Z",
      "source": "alexa",
      "raw": ""
    },
    {
      "domain": "github.com",
      "rank": 467,
      "date": "2019-09-07T00:00:00Z",
      "source": "yandex-radar",
      "raw": "The world’s leading software development platform · GitHub,github.com,,Сервисы,,,1520000,2340000,,,"
    },
    {
      "domain": "github.com",
      "rank": 43,
      "date": "2019-09-07T00:00:00Z",
      "source": "tranco",
      "raw": ""
    },
    {
      "domain": "github.com",
      "rank": 168,
      "date": "2019-09-07T00:00:00Z",
      "source": "quantcast",
      "raw": ""
    }
  ],
  "timestamp": "2019-09-07T14:32:32.9725943Z"
}
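A Go client can decode this response with structs that mirror the JSON fields shown above. The following is a minimal sketch rather than code from the package: the rankEntry and rankResponse types, the parseRanks helper and the shortened sample body are hypothetical names introduced for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"time"
)

// rankEntry mirrors one element of the "ranks" array (hypothetical local type).
type rankEntry struct {
	Domain string    `json:"domain"`
	Rank   uint32    `json:"rank"`
	Date   time.Time `json:"date"`
	Source string    `json:"source"`
	Raw    string    `json:"raw"`
}

// rankResponse mirrors the top-level response object (hypothetical local type).
type rankResponse struct {
	Data      string      `json:"data"`
	Ranks     []rankEntry `json:"ranks"`
	Timestamp time.Time   `json:"timestamp"`
}

// parseRanks decodes a response body and returns the rank keyed by source name.
func parseRanks(body string) (map[string]uint32, error) {
	var resp rankResponse
	if err := json.NewDecoder(strings.NewReader(body)).Decode(&resp); err != nil {
		return nil, err
	}
	bySource := make(map[string]uint32, len(resp.Ranks))
	for _, e := range resp.Ranks {
		bySource[e.Source] = e.Rank
	}
	return bySource, nil
}

// sample is a shortened copy of the response shown above.
const sample = `{
  "data": "github.com",
  "ranks": [
    {"domain": "github.com", "rank": 29, "date": "2019-09-07T00:00:00Z", "source": "majestic", "raw": ""},
    {"domain": "github.com", "rank": 32, "date": "2019-09-07T00:00:00Z", "source": "alexa", "raw": ""}
  ],
  "timestamp": "2019-09-07T14:32:32.9725943Z"
}`

func main() {
	ranks, err := parseRanks(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("majestic:", ranks["majestic"], "alexa:", ranks["alexa"])
}
```

In a real client the body would come from an HTTP GET against the /rank/ endpoint instead of a constant.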

Documentation

Overview

Package dor is a domain rank data collection library and fast HTTP service which shows a specified domain's rank from the following providers:

  • Alexa
  • Majestic
  • Umbrella OpenDNS
  • Open PageRank
  • Tranco
  • Quantcast

It can be used as a base for domain categorization, network filters, or suspicious domain detection. By default, dor-insert updates the data automatically once a day.

See service/dor-http/dor.go for an example of the Dor HTTP service and cmd/dor-insert/dor-insert.go for the data insertion script.

Client request example:

curl 127.0.0.1:8080/rank/github.com

Server response:

{
  "data": "github.com",
  "ranks": [
    {
      "domain": "github.com",
      "rank": 33,
      "date": "2018-01-11T18:01:27.251103268Z",
      "source": "majestic",
      "raw": "29,23,github.com,com,179825,518189,github.com,com,29,23,179994,518726"
    },
    {
      "domain": "github.com",
      "rank": 72,
      "date": "2018-01-11T18:04:26.267833256Z",
      "source": "alexa",
      "raw": ""
    },
    {
      "domain": "github.com",
      "rank": 2367,
      "date": "2018-01-11T18:06:50.866600102Z",
      "source": "umbrella",
      "raw": ""
    },
    {
      "domain": "github.com",
      "rank": 115,
      "date": "2018-03-27T17:01:13.535Z",
      "source": "pagerank",
      "raw": ""
    },
    {
      "domain": "github.com",
      "rank": 68,
      "date": "2018-03-27T17:01:13.535Z",
      "source": "tranco",
      "raw": ""
    },
    {
      "domain": "github.com",
      "rank": 114,
      "date": "2019-05-04T00:00:00Z",
      "source": "quantcast",
      "raw": ""
    }
  ],
  "timestamp": "2018-01-11T18:07:09.186271429Z"
}

Variables

var DefaultTTL = 30

DefaultTTL is the default time to live for records, in days.

Types

type AlexaIngester

type AlexaIngester struct {
	IngesterConf
}

AlexaIngester represents Ingester implementation for Alexa Top 1 Million websites

func NewAlexa

func NewAlexa() *AlexaIngester

NewAlexa bootstraps AlexaIngester

func (*AlexaIngester) Do

func (in *AlexaIngester) Do() (chan *Entry, error)

Do implements Ingester Do func with the data from Alexa Top 1M CSV file

type App

type App struct {
	Ingesters []Ingester
	Storage   Storage
	Keep      bool
}

App represents Dor configuration options

func New

func New(stn string, stl string, keep bool) (*App, error)

New bootstraps App struct.

stn - storage name
stl - storage location string
keep - keep new data or overwrite old one (always false for MemoryStorage)

func (*App) Fill

func (d *App) Fill() error

Fill fills available Ingester interfaces.

func (*App) FillByTimer

func (d *App) FillByTimer(duration time.Duration) error

FillByTimer fills the storage and then repeats the update at the given interval.
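The periodic update pattern can be sketched with a time.Ticker. This is not the package's implementation: fillEvery and runDemo are hypothetical helpers, and both the immediate first fill and the stop-on-first-error behaviour are assumptions.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// fillEvery calls fill once immediately and then once per interval.
// It returns when ctx is cancelled or when fill reports an error.
func fillEvery(ctx context.Context, interval time.Duration, fill func() error) error {
	if err := fill(); err != nil {
		return err
	}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			if err := fill(); err != nil {
				return err
			}
		}
	}
}

// runDemo drives fillEvery with a fill function that fails on its third call
// and reports how many times fill was invoked.
func runDemo() (int, error) {
	calls := 0
	err := fillEvery(context.Background(), time.Millisecond, func() error {
		calls++
		if calls == 3 {
			return errors.New("stop after three fills")
		}
		return nil
	})
	return calls, err
}

func main() {
	calls, err := runDemo()
	fmt.Println("fills:", calls, "error:", err)
}
```

With a daily update, the interval would be 24*time.Hour and fill would wrap a call such as App.Fill.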

func (*App) Find

func (d *App) Find(domain string, sources ...string) (*FindResponse, error)

Find looks up the ranks of a domain in the configured storage, optionally restricted to the given sources.
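A result of this shape can be served over HTTP under /rank/<domain>, as in the client example. The sketch below is an assumption about how such a handler might look, not the code in service/dor-http/dor.go: the finder type, rankHandler, fakeFind and request are hypothetical, and Entry and FindResponse are local copies of the documented structs.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// Entry and FindResponse are local copies of the documented structs.
type Entry struct {
	Domain  string    `json:"domain"`
	Rank    uint32    `json:"rank"`
	Date    time.Time `json:"date"`
	Source  string    `json:"source"`
	RawData string    `json:"raw"`
}

type FindResponse struct {
	RequestData string    `json:"data"`
	Hits        []*Entry  `json:"ranks"`
	Timestamp   time.Time `json:"timestamp"`
}

// finder is a hypothetical stand-in with the same signature as App.Find.
type finder func(domain string, sources ...string) (*FindResponse, error)

// rankHandler serves GET /rank/<domain> by delegating to find.
func rankHandler(find finder) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		domain := strings.TrimPrefix(r.URL.Path, "/rank/")
		if domain == "" || domain == r.URL.Path {
			http.Error(w, "domain is required", http.StatusBadRequest)
			return
		}
		resp, err := find(domain)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(resp)
	})
}

// fakeFind returns a fixed hit so the sketch runs without a storage backend.
func fakeFind(domain string, sources ...string) (*FindResponse, error) {
	return &FindResponse{
		RequestData: domain,
		Hits:        []*Entry{{Domain: domain, Rank: 29, Source: "majestic"}},
		Timestamp:   time.Unix(0, 0).UTC(),
	}, nil
}

// request performs an in-process GET and returns the status code and the
// decoded "data" field (empty if the body is not a FindResponse).
func request(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	var resp FindResponse
	if err := json.Unmarshal(rec.Body.Bytes(), &resp); err != nil {
		return rec.Code, ""
	}
	return rec.Code, resp.RequestData
}

func main() {
	code, data := request(rankHandler(fakeFind), "/rank/github.com")
	fmt.Println(code, data)
}
```

Passing the lookup as a function value keeps the handler testable without Clickhouse or MongoDB running.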

type ClickhouseStorage

type ClickhouseStorage struct {
	// contains filtered or unexported fields
}

ClickhouseStorage is a dor.Storage that uses Clickhouse database.

func NewClickhouseStorage

func NewClickhouseStorage(location, table string, batch int) (*ClickhouseStorage, error)

NewClickhouseStorage bootstraps ClickhouseStorage.

func (*ClickhouseStorage) Get

func (c *ClickhouseStorage) Get(d string, sources ...string) ([]*Entry, error)

Get ranks for specified domain and sources.

func (*ClickhouseStorage) GetMore

func (c *ClickhouseStorage) GetMore(d string, lps int, sources ...string) ([]*Entry, error)

GetMore returns lps entries for each source for a specified domain.
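The selection of lps entries per source can be illustrated in memory. The sketch assumes that the most recent entries are the ones wanted, which the description above does not state; latestPerSource and demo are hypothetical helpers and Entry is a trimmed local copy of the documented struct.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Entry is a trimmed local copy of the documented struct.
type Entry struct {
	Domain string
	Rank   uint32
	Date   time.Time
	Source string
}

// latestPerSource returns at most lps entries per source, newest first.
func latestPerSource(entries []*Entry, lps int) []*Entry {
	// Sort a copy so the caller's slice order is left untouched.
	sorted := append([]*Entry(nil), entries...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Date.After(sorted[j].Date)
	})
	taken := make(map[string]int)
	var out []*Entry
	for _, e := range sorted {
		if taken[e.Source] < lps {
			taken[e.Source]++
			out = append(out, e)
		}
	}
	return out
}

// demo builds three Alexa entries and one Majestic entry, then keeps two per source.
func demo() []*Entry {
	day := func(d int) time.Time { return time.Date(2019, 9, d, 0, 0, 0, 0, time.UTC) }
	entries := []*Entry{
		{Domain: "github.com", Rank: 70, Date: day(1), Source: "alexa"},
		{Domain: "github.com", Rank: 71, Date: day(2), Source: "alexa"},
		{Domain: "github.com", Rank: 72, Date: day(3), Source: "alexa"},
		{Domain: "github.com", Rank: 33, Date: day(3), Source: "majestic"},
	}
	return latestPerSource(entries, 2)
}

func main() {
	for _, e := range demo() {
		fmt.Println(e.Source, e.Rank, e.Date.Format("2006-01-02"))
	}
}
```

A database-backed storage would normally express the same limit in its query rather than filtering in Go.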

func (*ClickhouseStorage) Put

func (c *ClickhouseStorage) Put(entries <-chan *Entry, s string, t time.Time) error

Put implements Storage interface method Put

s - is the data source
t - is the data datetime

type Entry

type Entry struct {
	Domain  string    `json:"domain" db:"domain" bson:"domain"`
	Rank    uint32    `json:"rank" db:"rank" bson:"rank"`
	Date    time.Time `json:"date" bson:"date"`
	Source  string    `json:"source" bson:"source"`
	RawData string    `json:"raw" bson:"raw"`
}

Entry is a single rank record: a domain's rank from one source on a given date, plus the raw source line when available.

type FindResponse

type FindResponse struct {
	RequestData string    `json:"data"`
	Hits        []*Entry  `json:"ranks"`
	Timestamp   time.Time `json:"timestamp"`
}

FindResponse is a find request response.

type Ingester

type Ingester interface {
	Do() (chan *Entry, error) // returns a channel for consumers
	GetDesc() string          // simple getter for the source
}

Ingester fetches data and uploads it to the Storage
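A custom provider only needs the two methods of this interface. The sketch below is illustrative: staticIngester and collect are hypothetical, the "rank,domain" line format is an assumption about the input, and Entry and Ingester are local copies so the example compiles on its own.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// Entry is a trimmed local copy of the documented struct.
type Entry struct {
	Domain string
	Rank   uint32
}

// Ingester is a local copy of the documented interface.
type Ingester interface {
	Do() (chan *Entry, error)
	GetDesc() string
}

// staticIngester is a hypothetical Ingester that reads "rank,domain" lines
// from an in-memory string instead of downloading a list.
type staticIngester struct {
	desc string
	csv  string
}

func (in *staticIngester) GetDesc() string { return in.desc }

// Do parses the lines in a goroutine and streams entries to the consumer.
// Malformed lines are skipped; the channel is closed when input is exhausted.
func (in *staticIngester) Do() (chan *Entry, error) {
	ch := make(chan *Entry)
	go func() {
		defer close(ch)
		sc := bufio.NewScanner(strings.NewReader(in.csv))
		for sc.Scan() {
			parts := strings.SplitN(sc.Text(), ",", 2)
			if len(parts) != 2 {
				continue
			}
			rank, err := strconv.ParseUint(strings.TrimSpace(parts[0]), 10, 32)
			if err != nil {
				continue
			}
			ch <- &Entry{Domain: strings.TrimSpace(parts[1]), Rank: uint32(rank)}
		}
	}()
	return ch, nil
}

// collect drains an Ingester into a domain-to-rank map.
func collect(in Ingester) (map[string]uint32, error) {
	ch, err := in.Do()
	if err != nil {
		return nil, err
	}
	out := make(map[string]uint32)
	for e := range ch {
		out[e.Domain] = e.Rank
	}
	return out, nil
}

func main() {
	in := &staticIngester{desc: "static", csv: "1,google.com\n2,youtube.com\nbad line\n"}
	ranks, err := collect(in)
	fmt.Println(in.GetDesc(), len(ranks), ranks["youtube.com"], err)
}
```

Returning the channel right away lets a storage start inserting while the list is still being parsed.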

type IngesterConf

type IngesterConf struct {
	sync.Mutex
	Description string
	Timestamp   time.Time
}

IngesterConf represents a top popular domains provider configuration.

Ingesters implemented so far:

  • Alexa Top 1 Million
  • Majestic Top 1 Million
  • Umbrella Top 1 Million
  • PageRank Top 10 Million
  • Tranco Top 1 Million
  • Quantcast Top 1 Million
  • Yandex Radar

func (*IngesterConf) GetDesc

func (in *IngesterConf) GetDesc() string

GetDesc is a simple getter for a collection's description

type LookupMap

type LookupMap map[string]uint32

LookupMap represents a map of domain-rank pairs

type MajesticIngester

type MajesticIngester struct {
	IngesterConf
	// contains filtered or unexported fields
}

MajesticIngester is an Ingester implementation that downloads the Majestic Million data

More info: https://blog.majestic.com/development/alexa-top-1-million-sites-retired-heres-majestic-million/

func NewMajestic

func NewMajestic() *MajesticIngester

NewMajestic bootstraps MajesticIngester

func (*MajesticIngester) Do

func (in *MajesticIngester) Do() (chan *Entry, error)

Do implements Ingester interface with the data from Majestic CSV file

type MemoryStorage

type MemoryStorage struct {
	Maps map[string]*memoryCollection
}

MemoryStorage implements Storage interface as in-memory storage

func (*MemoryStorage) Get

func (ms *MemoryStorage) Get(d string, sources ...string) ([]*Entry, error)

Get implements Get method of the Storage interface

func (*MemoryStorage) GetMore

func (ms *MemoryStorage) GetMore(d string, lps int, sources ...string) ([]*Entry, error)

GetMore is not supported for the memory storage

func (*MemoryStorage) Put

func (ms *MemoryStorage) Put(c <-chan *Entry, s string, t time.Time) error

Put implements Put method of the Storage interface

type MongoStorage

type MongoStorage struct {
	// contains filtered or unexported fields
}

MongoStorage implements the Storage interface for MongoDB

func NewMongoStorage

func NewMongoStorage(u string, db string, col string, size int, w int, ret bool) (*MongoStorage, error)

NewMongoStorage bootstraps MongoStorage, creates indexes

u is the Mongo URL
db is the database name
col is the collection name
size is the bulk message size
w is number of workers
ret is the data retention option

func (*MongoStorage) Get

func (m *MongoStorage) Get(d string, sources ...string) ([]*Entry, error)

Get implements Storage interface method Get

func (*MongoStorage) GetMore

func (m *MongoStorage) GetMore(d string, lps int, sources ...string) ([]*Entry, error)

GetMore implements Storage GetMore function

func (*MongoStorage) Put

func (m *MongoStorage) Put(c <-chan *Entry, s string, t time.Time) error

Put implements Storage interface method Put

s - is the data source
t - is the data datetime

type PageRankIngester

type PageRankIngester struct {
	IngesterConf
}

PageRankIngester represents Ingester implementation for DomCop PageRank top 10M domains

func NewPageRank

func NewPageRank() *PageRankIngester

NewPageRank bootstraps PageRankIngester

func (*PageRankIngester) Do

func (in *PageRankIngester) Do() (chan *Entry, error)

Do implements Ingester Do func with the data from DomCop

type QuantcastIngester

type QuantcastIngester struct {
	IngesterConf
}

QuantcastIngester represents Ingester implementation for Quantcast Top 1 Million websites.

func NewQuantcast

func NewQuantcast() *QuantcastIngester

NewQuantcast bootstraps QuantcastIngester.

func (*QuantcastIngester) Do

func (in *QuantcastIngester) Do() (chan *Entry, error)

Do gets the data from Quantcast Top 1M txt file.

type Storage

type Storage interface {
	Put(<-chan *Entry, string, time.Time) error             // Put is usually a bulk inserter from the channel that works in a goroutine, second argument is a Source of the data and third is the last update time.
	Get(domain string, sources ...string) ([]*Entry, error) // Get is a simple getter for the latest rank of the domain in a particular domain rank provider or all of them if nothing selected.
}

Storage represents an interface to store and query ranks.
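Adding a storage backend means implementing these two methods. The following is a minimal sketch of a custom backend, not one of the package's storages: mapStorage, newMapStorage and demo are hypothetical, replacing a source's whole collection on each Put is an assumption, and Entry and Storage are local copies of the documented declarations.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Entry is a local copy of the documented struct (tags omitted).
type Entry struct {
	Domain  string
	Rank    uint32
	Date    time.Time
	Source  string
	RawData string
}

// Storage is a local copy of the documented interface.
type Storage interface {
	Put(<-chan *Entry, string, time.Time) error
	Get(domain string, sources ...string) ([]*Entry, error)
}

// mapStorage is a hypothetical Storage keeping one entry per source and domain.
type mapStorage struct {
	mu   sync.RWMutex
	data map[string]map[string]*Entry // source -> domain -> entry
}

func newMapStorage() *mapStorage {
	return &mapStorage{data: make(map[string]map[string]*Entry)}
}

// Put drains the channel and replaces the whole collection for source s.
func (m *mapStorage) Put(c <-chan *Entry, s string, t time.Time) error {
	fresh := make(map[string]*Entry)
	for e := range c {
		e.Source, e.Date = s, t
		fresh[e.Domain] = e
	}
	m.mu.Lock()
	m.data[s] = fresh
	m.mu.Unlock()
	return nil
}

// Get returns the domain's entries from the given sources, or from all
// sources when none are given.
func (m *mapStorage) Get(domain string, sources ...string) ([]*Entry, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	if len(sources) == 0 {
		for s := range m.data {
			sources = append(sources, s)
		}
	}
	var hits []*Entry
	for _, s := range sources {
		if e, ok := m.data[s][domain]; ok {
			hits = append(hits, e)
		}
	}
	return hits, nil
}

// Compile-time check that mapStorage satisfies Storage.
var _ Storage = (*mapStorage)(nil)

// demo stores one entry for "majestic" and reads it back.
func demo() ([]*Entry, error) {
	st := newMapStorage()
	c := make(chan *Entry, 1)
	c <- &Entry{Domain: "github.com", Rank: 29}
	close(c)
	if err := st.Put(c, "majestic", time.Now()); err != nil {
		return nil, err
	}
	return st.Get("github.com")
}

func main() {
	hits, err := demo()
	fmt.Println(len(hits), err)
}
```

Building the new map before taking the lock keeps readers unblocked while a large list is being ingested.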

type TrancoIngester

type TrancoIngester struct {
	IngesterConf
}

TrancoIngester represents Ingester implementation for Tranco Top 1 Million websites. About: https://tranco-list.eu/

func NewTranco

func NewTranco() *TrancoIngester

NewTranco bootstraps TrancoIngester

func (*TrancoIngester) Do

func (in *TrancoIngester) Do() (chan *Entry, error)

Do implements Ingester Do func with the data from Tranco Top 1M CSV file

type UmbrellaIngester

type UmbrellaIngester struct {
	IngesterConf
}

UmbrellaIngester represents Ingester implementation for OpenDNS Umbrella Top 1M domains

More info: https://umbrella.cisco.com/blog/2016/12/14/cisco-umbrella-1-million/

func NewUmbrella

func NewUmbrella() *UmbrellaIngester

NewUmbrella bootstraps UmbrellaIngester

func (*UmbrellaIngester) Do

func (in *UmbrellaIngester) Do() (chan *Entry, error)

Do implements Ingester Do func with the data from OpenDNS

type YandexRadarIngester added in v2.5.0

type YandexRadarIngester struct {
	IngesterConf
}

YandexRadarIngester represents Ingester implementation for Yandex Radar.

func NewYandexRadar added in v2.5.0

func NewYandexRadar() *YandexRadarIngester

NewYandexRadar bootstraps YandexRadarIngester.

func (*YandexRadarIngester) Do added in v2.5.0

func (in *YandexRadarIngester) Do() (chan *Entry, error)

Do implements Ingester Do func with the data from Yandex Radar.

Directories

Path Synopsis
cmd
service
