adstxt

v0.0.0-...-59b7449
Warning

This package is not in the latest version of its module.

Published: Jan 22, 2020 License: MIT Imports: 14 Imported by: 0

README

go-adstxt-crawler

Ads.txt crawler and parser, implemented in Go, based on the IAB Ads.txt Specification Version 1.0.1

This library fetches and parses Ads.txt files from websites, and can also parse a local copy of an Ads.txt file.

Motivation

There are some nice online tools for crawling and validating Ads.txt files that follow the IAB Ads.txt Specification Version 1.0.1 (for example, the Ads.txt validator from AppNexus or the Ads.txt Validator by AdReform). However, you cannot easily use those tools for large-scale site scanning because they do not provide a free API.

I also found a few open source projects on GitHub for scanning Ads.txt files, but the ones I tried were not fully compatible with the latest Ads.txt spec. You can use them, of course, and they do a decent job, but they lack a good validation mechanism to ensure that an Ads.txt file is correctly formatted and follows the official spec.

This library supports crawling the Ads.txt files of many sites at scale, and validates each file against the IAB Ads.txt Specification Version 1.0.1.

Examples

The examples folder contains short examples of the adstxt library's three main functions: adstxt.Get to fetch and parse a single Ads.txt file from a remote host, adstxt.GetMultiple to fetch and parse multiple Ads.txt files from different hosts, and adstxt.ParseBody to parse the content of a local Ads.txt file.

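import (
  "fmt"
  "log"
  "github.com/tzafrirben/go-adstxt-crawler/adstxt"
)
req, err := adstxt.NewRequest("example.com")
if err != nil {
  log.Fatal(err)
}
res, err := adstxt.Get(req)
if err != nil {
  log.Fatal(err)
}
// res now holds the Ads.txt file's DataRecords, Variables, and parse Warnings
for _, r := range res.DataRecords {
  fmt.Println(r.AdverterDomain, r.PublisherAccountID, r.AccountType, r.CertAuthorityID)
}
for _, v := range res.Variables {
  fmt.Println(v.Type, v.Value)
}
for _, w := range res.Warnings {
  fmt.Println(w.Index, w.Level, w.Message)
}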

Or fetch the Ads.txt files of multiple hosts simultaneously:

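import (
  "fmt"
  "log"
  "github.com/tzafrirben/go-adstxt-crawler/adstxt"
)
// define a handler function to process each Ads.txt response
h := func(req *adstxt.Request, res *adstxt.Response, err error) {
  if err != nil {
    log.Println(err) // res may be nil when the fetch or parse failed
    return
  }
  for _, r := range res.DataRecords {
    fmt.Println(r.AdverterDomain, r.PublisherAccountID, r.AccountType)
  }
  for _, v := range res.Variables {
    fmt.Println(v.Type, v.Value)
  }
  for _, w := range res.Warnings {
    fmt.Println(w.Index, w.Level, w.Message)
  }
}

// collection of domains to crawl
domains := []string{
  "http://example.com",
  "http://test.com",
}

// build one request per domain, skipping any that cannot be created
requests := make([]*adstxt.Request, 0, len(domains))
for _, d := range domains {
  r, err := adstxt.NewRequest(d)
  if err != nil {
    log.Printf("skipping %s: %v", d, err)
    continue
  }
  requests = append(requests, r)
}

adstxt.GetMultiple(requests, adstxt.HandlerFunc(h))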

You can also parse a local Ads.txt file in a similar way:

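import (
  "fmt"
  "log"
  "os"
  "github.com/tzafrirben/go-adstxt-crawler/adstxt"
)
body, err := os.ReadFile("/<path_to>/ads.txt")
if err != nil {
  log.Fatal(err)
}
rec, err := adstxt.ParseBody(body)
if err != nil {
  log.Fatal(err)
}
// rec now holds the Ads.txt file's DataRecords, Variables, and parse Warnings
for _, r := range rec.DataRecords {
  fmt.Println(r.AdverterDomain, r.PublisherAccountID, r.AccountType)
}
for _, v := range rec.Variables {
  fmt.Println(v.Type, v.Value)
}
for _, w := range rec.Warnings {
  fmt.Println(w.Index, w.Level, w.Message)
}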

Import as a Library

Add import "github.com/tzafrirben/go-adstxt-crawler/adstxt" to your code to use the adstxt library.

ToDo

  • The crawler currently ignores the robots.txt file on the remote host. Good practice is to check this file first, as specified in the Ads.txt specification.
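The robots.txt check from the ToDo item above could start from a minimal sketch like the one below. It is not part of the adstxt package. adsTxtAllowed is a hypothetical helper, and it simplifies robots.txt handling on purpose: it reads only the "User-agent: *" group and treats each Disallow value as a plain path prefix. Allow rules, wildcards, and agent-specific groups are ignored.

```go
package main

import (
	"bufio"
	"strings"
)

// adsTxtAllowed reports whether robots.txt content permits fetching /ads.txt
// for the wildcard user agent ("*"). Simplified: only "User-agent: *" groups
// are considered, and Disallow values are matched as path prefixes.
func adsTxtAllowed(robots string) bool {
	const path = "/ads.txt"
	inWildcardGroup := false
	prevWasAgent := false
	sc := bufio.NewScanner(strings.NewReader(robots))
	for sc.Scan() {
		line := sc.Text()
		// strip comments and surrounding whitespace
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i]
		}
		key, value, found := strings.Cut(strings.TrimSpace(line), ":")
		if !found {
			continue
		}
		key = strings.ToLower(strings.TrimSpace(key))
		value = strings.TrimSpace(value)
		switch key {
		case "user-agent":
			// consecutive User-agent lines belong to the same group
			if !prevWasAgent {
				inWildcardGroup = false
			}
			if value == "*" {
				inWildcardGroup = true
			}
			prevWasAgent = true
		case "disallow":
			prevWasAgent = false
			// an empty Disallow value allows everything
			if inWildcardGroup && value != "" && strings.HasPrefix(path, value) {
				return false
			}
		default:
			prevWasAgent = false
		}
	}
	return true
}
```

A crawler built this way would fetch /robots.txt from the host first and skip the Ads.txt request when adsTxtAllowed returns false.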

LICENSE

MIT

Author

Tzafrir Ben Ami (a.k.a. tzafrirben)

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func GetMultiple

func GetMultiple(req []*Request, h Handler)

GetMultiple crawls and parses multiple Ads.txt files from remote hosts, based on the Ads.txt Specification Version 1.0.1 https://iabtechlab.com/wp-content/uploads/2017/09/IABOpenRTB_Ads.txt_Public_Spec_V1-0-1.pdf, and passes each result to the handler h.

Types

type DataRecord

type DataRecord struct {
	AdverterDomain     string `json:"adverterdomain"`            // AdverterDomain Domain name of the advertising system (required)
	PublisherAccountID string `json:"publisheraccountid"`        // PublisherAccountID the identifier associated with the seller (required)
	AccountType        string `json:"accountype"`                // AccountType enumeration of the type of account: DIRECT or RESELLER (required)
	CertAuthorityID    string `json:"certauthorityid,omitempty"` // CertAuthorityID An ID that uniquely identifies the advertising system within a certification authority (optional)
}

DataRecord holds a single Ads.txt data record.
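As an illustration of how the four DataRecord fields map onto a line of an Ads.txt file, here is a minimal sketch. It is not the library's parser. parseDataRecord is a hypothetical helper, and the DataRecord type is a local copy of the documented fields. The helper drops a trailing "#" comment, expects three or four comma-separated fields (domain, publisher account ID, DIRECT or RESELLER, optional certification authority ID), and normalizes the account type to upper case. The upper-case normalization is a choice made for this sketch.

```go
package main

import (
	"errors"
	"strings"
)

// DataRecord mirrors the documented adstxt.DataRecord fields.
type DataRecord struct {
	AdverterDomain     string
	PublisherAccountID string
	AccountType        string
	CertAuthorityID    string
}

// parseDataRecord splits one Ads.txt data line into a DataRecord.
func parseDataRecord(line string) (*DataRecord, error) {
	// drop a trailing comment
	if i := strings.IndexByte(line, '#'); i >= 0 {
		line = line[:i]
	}
	fields := strings.Split(line, ",")
	if len(fields) < 3 || len(fields) > 4 {
		return nil, errors.New("expected 3 or 4 comma-separated fields")
	}
	for i := range fields {
		fields[i] = strings.TrimSpace(fields[i])
	}
	rec := &DataRecord{
		AdverterDomain:     fields[0],
		PublisherAccountID: fields[1],
		AccountType:        strings.ToUpper(fields[2]), // normalized here by choice
	}
	if len(fields) == 4 {
		rec.CertAuthorityID = fields[3]
	}
	if rec.AdverterDomain == "" || rec.PublisherAccountID == "" {
		return nil, errors.New("domain and publisher account ID are required")
	}
	if rec.AccountType != "DIRECT" && rec.AccountType != "RESELLER" {
		return nil, errors.New("account type must be DIRECT or RESELLER")
	}
	return rec, nil
}
```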

type Handler

type Handler interface {
	Handle(*Request, *Response, error)
}

The Handler interface is used to process Ads.txt requests. It is similar to the net/http.Handler interface.

type HandlerFunc

type HandlerFunc func(*Request, *Response, error)

HandlerFunc is an adapter type that implements the Handler interface, so an ordinary function with this signature can be used as a Handler.

func (HandlerFunc) Handle

func (h HandlerFunc) Handle(req *Request, res *Response, err error)

Handle is the Handler interface implementation for the HandlerFunc type.
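The following self-contained sketch shows how Handler and HandlerFunc fit together. It mirrors the documented declarations with local stand-in types: Request and Response are trimmed to one field each, and Warnings is simplified to []string. It also adds a hypothetical struct-based handler, warningCounter. Here, HandlerFunc.Handle simply calls the function, by analogy with net/http.HandlerFunc. This is an assumption, not a quote of the adstxt source.

```go
package main

import "sync"

// Local stand-ins mirroring the documented adstxt declarations.
type Request struct{ Domain string }
type Response struct{ Warnings []string }

type Handler interface {
	Handle(*Request, *Response, error)
}

// HandlerFunc adapts an ordinary function to the Handler interface.
type HandlerFunc func(*Request, *Response, error)

func (h HandlerFunc) Handle(req *Request, res *Response, err error) { h(req, res, err) }

// warningCounter is a hypothetical struct-based Handler that counts parse
// warnings per domain. The mutex guards the map in case responses arrive
// from several goroutines.
type warningCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func (c *warningCounter) Handle(req *Request, res *Response, err error) {
	if err != nil || res == nil {
		return // skip failed fetches
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[req.Domain] += len(res.Warnings)
}

// Both forms satisfy the same interface.
var (
	_ Handler = HandlerFunc(func(*Request, *Response, error) {})
	_ Handler = &warningCounter{}
)
```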

type Records

type Records struct {
	DataRecords []*DataRecord `json:"dataRecords"`
	Variables   []*Variable   `json:"variables"`
	Warnings    []*Warning    `json:"warnings"`
	Body        []string      `json:"body"` // Original Ads.txt file content
}

Records holds the records parsed from an Ads.txt file, together with any warnings raised while parsing it.

func ParseBody

func ParseBody(b []byte) (*Records, error)

ParseBody parses the content of an Ads.txt file based on the Ads.txt Specification Version 1.0.1 https://iabtechlab.com/wp-content/uploads/2017/09/IABOpenRTB_Ads.txt_Public_Spec_V1-0-1.pdf
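The docs do not describe ParseBody's internals. The rough sketch below shows one way a body could first be split into lines and sorted into blank or comment-only lines, variable lines, data records, and invalid lines. classifyLines and lineKind are hypothetical names. The rule "a comma means a data record, an '=' means a variable" is a heuristic assumed for illustration, not the adstxt parser's logic.

```go
package main

import "strings"

// lineKind is a classification used only in this sketch.
type lineKind int

const (
	kindEmpty    lineKind = iota // blank or comment-only line
	kindVariable                 // <VARIABLE>=<VALUE>
	kindRecord                   // comma-separated data record
	kindInvalid                  // neither of the above
)

// classifyLines splits an Ads.txt body into lines (accepting \n or \r\n),
// strips "#" comments, and classifies each line. The result index is the
// 0-based line number.
func classifyLines(body []byte) []lineKind {
	lines := strings.Split(strings.ReplaceAll(string(body), "\r\n", "\n"), "\n")
	kinds := make([]lineKind, len(lines))
	for i, line := range lines {
		if j := strings.IndexByte(line, '#'); j >= 0 {
			line = line[:j]
		}
		line = strings.TrimSpace(line)
		switch {
		case line == "":
			kinds[i] = kindEmpty
		case strings.Contains(line, ","):
			kinds[i] = kindRecord
		case strings.Contains(line, "="):
			kinds[i] = kindVariable
		default:
			kinds[i] = kindInvalid
		}
	}
	return kinds
}
```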

func (*Records) String

func (r *Records) String() string

String returns a string representation of the Records.

type Request

type Request struct {
	Domain string `json:"domain"` // Domain holds the root domain of the remote host
	URL    string `json:"url"`    // URL of the Ads.txt file to fetch
}

Request is a request to fetch the Ads.txt file from a remote host.

func NewRequest

func NewRequest(rawurl string) (*Request, error)

NewRequest creates a new request to fetch the Ads.txt file from a remote host.
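The README calls NewRequest with both "example.com" and "http://example.com". The sketch below shows one way such a raw URL could be turned into a host name and the /ads.txt URL to fetch, using net/url. adsTxtLocation is a hypothetical helper and not the library's code. It assumes "http" when no scheme is given, and it uses the host name as-is. It does not reduce the host to its root domain, which the Request.Domain comment mentions, because that step needs a public suffix list.

```go
package main

import (
	"errors"
	"net/url"
	"strings"
)

// adsTxtLocation returns the host and the root /ads.txt URL for rawurl.
func adsTxtLocation(rawurl string) (host, adsURL string, err error) {
	// assume plain http when the caller gives only a domain
	if !strings.Contains(rawurl, "://") {
		rawurl = "http://" + rawurl
	}
	u, err := url.Parse(rawurl)
	if err != nil {
		return "", "", err
	}
	host = strings.ToLower(u.Hostname())
	if host == "" {
		return "", "", errors.New("no host in " + rawurl)
	}
	// Ads.txt lives at the root path of the host, whatever path was given
	loc := url.URL{Scheme: u.Scheme, Host: u.Host, Path: "/ads.txt"}
	return host, loc.String(), nil
}
```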

type Response

type Response struct {
	*Request
	*Records
	Expires time.Time `json:"expires"` // Ads.txt file expiration date
}

Response is the result of an Ads.txt request: the data and variable records parsed from the Ads.txt file, and the file's expiration date.

func Get

func Get(req *Request) (*Response, error)

Get crawls and parses the Ads.txt file from a remote host based on the Ads.txt Specification Version 1.0.1 https://iabtechlab.com/wp-content/uploads/2017/09/IABOpenRTB_Ads.txt_Public_Spec_V1-0-1.pdf

type Severity

type Severity int

Severity is the severity level of a parse warning: low for a moderate warning, high for a potential error.

const (

	// LowSeverity severity level for parse warning (low)
	LowSeverity Severity = 1 + iota
	// HighSeverity severity level for parse warning (high, indicates possible error)
	HighSeverity
)

type Variable

type Variable struct {
	Type  string `json:"type"`  // Type of variable record. Supported types are subdomain and contact
	Value string `json:"value"` // Value of variable record
}

Variable holds a single Ads.txt variable record.
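Variable records take the form <VARIABLE>=<VALUE>, for example contact=adops@example.com. The minimal sketch below parses such a line into a local copy of the documented Variable fields. parseVariable is a hypothetical helper, not the library's code. Following the Type field comment above, it accepts only "contact" and "subdomain". It also lower-cases the variable name, which is a choice made for this sketch.

```go
package main

import (
	"errors"
	"strings"
)

// Variable mirrors the documented adstxt.Variable fields.
type Variable struct {
	Type  string
	Value string
}

// parseVariable parses one <VARIABLE>=<VALUE> line, ignoring a trailing comment.
func parseVariable(line string) (*Variable, error) {
	if i := strings.IndexByte(line, '#'); i >= 0 {
		line = line[:i]
	}
	name, value, ok := strings.Cut(line, "=")
	if !ok {
		return nil, errors.New("missing '='")
	}
	name = strings.ToLower(strings.TrimSpace(name))
	value = strings.TrimSpace(value)
	if name != "contact" && name != "subdomain" {
		return nil, errors.New("unsupported variable " + name)
	}
	if value == "" {
		return nil, errors.New("empty value for " + name)
	}
	return &Variable{Type: name, Value: value}, nil
}
```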

type Warning

type Warning struct {
	Index   int      `json:"index"` // Index of the line in the Ads.txt file in which warning was found
	Text    string   `json:"txt"`   // Text of the line in the Ads.txt file in which warning was found
	Message string   `json:"msg"`   // Warning reason
	Level   Severity `json:"level"` // Severity level of parse warning
}

Warning represents a failure to parse an Ads.txt line according to the official Ads.txt spec.
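Because Severity is an ordered int (LowSeverity = 1, HighSeverity = 2), warnings can be filtered by level. A validation run could then fail only on high-severity issues. The sketch below uses local copies of the documented Severity constants and Warning fields. atLeast is a hypothetical helper that returns the matching warnings ordered by line index.

```go
package main

import "sort"

// Local copies of the documented Severity constants and Warning fields.
type Severity int

const (
	LowSeverity Severity = 1 + iota
	HighSeverity
)

type Warning struct {
	Index   int
	Text    string
	Message string
	Level   Severity
}

// atLeast returns the warnings whose Level is >= min, ordered by line index.
func atLeast(ws []*Warning, min Severity) []*Warning {
	var out []*Warning
	for _, w := range ws {
		if w.Level >= min {
			out = append(out, w)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Index < out[j].Index })
	return out
}
```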

Directories

Path Synopsis
examples
