pirsch

v1.1.0
Published: Jul 3, 2020 License: MIT Imports: 17 Imported by: 0

README

Pirsch

State of the project: we are currently testing how precise Pirsch is by comparing it to other solutions.

Pirsch is a server-side, no-cookie, drop-in, privacy-focused tracking solution for Go. Integrated into a Go application, it enables you to track HTTP traffic without invading the privacy of your visitors. You must implement the visualization of the data yourself. We might add a UI for Pirsch in the future.

The name is German and refers to a special kind of hunt: the hunter carefully and quietly enters the hunting ground and stalks against the wind to get as close as possible to the prey without being noticed.

How does it work?

Pirsch generates a unique fingerprint for each visitor. The fingerprint is a hash of the visitor's IP, User-Agent and a salt. The salt is re-generated at midnight to separate data for each day.
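The idea can be sketched with the standard library alone. This is a minimal illustration, not Pirsch's implementation: the names hashVisitor and sketchFingerprint are hypothetical, and SHA-256 as well as the way the three inputs are joined are assumptions, since the text does not say which hash is used.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
)

// hashVisitor is a hypothetical helper that hashes IP, User-Agent and salt.
// SHA-256 and the zero-byte separator are assumptions made for this sketch.
func hashVisitor(ip, userAgent, salt string) string {
	sum := sha256.Sum256([]byte(ip + "\x00" + userAgent + "\x00" + salt))
	return hex.EncodeToString(sum[:])
}

// sketchFingerprint extracts the inputs from a request and hashes them.
func sketchFingerprint(r *http.Request, salt string) string {
	ip, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		ip = r.RemoteAddr // no port present, use the value as-is
	}
	return hashVisitor(ip, r.UserAgent(), salt)
}

func main() {
	r := httptest.NewRequest(http.MethodGet, "/", nil)
	r.Header.Set("User-Agent", "ExampleBrowser/1.0")

	// Same visitor and same salt give the same fingerprint.
	fmt.Println(sketchFingerprint(r, "monday") == sketchFingerprint(r, "monday"))
	// A new salt gives a different fingerprint for the same visitor.
	fmt.Println(sketchFingerprint(r, "monday") == sketchFingerprint(r, "tuesday"))
}
```

In this sketch, two visitors sharing an IP and a User-Agent produce the same hash, because nothing else goes into it.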

Each time a visitor opens your page, Pirsch will store a hit. The hits are analyzed later to extract meaningful data and reduce storage usage.

This all works without invading the visitors' privacy: no cookies are used, and individual users cannot be tracked because the fingerprint anonymizes the data points. At the same time, Pirsch can track visitors who use blockers that would otherwise block tracking. uBlock, for example, blocks Google Analytics.

Features and limitations

Pirsch tracks the following data points at the moment:

  • total visitors per day
  • visitors per day and hour
  • visitors per day and page
  • visitors per day and language

All timestamps are stored as UTC.

It's theoretically possible to track the visitor flow (which page was seen first, which one was opened next and so on), but this is not implemented at the moment. Here is a list of the limitations of Pirsch:

  • sessions cannot be tracked, as the salt prevents you from tracking a user across two days
  • bots might not always be filtered out
  • in rare cases two fingerprints collide, for example if two visitors are behind the same proxy and the User-Agent is the same (or empty)

Usage

To store hits and statistics, Pirsch uses a database. Right now only Postgres is supported, but other databases can easily be integrated by implementing the Store interface. The schema can be found in the schema directory. Changes will be added as migration scripts, so that you can add them to your project's database migrations or run them manually.

Here is a quick demo on how to use the library:

// create a Postgres store using the sql.DB database connection "db"
store := pirsch.NewPostgresStore(db)

// Tracker is the main component of Pirsch
// the salt is used to prevent anyone from generating fingerprints like yours (to prevent man in the middle attacks), pick something random
// an optional configuration can be used to change things like worker count, timeouts and so on
tracker := pirsch.NewTracker(store, "secret_salt", nil)

// the Processor analyzes hits and stores the reduced data points in store
// it's recommended to run the Process method once a day
processor := pirsch.NewProcessor(store)
pirsch.RunAtMidnight(processor.Process) // helper function to run a function at midnight

http.Handle("/", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    // a call to Hit will track the request
    // note that Pirsch stores the path and URL, therefore you should make sure you only call it for the endpoints you're interested in
    if r.URL.Path == "/" {
        tracker.Hit(r)
    }

    w.Write([]byte("<h1>Hello World!</h1>"))
}))

Instead of calling Hit, you can also call HitPage, which allows you to specify an alternative path independent of the one provided in the request. That can be used to implement a single tracking endpoint which you call using an Ajax request providing the path of the current page.
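A single tracking endpoint of that kind might look like the sketch below. It relies only on the documented HitPage signature. The "path" query parameter, the pageTracker interface, the recorder stand-in and the trackOnce helper are all assumptions made for illustration. In an application you would pass the *pirsch.Tracker where the recorder is used.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// pageTracker is the subset of the tracker this sketch needs. The method
// signature matches HitPage as documented for Tracker.
type pageTracker interface {
	HitPage(r *http.Request, path string)
}

// trackingHandler is a hypothetical endpoint that reads the page path from
// the "path" query parameter (an assumption) and forwards it to HitPage.
func trackingHandler(tracker pageTracker) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		path := r.URL.Query().Get("path")
		if path == "" {
			http.Error(w, "missing path", http.StatusBadRequest)
			return
		}
		tracker.HitPage(r, path)
		w.WriteHeader(http.StatusNoContent)
	}
}

// recorder is a stand-in tracker that remembers the paths it was given.
type recorder struct{ paths []string }

func (rec *recorder) HitPage(_ *http.Request, path string) {
	rec.paths = append(rec.paths, path)
}

// trackOnce sends one GET request to the handler and returns the status code.
func trackOnce(h http.Handler, target string) int {
	w := httptest.NewRecorder()
	h.ServeHTTP(w, httptest.NewRequest(http.MethodGet, target, nil))
	return w.Code
}

func main() {
	rec := &recorder{}
	h := trackingHandler(rec)
	fmt.Println(trackOnce(h, "/track?path=/blog/hello"), rec.paths)
}
```

Because the path comes from the client, you may want to validate it against the pages you actually serve before storing it.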

To analyze hits and processed data you can use the analyzer, which provides some functions to extract useful data.

The secret salt passed to NewTracker should not be known outside of your organization, as it can be used to generate fingerprints that match yours. Keeping it secret prevents people outside your organization from tracking your visitors and gaining data from them. Note that you can generate the salt at random, but the fingerprints change whenever the salt does. To get reliable data, configure a fixed salt and treat it like a password.

// this also needs access to the store
analyzer := pirsch.NewAnalyzer(store)

// as an example, let's extract the visitors per day
// the filter is used to specify the time frame you're looking at (days) and is optional
// if you pass nil, the Analyzer returns data for the past week including today
// yesterday() and today() are placeholders for your own helpers returning time.Time, they are not part of the package
visitors, err := analyzer.Visitors(&pirsch.Filter{
    From: yesterday(),
    To:   today(),
})
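The yesterday and today helpers used in the filter are not among the package's documented functions, so you have to supply them. A minimal sketch follows, assuming that day boundaries should be midnight UTC because all timestamps are stored as UTC. The startOfDay helper is hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// startOfDay truncates t to midnight UTC of the same calendar day.
func startOfDay(t time.Time) time.Time {
	t = t.UTC()
	return time.Date(t.Year(), t.Month(), t.Day(), 0, 0, 0, 0, time.UTC)
}

// today returns midnight UTC of the current day.
func today() time.Time { return startOfDay(time.Now()) }

// yesterday returns midnight UTC of the previous day.
func yesterday() time.Time { return today().AddDate(0, 0, -1) }

func main() {
	fmt.Println(yesterday().Format("2006-01-02"), today().Format("2006-01-02"))
}
```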

Read the full documentation for more details or check out this article.

Changelog

1.1.0
  • added a secret salt to prevent generating fingerprints to identify visitors on other websites (man in the middle)
  • extended bot list
1.0.0

Initial release.

Contribution

All contributions are welcome! You can extend the bot list or the processor, for example, to extract more useful data. Please open a pull request for your changes, and a ticket in case you would like to discuss something or have a question.

To run the tests you'll need a Postgres database and a schema called pirsch. The user and password are both set to postgres. To add another data store, the Store interface must be implemented. Pirsch makes heavy use of SQL to aggregate and analyze data.

License

MIT

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func Fingerprint

func Fingerprint(r *http.Request, salt string) string

Fingerprint returns a hash for given request and salt. The hash is unique for the visitor, not for the page.

func RunAtMidnight

func RunAtMidnight(f func())

RunAtMidnight calls the given function every day at midnight.
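How such a scheduler can work is sketched below. This is not the package's implementation: untilNextMidnight, runDaily and exampleNow are hypothetical names, and using UTC for midnight is an assumption, as the documentation does not state which time zone RunAtMidnight uses.

```go
package main

import (
	"fmt"
	"time"
)

// exampleNow is a fixed instant used by the demo in main.
var exampleNow = time.Date(2020, time.July, 3, 22, 30, 0, 0, time.UTC)

// untilNextMidnight returns the time left until the next midnight (UTC).
// time.Date normalizes Day()+1, so month and year ends are handled.
func untilNextMidnight(now time.Time) time.Duration {
	now = now.UTC()
	next := time.Date(now.Year(), now.Month(), now.Day()+1, 0, 0, 0, 0, time.UTC)
	return next.Sub(now)
}

// runDaily calls f at each midnight (UTC) until stop is closed.
func runDaily(f func(), stop <-chan struct{}) {
	go func() {
		for {
			timer := time.NewTimer(untilNextMidnight(time.Now()))
			select {
			case <-timer.C:
				f()
			case <-stop:
				timer.Stop()
				return
			}
		}
	}()
}

func main() {
	fmt.Println(untilNextMidnight(exampleNow)) // 1h30m0s
}
```

Recomputing the delay on every iteration, instead of sleeping a fixed 24 hours, keeps the schedule from drifting when f takes a while to run.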

Types

type Analyzer

type Analyzer struct {
	// contains filtered or unexported fields
}

Analyzer provides an interface to analyze processed data and hits.

func NewAnalyzer

func NewAnalyzer(store Store) *Analyzer

NewAnalyzer returns a new Analyzer for given Store.

func (*Analyzer) ActiveVisitors

func (analyzer *Analyzer) ActiveVisitors(d time.Duration) (int, error)

ActiveVisitors returns unique visitors last active within given duration.

func (*Analyzer) HourlyVisitors

func (analyzer *Analyzer) HourlyVisitors(filter *Filter) ([]HourlyVisitors, error)

HourlyVisitors returns the unique visitor count per hour for given time frame.

func (*Analyzer) Languages

func (analyzer *Analyzer) Languages(filter *Filter) ([]VisitorLanguage, int, error)

Languages returns the absolute and relative visitor count per language for given time frame.
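One plausible reading of "relative" is each language's share of the total visitor count. The sketch below works under that assumption. The languageCount type is a local stand-in for VisitorLanguage, withRelativeShare is a hypothetical helper, and interpreting the returned int as the total is also an assumption.

```go
package main

import "fmt"

// languageCount is a local stand-in mirroring the fields of VisitorLanguage.
type languageCount struct {
	Language         string
	Visitors         int
	RelativeVisitors float64
}

// withRelativeShare returns a copy of counts with RelativeVisitors set to
// each language's fraction of the total, together with the total itself.
func withRelativeShare(counts []languageCount) ([]languageCount, int) {
	total := 0
	for _, c := range counts {
		total += c.Visitors
	}
	out := make([]languageCount, len(counts))
	for i, c := range counts {
		out[i] = c
		if total > 0 {
			out[i].RelativeVisitors = float64(c.Visitors) / float64(total)
		}
	}
	return out, total
}

func main() {
	out, total := withRelativeShare([]languageCount{
		{Language: "en", Visitors: 3},
		{Language: "de", Visitors: 1},
	})
	fmt.Println(total, out[0].RelativeVisitors, out[1].RelativeVisitors) // 4 0.75 0.25
}
```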

func (*Analyzer) PageVisits

func (analyzer *Analyzer) PageVisits(filter *Filter) ([]PageVisits, error)

PageVisits returns the visitors per page per day for given time frame.

func (*Analyzer) Visitors

func (analyzer *Analyzer) Visitors(filter *Filter) ([]VisitorsPerDay, error)

Visitors returns the visitors per day for the given time frame.

type Filter

type Filter struct {
	From time.Time
	To   time.Time
}

Filter is used to specify the time frame for the Analyzer.

func (*Filter) Days

func (filter *Filter) Days() int

Days returns the number of days covered by the filter.

type Hit

type Hit struct {
	ID          int64     `db:"id" json:"id"`
	Fingerprint string    `db:"fingerprint" json:"fingerprint"`
	Path        string    `db:"path" json:"path,omitempty"`
	URL         string    `db:"url" json:"url,omitempty"`
	Language    string    `db:"language" json:"language,omitempty"`
	UserAgent   string    `db:"user_agent" json:"user_agent,omitempty"`
	Ref         string    `db:"ref" json:"ref,omitempty"`
	Time        time.Time `db:"time" json:"time"`
}

Hit represents a single data point/page visit.

func (Hit) String

func (hit Hit) String() string

String implements the Stringer interface.

type HourlyVisitors

type HourlyVisitors struct {
	Hour     int `db:"hour" json:"hour"`
	Visitors int `db:"visitors" json:"visitors"`
}

HourlyVisitors is the unique visitor count per hour.

type PageVisits

type PageVisits struct {
	Path   string
	Visits []VisitorsPerDay
}

PageVisits is the visitor count per day for each path.

type PostgresStore

type PostgresStore struct {
	DB *sqlx.DB
}

PostgresStore implements the Store interface.

func NewPostgresStore

func NewPostgresStore(db *sql.DB) *PostgresStore

NewPostgresStore creates a new postgres storage for given database connection.

func (*PostgresStore) ActiveVisitors

func (store *PostgresStore) ActiveVisitors(from time.Time) (int, error)

ActiveVisitors implements the Store interface.

func (*PostgresStore) Days

func (store *PostgresStore) Days() ([]time.Time, error)

Days implements the Store interface.

func (*PostgresStore) DeleteHitsByDay

func (store *PostgresStore) DeleteHitsByDay(day time.Time) error

DeleteHitsByDay implements the Store interface.

func (*PostgresStore) HourlyVisitors

func (store *PostgresStore) HourlyVisitors(from, to time.Time) ([]HourlyVisitors, error)

HourlyVisitors implements the Store interface.

func (*PostgresStore) PageVisits

func (store *PostgresStore) PageVisits(path string, from, to time.Time) ([]VisitorsPerDay, error)

PageVisits implements the Store interface.

func (*PostgresStore) Paths

func (store *PostgresStore) Paths(from, to time.Time) ([]string, error)

Paths implements the Store interface.

func (*PostgresStore) Save

func (store *PostgresStore) Save(hits []Hit) error

Save implements the Store interface.

func (*PostgresStore) SaveVisitorsPerDay

func (store *PostgresStore) SaveVisitorsPerDay(visitors *VisitorsPerDay) error

SaveVisitorsPerDay implements the Store interface.

func (*PostgresStore) SaveVisitorsPerHour

func (store *PostgresStore) SaveVisitorsPerHour(visitors *VisitorsPerHour) error

SaveVisitorsPerHour implements the Store interface.

func (*PostgresStore) SaveVisitorsPerLanguage

func (store *PostgresStore) SaveVisitorsPerLanguage(visitors *VisitorsPerLanguage) error

SaveVisitorsPerLanguage implements the Store interface.

func (*PostgresStore) SaveVisitorsPerPage

func (store *PostgresStore) SaveVisitorsPerPage(visitors *VisitorsPerPage) error

SaveVisitorsPerPage implements the Store interface.

func (*PostgresStore) VisitorLanguages

func (store *PostgresStore) VisitorLanguages(from, to time.Time) ([]VisitorLanguage, error)

VisitorLanguages implements the Store interface.

func (*PostgresStore) Visitors

func (store *PostgresStore) Visitors(from, to time.Time) ([]VisitorsPerDay, error)

Visitors implements the Store interface.

func (*PostgresStore) VisitorsPerDay

func (store *PostgresStore) VisitorsPerDay(day time.Time) (int, error)

VisitorsPerDay implements the Store interface.

func (*PostgresStore) VisitorsPerDayAndHour

func (store *PostgresStore) VisitorsPerDayAndHour(day time.Time) ([]VisitorsPerHour, error)

VisitorsPerDayAndHour implements the Store interface.

func (*PostgresStore) VisitorsPerLanguage

func (store *PostgresStore) VisitorsPerLanguage(day time.Time) ([]VisitorsPerLanguage, error)

VisitorsPerLanguage implements the Store interface.

func (*PostgresStore) VisitorsPerPage

func (store *PostgresStore) VisitorsPerPage(day time.Time) ([]VisitorsPerPage, error)

VisitorsPerPage implements the Store interface.

type Processor

type Processor struct {
	// contains filtered or unexported fields
}

Processor processes hits to reduce them into meaningful statistics.

func NewProcessor

func NewProcessor(store Store) *Processor

NewProcessor creates a new Processor for given Store.

func (*Processor) Process

func (processor *Processor) Process()

Process processes all hits in database and deletes them afterwards. It will panic in case of an error.
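Since Process panics on error, you may want to wrap it before scheduling it, so that a failed run does not take the whole application down. Here is a minimal sketch, assuming the panic is raised on the goroutine that calls the function. The helper recoverWith is hypothetical and not part of the package.

```go
package main

import "fmt"

// recoverWith wraps f so that a panic raised by f is passed to onPanic
// instead of crashing the program.
func recoverWith(f func(), onPanic func(v interface{})) func() {
	return func() {
		defer func() {
			if v := recover(); v != nil {
				onPanic(v)
			}
		}()
		f()
	}
}

func main() {
	// process is a stand-in for a function that panics on error.
	process := func() { panic("database unavailable") }

	safe := recoverWith(process, func(v interface{}) {
		fmt.Println("recovered:", v)
	})
	safe()
	fmt.Println("still running")
}
```

With the documented signatures, the wrapped function could be scheduled as pirsch.RunAtMidnight(recoverWith(processor.Process, logPanic)), where logPanic is your own handler.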

type Store

type Store interface {
	// Save persists a list of hits.
	Save([]Hit) error

	// DeleteHitsByDay deletes all hits on given day.
	DeleteHitsByDay(time.Time) error

	// SaveVisitorsPerDay persists unique visitors per day.
	SaveVisitorsPerDay(*VisitorsPerDay) error

	// SaveVisitorsPerHour persists unique visitors per day and hour.
	SaveVisitorsPerHour(*VisitorsPerHour) error

	// SaveVisitorsPerLanguage persists unique visitors per day and language.
	SaveVisitorsPerLanguage(*VisitorsPerLanguage) error

	// SaveVisitorsPerPage persists unique visitors per day and page.
	SaveVisitorsPerPage(*VisitorsPerPage) error

	// Days returns the days at least one hit exists for.
	Days() ([]time.Time, error)

	// VisitorsPerDay returns the unique visitor count per day.
	VisitorsPerDay(time.Time) (int, error)

	// VisitorsPerDayAndHour returns the unique visitor count per day and hour.
	VisitorsPerDayAndHour(time.Time) ([]VisitorsPerHour, error)

	// VisitorsPerLanguage returns the unique visitor count per language and day.
	VisitorsPerLanguage(time.Time) ([]VisitorsPerLanguage, error)

	// VisitorsPerPage returns the unique visitor count per page and day.
	VisitorsPerPage(time.Time) ([]VisitorsPerPage, error)

	// Paths returns distinct paths for page visits.
	// This does not include today.
	Paths(time.Time, time.Time) ([]string, error)

	// Visitors returns the visitors within given time frame.
	// This does not include today.
	Visitors(time.Time, time.Time) ([]VisitorsPerDay, error)

	// PageVisits returns the page visits within given time frame for given path.
	// This does not include today.
	PageVisits(string, time.Time, time.Time) ([]VisitorsPerDay, error)

	// VisitorLanguages returns the languages within given time frame for unique visitors.
	// It does include today.
	VisitorLanguages(time.Time, time.Time) ([]VisitorLanguage, error)

	// HourlyVisitors returns unique visitors per hour for given time frame.
	// It does include today.
	HourlyVisitors(time.Time, time.Time) ([]HourlyVisitors, error)

	// ActiveVisitors returns unique visitors starting at given time.
	ActiveVisitors(time.Time) (int, error)
}

Store defines an interface to persist hits and other data.
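To give a feel for what another data store involves, here is a partial in-memory sketch covering only Save and VisitorsPerDay. It is not a complete Store: a real implementation has to provide every method of the interface. The memoryStore and hit types and the example data are hypothetical, and hit mirrors just two fields of Hit.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// hit mirrors only the two Hit fields this sketch needs.
type hit struct {
	Fingerprint string
	Time        time.Time
}

// memoryStore is a hypothetical, partial in-memory store.
type memoryStore struct {
	mu   sync.Mutex
	hits []hit
}

// Save appends the given hits.
func (s *memoryStore) Save(hits []hit) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.hits = append(s.hits, hits...)
	return nil
}

// VisitorsPerDay counts distinct fingerprints seen on the given UTC day.
func (s *memoryStore) VisitorsPerDay(day time.Time) (int, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	y, m, d := day.UTC().Date()
	seen := make(map[string]struct{})
	for _, h := range s.hits {
		hy, hm, hd := h.Time.UTC().Date()
		if hy == y && hm == m && hd == d {
			seen[h.Fingerprint] = struct{}{}
		}
	}
	return len(seen), nil
}

// exampleDay and exampleHits provide fixed demo data.
var exampleDay = time.Date(2020, time.July, 3, 0, 0, 0, 0, time.UTC)

func exampleHits() []hit {
	return []hit{
		{Fingerprint: "a", Time: exampleDay.Add(1 * time.Hour)},
		{Fingerprint: "a", Time: exampleDay.Add(2 * time.Hour)},
		{Fingerprint: "b", Time: exampleDay.Add(3 * time.Hour)},
		{Fingerprint: "c", Time: exampleDay.Add(25 * time.Hour)},
	}
}

func main() {
	s := &memoryStore{}
	if err := s.Save(exampleHits()); err != nil {
		panic(err)
	}
	n, _ := s.VisitorsPerDay(exampleDay)
	fmt.Println(n) // 2
}
```

Visitor "a" is counted once although it produced two hits, which is the reduction from hits to unique visitors that the Postgres store expresses in SQL.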

type Tracker

type Tracker struct {
	// contains filtered or unexported fields
}

Tracker is the main component of Pirsch. It provides methods to track requests and store them in a data store. In case of an error it will panic.

func NewTracker

func NewTracker(store Store, salt string, config *TrackerConfig) *Tracker

NewTracker creates a new tracker for given store, salt and config. Pass nil for the config to use the defaults. The salt is mandatory.

func (*Tracker) Flush

func (tracker *Tracker) Flush()

Flush flushes all hits to store.

func (*Tracker) Hit

func (tracker *Tracker) Hit(r *http.Request)

Hit stores the given request. The request might be ignored if it meets certain conditions. The actions performed within this function run in their own goroutine, so you don't need to create one yourself.

func (*Tracker) HitPage

func (tracker *Tracker) HitPage(r *http.Request, path string)

HitPage works like Hit, but sets the path to the given path. This can be useful in case you have a single endpoint to track requests that you call from JavaScript for example.

func (*Tracker) Stop

func (tracker *Tracker) Stop()

Stop flushes and stops all workers.

type TrackerConfig

type TrackerConfig struct {
	// Worker sets the number of workers that are used to store hits.
	// Must be greater than or equal to 1.
	Worker int

	// WorkerBufferSize is the size of the buffer used to store hits.
	// Must be greater than or equal to 2. The hits are stored when the buffer size reaches half of its maximum.
	WorkerBufferSize int

	// WorkerTimeout sets the timeout used to store hits.
	// This is used to allow the workers to store hits even if the buffer is not full.
	// It's recommended to set this to a few seconds.
	WorkerTimeout time.Duration
}

TrackerConfig is the optional configuration for the Tracker.
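How the buffer size and timeout interact can be illustrated with a small batching worker. This is a sketch of the general technique, not Pirsch's worker: runWorker and collect are hypothetical, and strings stand in for hits. The batch is flushed when it reaches half of the buffer size, when the timeout fires, or when the input is closed.

```go
package main

import (
	"fmt"
	"time"
)

// runWorker collects items from in and passes them to flush in batches.
func runWorker(in <-chan string, bufferSize int, timeout time.Duration, flush func([]string)) {
	batch := make([]string, 0, bufferSize)
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	// emit hands a copy of the pending items to flush and clears the batch.
	emit := func() {
		if len(batch) > 0 {
			flush(append([]string(nil), batch...))
			batch = batch[:0]
		}
	}

	for {
		select {
		case item, ok := <-in:
			if !ok {
				emit() // input closed, store what is left
				return
			}
			batch = append(batch, item)
			if len(batch) >= bufferSize/2 {
				emit() // buffer is half full
			}
		case <-timer.C:
			emit() // timeout, store even if the buffer is not half full
			timer.Reset(timeout)
		}
	}
}

// collect feeds items through a worker and returns the flushed batches.
func collect(items []string, bufferSize int) [][]string {
	in := make(chan string)
	done := make(chan struct{})
	var batches [][]string
	go func() {
		defer close(done)
		runWorker(in, bufferSize, time.Hour, func(b []string) {
			batches = append(batches, b)
		})
	}()
	for _, item := range items {
		in <- item
	}
	close(in)
	<-done
	return batches
}

func main() {
	fmt.Println(collect([]string{"a", "b", "c"}, 4)) // [[a b] [c]]
}
```

The timeout matters on low-traffic sites: without it, a few hits could sit in the buffer indefinitely before the half-full threshold is reached.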

type VisitorLanguage

type VisitorLanguage struct {
	Language         string  `db:"language" json:"language"`
	Visitors         int     `db:"visitors" json:"visitors"`
	RelativeVisitors float64 `db:"-" json:"relative_visitors"`
}

VisitorLanguage is the unique visitor count per language.

type VisitorsPerDay

type VisitorsPerDay struct {
	ID       int64     `db:"id" json:"id"`
	Day      time.Time `db:"day" json:"day"`
	Visitors int       `db:"visitors" json:"visitors"`
}

VisitorsPerDay is the unique visitor count per day.

type VisitorsPerHour

type VisitorsPerHour struct {
	ID         int64     `db:"id" json:"id"`
	DayAndHour time.Time `db:"day_and_hour" json:"day_and_hour"`
	Visitors   int       `db:"visitors" json:"visitors"`
}

VisitorsPerHour is the unique visitor count per hour and day.

type VisitorsPerLanguage

type VisitorsPerLanguage struct {
	ID       int64     `db:"id" json:"id"`
	Day      time.Time `db:"day" json:"day"`
	Language string    `db:"language" json:"language"`
	Visitors int       `db:"visitors" json:"visitors"`
}

VisitorsPerLanguage is the unique visitor count per language and day.

type VisitorsPerPage

type VisitorsPerPage struct {
	ID       int64     `db:"id" json:"id"`
	Day      time.Time `db:"day" json:"day"`
	Path     string    `db:"path" json:"path"`
	Visitors int       `db:"visitors" json:"visitors"`
}

VisitorsPerPage is the unique visitor count per path and day.
