choir

package module
v1.0.1 Latest
Warning

This package is not in the latest version of its module.

Published: Jun 26, 2020 | License: Apache-2.0 | Imports: 13 | Imported by: 0

README

Choir: Privacy-first framework for domain-oriented error reporting

Overview

Choir is a Go library for reporting errors from apps without revealing the user's IP address to the developer. It works by sending error reports through the user's DNS server, which acts as an anonymizing forwarder. Choir is most appropriate when the error reports represent failure to connect to a particular domain.

The name "Choir" refers to hearing from many users without being able to identify them individually.

Background and motivation

It is now common for networked client software to report performance and reliability problems to the developer. This information can help the developer to troubleshoot problems encountered by users, and prioritize the most important improvements. However, typical reporting systems require the user to place a significant amount of trust in the developer and their metrics service. Even reporting frameworks that are described as "anonymous" often reveal the client's IP address to the service operator, requiring the client to trust that the operator is not storing or analyzing this information inappropriately.

To reduce the degree of trust they ask of their users, developers of privacy-oriented client software sometimes forgo error reporting entirely. This avoids possession of potentially sensitive user data, but it also prevents developers from identifying and understanding issues with their products, resulting in a worse user experience.

In network security software, reports often contain a domain name along with some performance or reliability information about the connection to that domain. In many cases, the domain name is the only sensitive aspect of the report, insofar as it may indicate a user action. However, this information is necessarily already known to the user's DNS server (i.e. recursive resolver), which the client used to bootstrap the connection.

Choir enables developers to receive domain-keyed reports without learning the client's IP address, by sending reports through the client's DNS server.

Domain names and k-anonymity

If we suppose that the software developer can receive reports from a user without learning their IP address, there is still a potential concern related to domain names that are only accessed by a very small number of users, especially if the developer would like to share aggregated data with third parties. For example, a single report for a domain like "firstname.lastname.bloghost.example" might indicate that the person by this name is keeping a secret diary. If the developer shares a list of domains that includes this name, the existence of the secret diary is revealed.

To help developers protect the privacy of their users when sharing aggregated information, we need to provide a reliable way for the developers to determine whether a domain was used by many different users. Counting the number of reports is not sufficient: a single user could have produced multiple reports, and reports can be duplicated by DNS intermediaries.

Choir is designed to give developers a lower bound on the number of distinct users reporting a particular domain, while also ensuring that multiple reports cannot be tied to a single user.

Design

Choir is a highly opinionated reporting library, designed to guide developers toward good privacy practices. Each Choir report consists of a key and zero or more values. The key is a tuple of (domain name, country code, UTC date). The values are arbitrary strings.

Choir assigns each outgoing report to one of a configurable number of bins. The assignment is random but stable for each (key, user) pair, based on a hash with a secret, static salt. If the developer observes the same key in k different bins, they can be confident that it was reported by at least k distinct users. If a key appears in fewer bins than the developer's chosen threshold, the developer should not share that domain with third parties. The date is included in the key to ensure that bin assignments cannot be used to track a user over multiple days.
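The bin assignment described above can be sketched as a keyed hash reduced modulo the number of bins. This is only an illustration: the choice of HMAC-SHA256, the field separator, and the names `binFor` and `distinctBins` are assumptions made for this sketch, not Choir's documented implementation.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// binFor maps a (key, user) pair to a bin in [0, bins).
// The salt stands in for the user's secret; HMAC-SHA256 is an
// assumption of this sketch. bins must be greater than zero.
func binFor(salt []byte, domain, country, date string, bins uint64) uint64 {
	mac := hmac.New(sha256.New, salt)
	// Terminate each field with a zero byte so that field boundaries
	// cannot be confused with one another.
	for _, field := range []string{domain, country, date} {
		mac.Write([]byte(field))
		mac.Write([]byte{0})
	}
	sum := mac.Sum(nil)
	return binary.BigEndian.Uint64(sum[:8]) % bins
}

// distinctBins counts how many different bins were observed for one key.
// This count is the lower bound on the number of distinct reporting users.
func distinctBins(observed []uint64) int {
	seen := make(map[uint64]bool)
	for _, b := range observed {
		seen[b] = true
	}
	return len(seen)
}

func main() {
	salt := []byte("example-secret-salt")
	fmt.Println(binFor(salt, "www.example.com", "us", "20191218", 32))
	fmt.Println(distinctBins([]uint64{3, 3, 7, 19})) // prints 3
}
```

Because the date is part of the hashed input, the same user and domain will usually land in an unrelated bin on the next day, which is what prevents tracking across days.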

Choir reports are converted into domain names in the following format:

value0.value1.value2.bin.us.20191218.www.example.com.metrics.mydomain.example
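As a rough illustration of this layout, the following sketch joins the parts of a report into a single name. The helper names `reportName` and `exampleName` are hypothetical, and encoding the bin as a decimal label is an assumption; the format above only shows a `bin` placeholder.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// reportName joins the parts of a report into one DNS name, in the order
// values, bin, country, date, reported domain, metrics suffix.
// The decimal bin label is an assumption of this sketch.
func reportName(values []string, bin int, country string, date time.Time, domain, suffix string) string {
	labels := append([]string{}, values...)
	labels = append(labels,
		strconv.Itoa(bin),
		country,
		date.UTC().Format("20060102"), // YYYYMMDD in UTC
		strings.TrimSuffix(domain, "."),
		strings.TrimSuffix(suffix, "."),
	)
	return strings.Join(labels, ".")
}

// exampleName builds a name for a single-value report on a fixed date.
func exampleName() string {
	date := time.Date(2019, time.December, 18, 0, 0, 0, 0, time.UTC)
	return reportName([]string{"timeout"}, 7, "us", date, "www.example.com", "metrics.mydomain.example")
}

func main() {
	fmt.Println(exampleName())
	// prints timeout.7.us.20191218.www.example.com.metrics.mydomain.example
}
```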

The Choir library guides applications to

  • limit reports to one per domain per day,
  • construct reports with appropriate pseudorandom bin assignments,
  • convert reports into domain names,
  • format queries for these names with client subnet reporting disabled, and
  • rate-limit queries to avoid sending correlated reports.

For the metrics server, Choir provides tools for

  • parsing reports from domain names, and
  • applying k-anonymity filtering to the received reports.

Rate limiting

Choir imposes two kinds of rate limits to minimize information leakage. First, each domain can only be the subject of one report per day. Without this limitation, if a single client produced multiple reports for the same domain, they would be assigned to the same bin, potentially allowing the developer to link the reports together.

Second, when a burst of reports is filed in a short interval (the "burst duration"), Choir will select one at random to report and discard the rest. This avoids reporting patterns of domains that could reveal additional information about user activity, such as a specific webpage that they were visiting.

To avoid creating persistent state that records user activity, these limits are implemented purely in-memory.
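The two limits can be pictured with a small in-memory sketch. The type `dailyLimiter` and the function `pickOne` are hypothetical names; how Choir actually tracks reported domains and collects bursts (timers, locking, and whether a discarded report still counts against the daily limit) is not described here, so those details are left out.

```go
package main

import (
	"fmt"
	"math/rand"
)

// dailyLimiter remembers, in memory only, which domains were already
// reported on the current UTC date.
type dailyLimiter struct {
	date string
	seen map[string]bool
}

// allow reports whether domain may still be reported on date (YYYYMMDD).
// State from an earlier date is dropped as soon as the date changes.
func (l *dailyLimiter) allow(domain, date string) bool {
	if l.seen == nil || l.date != date {
		l.date = date
		l.seen = make(map[string]bool)
	}
	if l.seen[domain] {
		return false
	}
	l.seen[domain] = true
	return true
}

// pickOne chooses one report from a non-empty burst. intn must return a
// value in [0, n), for example math/rand's Intn.
func pickOne(burst []string, intn func(n int) int) string {
	return burst[intn(len(burst))]
}

func main() {
	var l dailyLimiter
	fmt.Println(l.allow("www.example.com", "20191218")) // true
	fmt.Println(l.allow("www.example.com", "20191218")) // false
	fmt.Println(l.allow("www.example.com", "20191219")) // true

	burst := []string{"a.example", "b.example", "c.example"}
	fmt.Println(pickOne(burst, rand.Intn)) // one of the three, chosen at random
}
```

Nothing in this sketch is written to disk, so restarting the process forgets which domains were reported, in line with the in-memory design stated above.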

Implementation

Each report to the metrics server is represented as a Report, which has a Key (consisting of the domain, country, and date) and a slice of string values. To create and send reports, clients first instantiate a long-lived Reporter, which is configured with the number of values, number of bins, user country, burst duration, and a callback to use for sending queries.

Servers use a Receiver to parse incoming DNS queries into Reports, and to apply k-anonymity filtering to those reports based on the number of bins.

[Diagram: Implementation]

Example

This diagram shows an example of using Choir to report connection errors, with a single value indicating the type of error.

[Diagram: Example]

Advice and Warnings

  • The values have not previously been revealed to the recursive resolver, so developers must be confident that they are non-sensitive. To give users confidence that Choir is being used responsibly, developers are encouraged to make values human-readable or extremely compact. Each value must be lowercase ASCII and short enough to fit in a DNS label.
  • The salt must be preserved as long as possible on the client. Changes to the salt could cause a user to be double-counted, undermining the k-anonymity guarantee.
  • Developers can configure the number of bins. A larger number of bins allows the server to enforce a larger anonymity threshold, but also makes repeated reports from a single user during a single day easier to link if duplicate detection fails.
  • Developers are encouraged to set a burst duration of at least five seconds, to cover the load duration of a typical webpage.
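To make the salt advice above concrete, here is a minimal load-or-create sketch over an io.ReadWriter. The function names and the 32-byte salt length are assumptions for illustration; the source does not say how Choir stores or sizes its salt.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"fmt"
	"io"
)

// saltLen is an assumed length; the source does not specify one.
const saltLen = 32

// loadOrCreateSalt returns the salt stored in file. If file is empty, it
// generates a fresh random salt and writes it so that later runs reuse it.
func loadOrCreateSalt(file io.ReadWriter) ([]byte, error) {
	salt, err := io.ReadAll(file)
	if err != nil {
		return nil, err
	}
	if len(salt) > 0 {
		return salt, nil
	}
	salt = make([]byte, saltLen)
	if _, err := rand.Read(salt); err != nil {
		return nil, err
	}
	if _, err := file.Write(salt); err != nil {
		return nil, err
	}
	return salt, nil
}

// demo creates a salt in an empty in-memory store, then loads it again
// from the stored bytes, and returns both results for comparison.
func demo() (first, second []byte, err error) {
	var store bytes.Buffer
	if first, err = loadOrCreateSalt(&store); err != nil {
		return nil, nil, err
	}
	second, err = loadOrCreateSalt(bytes.NewBuffer(store.Bytes()))
	return first, second, err
}

func main() {
	first, second, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(first), bytes.Equal(first, second)) // prints 32 true
}
```

If the stored salt were lost, the second load would generate a new value and the same user could land in a different bin for the same key, which is the double-counting risk described in the list above.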

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func Filter

func Filter(in <-chan Report, threshold int) <-chan Report

Filter accepts a channel of reports (e.g. all the reports arriving at the metrics server) and delivers them to the output channel only if enough arrive to provide k-anonymity at the desired threshold. Callers should close the input channel when finished, to allow garbage-collection of any pending reports.
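One way such a filter could work is sketched below, using a stand-in type rather than Choir's Report (whose bin is not an exported field). The names `binnedReport`, `filter`, and `run` are hypothetical, and the policy of releasing held reports once `threshold` distinct bins have been seen is an assumption about the behavior, not a description of the library's code.

```go
package main

import "fmt"

// binnedReport is a stand-in for a parsed report with a visible bin.
type binnedReport struct {
	Key string // e.g. "www.example.com|us|20191218"
	Bin int
}

// pending tracks the reports held back for one key.
type pending struct {
	bins     map[int]bool
	held     []binnedReport
	released bool
}

// filter forwards reports for a key only once that key has been seen in
// at least threshold distinct bins. Closing in closes the output and
// drops anything still held.
func filter(in <-chan binnedReport, threshold int) <-chan binnedReport {
	out := make(chan binnedReport)
	go func() {
		defer close(out)
		state := make(map[string]*pending)
		for r := range in {
			p := state[r.Key]
			if p == nil {
				p = &pending{bins: make(map[int]bool)}
				state[r.Key] = p
			}
			if p.released {
				out <- r
				continue
			}
			p.bins[r.Bin] = true
			p.held = append(p.held, r)
			if len(p.bins) >= threshold {
				for _, h := range p.held {
					out <- h
				}
				p.held, p.released = nil, true
			}
		}
	}()
	return out
}

// run feeds reports through filter and collects everything it releases.
func run(reports []binnedReport, threshold int) []binnedReport {
	in := make(chan binnedReport)
	go func() {
		defer close(in)
		for _, r := range reports {
			in <- r
		}
	}()
	var got []binnedReport
	for r := range filter(in, threshold) {
		got = append(got, r)
	}
	return got
}

func main() {
	got := run([]binnedReport{
		{Key: "a", Bin: 1}, {Key: "b", Bin: 1}, {Key: "a", Bin: 1}, {Key: "a", Bin: 2},
	}, 2)
	fmt.Println(len(got)) // prints 3: key "a" reached two bins, key "b" did not
}
```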

func FormatQuery

func FormatQuery(report Report, suffix string) ([]byte, error)

FormatQuery returns a fully serialized DNS query for a TXT record at a name that encodes `report`, as a subdomain of `suffix`. The query includes an instruction to the recursive resolver not to reveal any information about the client's IP address to the authoritative server using the EDNS Client Subnet extension, as described in https://tools.ietf.org/html/rfc7871#section-7.1.2.
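For readers unfamiliar with the wire format, the sketch below hand-assembles a TXT query carrying an EDNS Client Subnet option with SOURCE PREFIX-LENGTH 0. It is not FormatQuery's implementation: the function name `buildQuery`, the caller-supplied query ID, the 1232-byte advertised UDP payload size, and the choice of FAMILY 1 (IPv4) are all assumptions of this example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"strings"
)

// buildQuery serializes a DNS TXT query for name, with an OPT record whose
// only option is EDNS Client Subnet with SOURCE PREFIX-LENGTH 0.
func buildQuery(id uint16, name string) ([]byte, error) {
	msg := make([]byte, 12, 512)
	binary.BigEndian.PutUint16(msg[0:], id)
	binary.BigEndian.PutUint16(msg[2:], 0x0100) // flags: recursion desired
	binary.BigEndian.PutUint16(msg[4:], 1)      // QDCOUNT = 1
	binary.BigEndian.PutUint16(msg[10:], 1)     // ARCOUNT = 1 (the OPT record)

	// QNAME: each label is prefixed with its length.
	for _, label := range strings.Split(strings.TrimSuffix(name, "."), ".") {
		if len(label) == 0 || len(label) > 63 {
			return nil, errors.New("label must be 1 to 63 bytes long")
		}
		msg = append(msg, byte(len(label)))
		msg = append(msg, label...)
	}
	msg = append(msg, 0) // root label terminates QNAME
	if len(msg)-12 > 255 {
		return nil, errors.New("name too long")
	}
	msg = append(msg, 0, 16) // QTYPE = TXT
	msg = append(msg, 0, 1)  // QCLASS = IN

	// OPT pseudo-record in the additional section.
	msg = append(msg, 0)          // NAME = root
	msg = append(msg, 0, 41)      // TYPE = OPT
	msg = append(msg, 0x04, 0xD0) // CLASS = UDP payload size (1232)
	msg = append(msg, 0, 0, 0, 0) // TTL = extended RCODE, version, flags
	msg = append(msg, 0, 8)       // RDLENGTH = 8

	// EDNS Client Subnet option with no address bytes.
	msg = append(msg, 0, 8) // OPTION-CODE = 8
	msg = append(msg, 0, 4) // OPTION-LENGTH = 4
	msg = append(msg, 0, 1) // FAMILY = 1 (IPv4), assumed for this sketch
	msg = append(msg, 0, 0) // SOURCE PREFIX-LENGTH = 0, SCOPE PREFIX-LENGTH = 0
	return msg, nil
}

func main() {
	q, err := buildQuery(0x1234, "a.example")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d bytes: %x\n", len(q), q)
}
```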

Types

type Key

type Key struct {
	Domain  string
	Country string
	Date    time.Time
}

Key is the quasi-identifying information associated with a report. It is protected by k-anonymity when using bin count filtering.

type Receiver

type Receiver struct {
	// The name of the metrics server, e.g. "metrics.example.com"
	Suffix string
	// The number of values in each Report.
	Values int
}

Receiver represents the configuration of a metrics server, required to receive `Report`s in query form.

func (*Receiver) ParseReport

func (r *Receiver) ParseReport(name string) (*Report, error)

ParseReport inverts Reporter.name(report): it converts the name from an incoming query back into a Report.
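A parser for the name layout shown in the README might look roughly like the following. The names `parsed` and `parseName` are hypothetical, the bin is assumed to be a decimal label, and DNS case-insensitivity is ignored for brevity; the real ParseReport may differ on all of these points.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parsed holds the fields recovered from a report name.
type parsed struct {
	Values  []string
	Bin     int
	Country string
	Date    time.Time
	Domain  string
}

// parseName splits a name of the form
// value0...valueN.bin.country.date.domain.suffix into its parts.
func parseName(name, suffix string, numValues int) (*parsed, error) {
	if numValues < 0 {
		return nil, errors.New("negative value count")
	}
	name = strings.TrimSuffix(name, ".")
	suffix = "." + strings.Trim(suffix, ".")
	if !strings.HasSuffix(name, suffix) {
		return nil, errors.New("name is not under the metrics suffix")
	}
	labels := strings.Split(strings.TrimSuffix(name, suffix), ".")
	// Require the values, then bin, country, date, and a non-empty domain.
	if len(labels) < numValues+4 {
		return nil, errors.New("too few labels")
	}
	bin, err := strconv.Atoi(labels[numValues])
	if err != nil {
		return nil, fmt.Errorf("bad bin: %w", err)
	}
	date, err := time.Parse("20060102", labels[numValues+2])
	if err != nil {
		return nil, fmt.Errorf("bad date: %w", err)
	}
	return &parsed{
		Values:  labels[:numValues],
		Bin:     bin,
		Country: labels[numValues+1],
		Date:    date,
		Domain:  strings.Join(labels[numValues+3:], "."),
	}, nil
}

func main() {
	p, err := parseName("timeout.7.us.20191218.www.example.com.metrics.mydomain.example",
		"metrics.mydomain.example", 1)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p.Values, p.Bin, p.Country, p.Date.Format("2006-01-02"), p.Domain)
	// prints [timeout] 7 us 2019-12-18 www.example.com
}
```

The receiver needs to know the number of values in advance because nothing in the name itself marks where the values end and the bin begins, which is why Receiver carries a Values field.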

type Report

type Report struct {
	Key
	// Each report contains zero or more values.
	// These values should not contain any potentially identifying information,
	// because they are revealed to the recursive resolver and are not protected
	// by k-anonymity.  A single user can make multiple reports with the same
	// or different values, but only one report will be sent for each Key.
	Values []Value
	// contains filtered or unexported fields
}

Report represents a full report to the server.

type ReportSender

type ReportSender interface {
	// Send is required to be safe for concurrent execution.
	Send(Report) error
}

ReportSender is a general interface for sending a Report to a metrics server.

type Reporter

type Reporter interface {
	// Report the provided values for this domain.
	Report(domain string, values ...Value) error
}

Reporter wraps values into queries and sends them to a metrics server.

func NewReporter

func NewReporter(file io.ReadWriter, bins, values int, country string, burst time.Duration, sender ReportSender) (Reporter, error)

NewReporter returns a reporter that uses the salt in `file` (which may initially be empty) to assign reports with this many `values` to one of the specified number of `bins` for the client's `country`. Bursts of reports are accumulated for the specified duration, and one report from each burst is passed asynchronously to `sender` as a Report ready to send.

type Value

type Value struct {
	// contains filtered or unexported fields
}

Value represents a string that has been validated as correctly formatted for inclusion in a Report. A correctly formatted Value is a string of length 63 or less that does not contain a '.', upper-case characters, or any characters beyond basic ASCII. These restrictions ensure that a Value can be passed through the DNS as a label without data loss.
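The restrictions listed above translate into a short check. The function name `checkValue` is hypothetical, and this sketch implements only the stated rules; any additional checks that NewValue performs (for example on empty strings) are not covered.

```go
package main

import (
	"errors"
	"fmt"
)

// checkValue applies the stated rules: at most 63 bytes, no '.',
// no upper-case letters, and nothing beyond basic ASCII.
func checkValue(v string) error {
	if len(v) > 63 {
		return errors.New("value longer than 63 bytes")
	}
	for i := 0; i < len(v); i++ {
		c := v[i]
		switch {
		case c == '.':
			return errors.New("value contains '.'")
		case c >= 'A' && c <= 'Z':
			return errors.New("value contains an upper-case character")
		case c > 127:
			return errors.New("value contains a non-ASCII byte")
		}
	}
	return nil
}

func main() {
	for _, v := range []string{"timeout", "Timeout", "a.b", "caf\u00e9"} {
		fmt.Printf("%q: %v\n", v, checkValue(v))
	}
}
```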

func NewValue

func NewValue(v string) (Value, error)

NewValue converts `v` to a Value, or returns an error if `v` is not a valid value.

func (Value) String

func (v Value) String() string

Directories

Path Synopsis
This sample demonstrates use of Choir in a client application.
