package dataframe
Version: v0.0.0-...-6fa1a96
Warning

This package is not in the latest version of its module.

Published: Jan 7, 2025
License: BSD-3-Clause
Imports: 12
Imported by: 5

Documentation

Overview

Package dataframe provides DataFrame which is a TraceSet with a calculated ParamSet and associated commit info.

Constants

const (
	// DEFAULT_NUM_COMMITS is the number of commits in the DataFrame returned
	// from New().
	DEFAULT_NUM_COMMITS = 50

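	// MAX_SAMPLE_SIZE bounds the number of commits in a downsampled
	// DataFrame (see DataFrame.Skip).
	MAX_SAMPLE_SIZE = 5000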
)

Variables

This section is empty.

Functions

This section is empty.

Types

type ColumnHeader

type ColumnHeader struct {
	Offset    types.CommitNumber `json:"offset"`
	Timestamp TimestampSeconds   `json:"timestamp"`
}

ColumnHeader describes each column in a DataFrame.

func FromTimeRange

func FromTimeRange(ctx context.Context, git perfgit.Git, begin, end time.Time, downsample bool) ([]*ColumnHeader, []types.CommitNumber, int, error)

FromTimeRange returns a slice of ColumnHeader and a slice of types.CommitNumber (int32 commit numbers) for the commits that fall in the time range [begin, end).

The 'downsample' parameter is meant to limit the number of commits returned to MAX_SAMPLE_SIZE, but it is currently ignored (TODO(jcgregorio): remove downsample). The int return value, 'skip', is the number of commits that were skipped.

func MergeColumnHeaders

func MergeColumnHeaders(a, b []*ColumnHeader) ([]*ColumnHeader, map[int]int, map[int]int)

MergeColumnHeaders creates a merged header from the two given headers.

For example, merging headers with offsets {1,4,5} and {3,4} yields {1,3,4,5}.
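The documentation does not describe the two map return values. The sketch below is a minimal merge under two assumptions: both inputs are sorted by Offset, and each map sends a column index in 'a' (respectively 'b') to that column's index in the merged header. `mergeColumnHeaders` and `headers` are hypothetical helpers, and the package's named types are replaced by plain integers so the example is self-contained.

```go
package main

import "fmt"

// ColumnHeader mirrors the documented struct, with types.CommitNumber and
// TimestampSeconds replaced by plain integers (an assumption for this sketch).
type ColumnHeader struct {
	Offset    int32
	Timestamp int64
}

// mergeColumnHeaders merges two headers that are each sorted by Offset.
// Assumed semantics: aMap[i] / bMap[j] give the merged index of a[i] / b[j].
func mergeColumnHeaders(a, b []*ColumnHeader) ([]*ColumnHeader, map[int]int, map[int]int) {
	merged := make([]*ColumnHeader, 0, len(a)+len(b))
	aMap := make(map[int]int, len(a))
	bMap := make(map[int]int, len(b))
	i, j := 0, 0
	for i < len(a) || j < len(b) {
		switch {
		case j >= len(b) || (i < len(a) && a[i].Offset < b[j].Offset):
			aMap[i] = len(merged)
			merged = append(merged, a[i])
			i++
		case i >= len(a) || b[j].Offset < a[i].Offset:
			bMap[j] = len(merged)
			merged = append(merged, b[j])
			j++
		default: // Same offset in both inputs: emit one column, map both to it.
			aMap[i] = len(merged)
			bMap[j] = len(merged)
			merged = append(merged, a[i])
			i++
			j++
		}
	}
	return merged, aMap, bMap
}

// headers is a hypothetical convenience constructor for the example.
func headers(offsets ...int32) []*ColumnHeader {
	out := make([]*ColumnHeader, len(offsets))
	for i, o := range offsets {
		out[i] = &ColumnHeader{Offset: o}
	}
	return out
}

func main() {
	merged, aMap, bMap := mergeColumnHeaders(headers(1, 4, 5), headers(3, 4))
	offsets := []int32{}
	for _, h := range merged {
		offsets = append(offsets, h.Offset)
	}
	fmt.Println(offsets)    // [1 3 4 5]
	fmt.Println(aMap, bMap) // map[0:0 1:2 2:3] map[0:1 1:2]
}
```

A shared offset (4 here) produces a single merged column, which is what lets both inputs' values be placed side by side later.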

type DataFrame

type DataFrame struct {
	TraceSet types.TraceSet              `json:"traceset"`
	Header   []*ColumnHeader             `json:"header"`
	ParamSet paramtools.ReadOnlyParamSet `json:"paramset"`
	Skip     int                         `json:"skip"`
}

DataFrame stores Perf measurements in a table where each row is a Trace indexed by a structured key (see go/query), and each column is described by a ColumnHeader, which could be a commit or a trybot patch level.

Skip is the number of commits skipped to bring the DataFrame down to fewer than MAX_SAMPLE_SIZE commits. If Skip is zero then no commits were skipped.
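To illustrate how a Skip value relates to the sample size, here is a hypothetical stride-based downsampler. It is not the package's algorithm: it keeps every (skip+1)-th commit so that at most `limit` commits remain (the text above says "fewer than"; this sketch allows equality). `downsample` and `maxSampleSize` are names introduced only for this example.

```go
package main

import "fmt"

// maxSampleSize matches the documented MAX_SAMPLE_SIZE constant.
const maxSampleSize = 5000

// downsample is a hypothetical helper: it keeps every stride-th commit so
// that at most limit commits remain, and returns skip = stride - 1, the
// number of commits dropped between consecutive kept commits.
func downsample(commits []int32, limit int) ([]int32, int) {
	if len(commits) <= limit {
		return commits, 0 // Nothing to drop.
	}
	stride := (len(commits) + limit - 1) / limit // ceil(len/limit)
	out := make([]int32, 0, limit)
	for i := 0; i < len(commits); i += stride {
		out = append(out, commits[i])
	}
	return out, stride - 1
}

func main() {
	commits := make([]int32, 12000)
	for i := range commits {
		commits[i] = int32(i)
	}
	kept, skip := downsample(commits, maxSampleSize)
	fmt.Println(len(kept), skip) // 4000 2
}
```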

The name DataFrame was gratuitously borrowed from R.

func Join

func Join(a, b *DataFrame) *DataFrame

Join creates a new DataFrame that is the union of 'a' and 'b'.

It handles the case where a and b hold data for different sets of commits, i.e. a.Header does not have to equal b.Header.
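A minimal sketch of how such a union can be computed: merge the two headers, then copy each trace into the merged columns, padding gaps with a "missing" marker. The `frame` type, the `missing` sentinel value (1e32), and the rule that b's values win at shared columns are all assumptions made for this example, not the package's documented behavior.

```go
package main

import "fmt"

// missing is an assumed sentinel meaning "no data at this column".
const missing float32 = 1e32

// frame is a hypothetical, simplified stand-in for DataFrame.
type frame struct {
	Header   []int32              // column offsets, sorted ascending
	TraceSet map[string][]float32 // one value per header column
}

// mergeOffsets merges two sorted offset slices and returns, for each input,
// the merged column index of each of its columns.
func mergeOffsets(a, b []int32) ([]int32, []int, []int) {
	merged := []int32{}
	aIdx, bIdx := make([]int, len(a)), make([]int, len(b))
	i, j := 0, 0
	for i < len(a) || j < len(b) {
		switch {
		case j >= len(b) || (i < len(a) && a[i] < b[j]):
			aIdx[i] = len(merged)
			merged = append(merged, a[i])
			i++
		case i >= len(a) || b[j] < a[i]:
			bIdx[j] = len(merged)
			merged = append(merged, b[j])
			j++
		default:
			aIdx[i], bIdx[j] = len(merged), len(merged)
			merged = append(merged, a[i])
			i++
			j++
		}
	}
	return merged, aIdx, bIdx
}

// join returns the union of a and b over the merged header. Missing columns
// are padded with the sentinel; b overwrites a where both have data.
func join(a, b frame) frame {
	header, aIdx, bIdx := mergeOffsets(a.Header, b.Header)
	out := frame{Header: header, TraceSet: map[string][]float32{}}
	place := func(src map[string][]float32, idx []int) {
		for key, trace := range src {
			dst, ok := out.TraceSet[key]
			if !ok {
				dst = make([]float32, len(header))
				for k := range dst {
					dst[k] = missing
				}
				out.TraceSet[key] = dst
			}
			for col, v := range trace {
				if v != missing {
					dst[idx[col]] = v
				}
			}
		}
	}
	place(a.TraceSet, aIdx)
	place(b.TraceSet, bIdx)
	return out
}

func main() {
	a := frame{Header: []int32{1, 4, 5}, TraceSet: map[string][]float32{",config=8888,": {1, 2, 3}}}
	b := frame{Header: []int32{3, 4}, TraceSet: map[string][]float32{",config=gpu,": {7, 8}}}
	j := join(a, b)
	fmt.Println(j.Header)                   // [1 3 4 5]
	fmt.Println(j.TraceSet[",config=8888,"]) // [1 1e+32 2 3]
	fmt.Println(j.TraceSet[",config=gpu,"])  // [1e+32 7 8 1e+32]
}
```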

func NewEmpty

func NewEmpty() *DataFrame

NewEmpty returns a new empty DataFrame.

func NewHeaderOnly

func NewHeaderOnly(ctx context.Context, git perfgit.Git, begin, end time.Time, downsample bool) (*DataFrame, error)

NewHeaderOnly returns a DataFrame with a populated Header and no traces. Unlike the DataFrameBuilder methods, it takes no 'progress' callback.

If 'downsample' is true then the number of commits returned is limited to MAX_SAMPLE_SIZE.

func (*DataFrame) BuildParamSet

func (d *DataFrame) BuildParamSet()

BuildParamSet rebuilds d.ParamSet from the keys of d.TraceSet.
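The idea behind rebuilding a ParamSet from trace keys can be sketched as collecting, for every parameter name, the sorted set of distinct values. This assumes structured keys of the form ",arch=x86,config=8888," and uses a plain map instead of paramtools.ReadOnlyParamSet; `buildParamSet` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildParamSet collects the sorted distinct values of every parameter that
// appears in the given keys. Keys are assumed to look like ",a=1,b=2,".
func buildParamSet(keys []string) map[string][]string {
	seen := map[string]map[string]bool{}
	for _, key := range keys {
		for _, pair := range strings.Split(strings.Trim(key, ","), ",") {
			name, value, ok := strings.Cut(pair, "=")
			if !ok {
				continue // Skip malformed or empty segments.
			}
			if seen[name] == nil {
				seen[name] = map[string]bool{}
			}
			seen[name][value] = true
		}
	}
	ps := make(map[string][]string, len(seen))
	for name, values := range seen {
		for v := range values {
			ps[name] = append(ps[name], v)
		}
		sort.Strings(ps[name])
	}
	return ps
}

func main() {
	ps := buildParamSet([]string{
		",arch=x86,config=8888,",
		",arch=arm,config=8888,",
	})
	fmt.Println(ps) // map[arch:[arm x86] config:[8888]]
}
```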

func (*DataFrame) Compress

func (d *DataFrame) Compress() *DataFrame

Compress returns a DataFrame with all columns that don't contain any data removed. If the DataFrame is already fully compressed then the original DataFrame is returned.

func (*DataFrame) FilterOut

func (d *DataFrame) FilterOut(f TraceFilter)

FilterOut removes traces from d.TraceSet if the filter function 'f' returns true for a trace.

FilterOut rebuilds the ParamSet to match the new set of traces once filtering is complete.
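The filtering step can be sketched with a TraceFilter that removes traces containing no data at all. This assumes types.Trace is a slice of float32 and that missing points use a 1e32 sentinel; `filterOut` and `allMissing` are hypothetical, and unlike the real method the sketch does not rebuild a ParamSet afterwards.

```go
package main

import "fmt"

// missing is an assumed sentinel meaning "no data at this point".
const missing float32 = 1e32

// Trace is an assumed stand-in for types.Trace.
type Trace []float32

// TraceFilter matches the documented shape: return true to remove 'tr'.
type TraceFilter func(tr Trace) bool

// filterOut deletes every trace for which f returns true. Deleting map
// entries while ranging over the map is permitted by the Go spec.
func filterOut(traces map[string]Trace, f TraceFilter) {
	for key, tr := range traces {
		if f(tr) {
			delete(traces, key)
		}
	}
}

// allMissing is an example filter that flags traces with no data at all.
func allMissing(tr Trace) bool {
	for _, v := range tr {
		if v != missing {
			return false
		}
	}
	return true
}

func main() {
	traces := map[string]Trace{
		",config=8888,": {1, 2},
		",config=gpu,":  {missing, missing},
	}
	filterOut(traces, allMissing)
	fmt.Println(len(traces)) // 1
}
```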

func (*DataFrame) Slice

func (d *DataFrame) Slice(offset, size int) (*DataFrame, error)

Slice returns a DataFrame containing a subset of the current DataFrame: the 'size' points starting at 'offset'. The returned DataFrame is built from slices of the original data, not copies, so it must not be modified.
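Why the result must not be altered follows from how Go sub-slices share their backing array. The sketch below uses a hypothetical simplified `frame` type; the bounds check and its error message are assumptions, not the package's actual error conditions.

```go
package main

import (
	"errors"
	"fmt"
)

// frame is a hypothetical, simplified stand-in for DataFrame.
type frame struct {
	Header   []int32
	TraceSet map[string][]float32
}

// slice returns columns [offset, offset+size) of f. The header and traces
// are sub-slices of f's data, so writes to the result are visible in f.
func slice(f frame, offset, size int) (frame, error) {
	if offset < 0 || size < 0 || offset+size > len(f.Header) {
		return frame{}, errors.New("slice out of range")
	}
	out := frame{
		Header:   f.Header[offset : offset+size],
		TraceSet: make(map[string][]float32, len(f.TraceSet)),
	}
	for key, tr := range f.TraceSet {
		out.TraceSet[key] = tr[offset : offset+size]
	}
	return out, nil
}

func main() {
	f := frame{Header: []int32{1, 2, 3, 4}, TraceSet: map[string][]float32{",a=1,": {10, 20, 30, 40}}}
	s, err := slice(f, 1, 2)
	if err != nil {
		panic(err)
	}
	s.TraceSet[",a=1,"][0] = 99      // Writes through to f's backing array.
	fmt.Println(f.TraceSet[",a=1,"]) // [10 99 30 40]
}
```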

type DataFrameBuilder

type DataFrameBuilder interface {
	// NewFromQueryAndRange returns a populated DataFrame of the traces that match
	// the given time range [begin, end) and the passed in query, or a non-nil
	// error if the traces can't be retrieved. The 'progress' callback is called
	// periodically as the query is processed.
	NewFromQueryAndRange(ctx context.Context, begin, end time.Time, q *query.Query, downsample bool, progress progress.Progress) (*DataFrame, error)

	// NewFromKeysAndRange returns a populated DataFrame of the traces that match
	// the given set of 'keys' over the range of [begin, end). The 'progress'
	// callback is called periodically as the query is processed.
	NewFromKeysAndRange(ctx context.Context, keys []string, begin, end time.Time, downsample bool, progress progress.Progress) (*DataFrame, error)

	// NewNFromQuery returns a populated DataFrame of condensed traces of N data
	// points ending at the given 'end' time that match the given query.
	NewNFromQuery(ctx context.Context, end time.Time, q *query.Query, n int32, progress progress.Progress) (*DataFrame, error)

	// NewNFromKeys returns a populated DataFrame of condensed traces of N data
	// points ending at the given 'end' time for the given keys.
	NewNFromKeys(ctx context.Context, end time.Time, keys []string, n int32, progress progress.Progress) (*DataFrame, error)

	// NumMatches returns the number of traces that will match the query.
	NumMatches(ctx context.Context, q *query.Query) (int64, error)

	// PreflightQuery returns the number of traces that will match the query and
	// a refined ParamSet to use for further queries. The referenceParamSet
	// should be a ParamSet that includes all the Params that could appear in a
	// query. For example, the ParamSet managed by ParamSetRefresher.
	PreflightQuery(ctx context.Context, q *query.Query, referenceParamSet paramtools.ReadOnlyParamSet) (int64, paramtools.ParamSet, error)
}

DataFrameBuilder is an interface for things that construct DataFrames.

type TimestampSeconds

type TimestampSeconds int64

TimestampSeconds represents a timestamp in seconds from the Unix epoch.

type TraceFilter

type TraceFilter func(tr types.Trace) bool

TraceFilter is a function type that returns true if trace 'tr' should be removed from a DataFrame. It is used by FilterOut.
