Documentation ¶
Overview ¶
This package provides a set of minimal interfaces that build on, and are by default backed by, go-metrics. We wrap go-metrics to add a few pieces of functionality and to make sure we don't leak our dependencies to anyone pulling in Scoot as a library.
Specifically, we provide the following (composed in the sketch after this list):
- Flexibility to override stat recording and formatting, e.g. an internal Twitter format.
- An interface similar in design to Finagle Metrics.
- A StatsReceiver object that can be passed down a call tree and scoped at each level.
- The ability to specify a time.Duration precision when rendering instruments.
- A latched update mechanism that takes snapshots at regular intervals.
- A new Latency instrument to more easily record callsite latency.
- Pretty printing of instrument output.
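For orientation, these pieces compose roughly as follows. This is a minimal sketch, not an official example: the import path is assumed from the Scoot repo layout, and the Inc/Update method names on Counter and Gauge are assumed to follow go-metrics.

package main

import (
    "fmt"
    "time"

    "github.com/twitter/scoot/common/stats" // import path assumed from the Scoot repo layout
)

func main() {
    // Latched receiver: instruments are snapshotted once a minute.
    stat, cancel := stats.NewLatchedStatsReceiver(1 * time.Minute)
    defer cancel()

    // Scope the receiver before passing it down the call tree.
    serverStat := stat.Scope("bundlestore", "server")
    serverStat.Counter(stats.BundlestoreRequestCounter).Inc(1) // assumed go-metrics-style Inc
    serverStat.Gauge(stats.ClusterAvailableNodes).Update(10)   // assumed go-metrics-style Update

    // Latency data is captured in ns and displayed in ms because of Precision().
    lat := serverStat.Precision(time.Millisecond).Latency(stats.BundlestoreDownloadLatency_ms)
    lat.Time()
    // ... work being timed ...
    lat.Stop()

    fmt.Println(string(stat.Render(true))) // pretty-printed JSON
}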
Original license: github.com/rcrowley/go-metrics/blob/master/LICENSE
Index ¶
- Constants
- Variables
- func GetDiskUsageKB(dir string) (uint64, error)
- func PPrintStats(tag string, statsRegistry StatsRegistry)
- func ReportServerRestart(stat StatsReceiver, statName string, startupGaugeSpikeLen time.Duration)
- func StartUptimeReporting(stat StatsReceiver, statName string, serverStartGaugeName string, ...)
- func StatsOk(tag string, statsRegistry StatsRegistry, t *testing.T, ...) bool
- type CapturedRegistry
- type Counter
- type DirsMonitor
- type Gauge
- type GaugeFloat
- type Histogram
- type HistogramView
- type Latency
- type MarshalerPretty
- type MonitorDir
- type Rule
- type RuleChecker
- type StatsReceiver
- func DefaultStatsReceiver() StatsReceiver
- func NewCustomStatsReceiver(makeRegistry func() StatsRegistry, latched time.Duration) (stat StatsReceiver, cancelFn func())
- func NewLatchedStatsReceiver(latched time.Duration) (stat StatsReceiver, cancelFn func())
- func NilStatsReceiver(scope ...string) StatsReceiver
- type StatsRegistry
- type StatsTicker
- type StatsTime
Constants ¶
const (
    /****************** ClusterManager metrics ***************************/

    /* Cluster metrics on Node types:
       Available - available or running tasks (not suspended)
       Free - available, not running
       Running - running tasks
       Lost - not responding to status requests */
    ClusterAvailableNodes = "availableNodes"
    ClusterFreeNodes = "freeNodes"
    ClusterRunningNodes = "runningNodes"
    ClusterLostNodes = "lostNodes"
    ClusterNodeUpdateFreqMs = "clusterSetNodeUpdatesFreq_ms"
    ClusterFetchFreqMs = "clusterFetchFreq_ms"
    ClusterFetchDurationMs = "clusterFetchDuration_ms"
    ClusterNumFetchedNodes = "clusterNumFetchedNodes"
    ClusterFetchedError = "clusterFetchError"

    /************************* Bundlestore metrics **************************/

    /* Bundlestore download metrics (Reads/Gets from top-level Bundlestore/Apiserver) */
    BundlestoreDownloadLatency_ms = "downloadLatency_ms"
    BundlestoreDownloadCounter = "downloadCounter"
    BundlestoreDownloadErrCounter = "downloadErrCounter"
    BundlestoreDownloadOkCounter = "downloadOkCounter"

    /* Bundlestore check metrics (Exists/Heads from top-level Bundlestore/Apiserver) */
    BundlestoreCheckLatency_ms = "checkLatency_ms"
    BundlestoreCheckCounter = "checkCounter"
    BundlestoreCheckErrCounter = "checkErrCounter"
    BundlestoreCheckOkCounter = "checkOkCounter"

    /* Bundlestore upload metrics (Writes/Puts to top-level Bundlestore/Apiserver) */
    BundlestoreUploadCounter = "uploadCounter"
    BundlestoreUploadErrCounter = "uploadErrCounter"
    BundlestoreUploadExistingCounter = "uploadExistingCounter"
    BundlestoreUploadLatency_ms = "uploadLatency_ms"
    BundlestoreUploadOkCounter = "uploadOkCounter"

    /* Bundlestore request counters and uptime statistics */
    BundlestoreRequestCounter = "serveRequestCounter"
    BundlestoreRequestOkCounter = "serveOkCounter"
    BundlestoreServerStartedGauge = "bundlestoreStartGauge"
    BundlestoreUptime_ms = "bundlestoreUptimeGauge_ms"

    /************************* Groupcache Metrics ***************************/

    /* Groupcache Read metrics */
    GroupcacheReadCounter = "readCounter"
    GroupcacheReadOkCounter = "readOkCounter"
    GroupcacheReadLatency_ms = "readLatency_ms"

    /* Groupcache Exists metrics */
    GroupcacheExistsCounter = "existsCounter"
    GroupcachExistsLatency_ms = "existsLatency_ms"
    GroupcacheExistsOkCounter = "existsOkCounter"

    /* Groupcache Write metrics */
    GroupcacheWriteCounter = "writeCounter"
    GroupcacheWriteOkCounter = "writeOkCounter"
    GroupcacheWriteLatency_ms = "writeLatency_ms"

    /* Groupcache underlying load metrics (cache misses) */
    GroupcacheReadUnderlyingCounter = "readUnderlyingCounter"
    GroupcacheReadUnderlyingLatency_ms = "readUnderlyingLatency_ms"
    GroupcacheExistUnderlyingCounter = "existUnderlyingCounter"
    GroupcacheExistUnderlyingLatency_ms = "existUnderlyingLatency_ms"
    GroupcacheWriteUnderlyingCounter = "writeUnderlyingCounter"
    GroupcacheWriteUnderlyingLatency_ms = "writeUnderlyingLatency_ms"

    /* Groupcache library - per-cache metrics (a typical groupcache includes separate "main" and "hot" caches) */
    GroupcacheMainBytesGauge = "mainBytesGauge"
    GroupcacheMainGetsCounter = "mainGetsCounter"
    GroupcacheMainHitsCounter = "mainHitsCounter"
    GroupcacheMainItemsGauge = "mainItemsGauge"
    GroupcacheMainEvictionsCounter = "mainEvictionsCounter"
    GroupcacheHotBytesGauge = "hotBytesGauge"
    GroupcacheHotGetsCounter = "hotGetsCounter"
    GroupcacheHotHitsCounter = "hotHitsCounter"
    GroupcacheHotItemsGauge = "hotItemsGauge"
    GroupcacheHotEvictionsCounter = "hotEvictionsCounter"

    /* Groupcache library - per-group metrics (overall metrics for a groupcache on a single Apiserver) */
    GroupcacheGetCounter = "cacheGetCounter"
    GroupcacheContainCounter = "cacheContainCounter"
    GroupcachePutCounter = "cachePutCounter"
    GroupcacheHitCounter = "cacheHitCounter"
    GroupcacheLoadCounter = "cacheLoadCounter"
    GroupcacheCheckCounter = "cacheCheckCounter"
    GroupcacheStoreCounter = "cacheStoreCounter"
    GroupcacheIncomingRequestsCounter = "cacheIncomingRequestsCounter"
    GroupcacheLocalLoadErrCounter = "cacheLocalLoadErrCounter"
    GroupcacheLocalLoadCounter = "cacheLocalLoadCounter"
    GroupcacheLocalCheckErrCounter = "cacheLocalCheckErrCounter"
    GroupcacheLocalCheckCounter = "cacheLocalCheckCounter"
    GroupcacheLocalStoreErrCounter = "cacheLocalStoreErrCounter"
    GroupcacheLocalStoreCounter = "cacheLocalStoreCounter"
    GroupcachePeerGetsCounter = "cachePeerGetsCounter"
    GroupcachePeerChecksCounter = "cachePeerChecksCounter"
    GroupcachePeerPutsCounter = "cachePeerPutsCounter"
    GroupcachPeerErrCounter = "cachePeerErrCounter"

    /* Groupcache peer pool metrics (maintained by Scoot) */
    GroupcachePeerCountGauge = "peerCountGauge"
    GroupcachePeerDiscoveryCounter = "peerDiscoveryCounter"

    SchedAcceptedJobsGauge = "schedAcceptedJobsGauge"

    /* the number of tasks that have finished (including those that have been killed) */
    SchedCompletedTaskCounter = "completedTaskCounter"

    /* The number of times any of the following conditions occurred:
       - the task's command errored while running
       - the platform could not run the task
       - the platform encountered an error reporting the end of the task to saga */
    SchedFailedTaskCounter = "failedTaskCounter"

    /* the number of times the processing failed to serialize the workerapi status object */
    SchedFailedTaskSerializeCounter = "failedTaskSerializeCounter"

    /* The number of tasks from the inProgress list waiting to start or running. */
    SchedInProgressTasksGauge = "schedInProgressTasksGauge"

    /* the number of job requests that have been put on the addJobChannel */
    SchedJobsCounter = "schedJobsCounter"

    /* the amount of time it takes to verify a job definition, add it to the job channel and return a job id */
    SchedJobLatency_ms = "schedJobLatency_ms"

    /* the number of job requests that have been successfully converted from the thrift request to the sched.JobDefinition structure */
    SchedJobRequestsCounter = "schedJobRequestsCounter"

    /* the number of async runners still waiting on task completion */
    SchedNumAsyncRunnersGauge = "schedNumAsyncRunnersGauge"

    /* the number of jobs with tasks running. Only reported by requestor */
    SchedNumRunningJobsGauge = "schedNumRunningJobsGauge"

    /* the number of running tasks. Collected at the end of each pass through the scheduler's job handling loop */
    SchedNumRunningTasksGauge = "schedNumRunningTasksGauge"

    /* the number of tasks waiting to start. (Only reported by requestor) */
    SchedNumWaitingTasksGauge = "schedNumWaitingTasksGauge"

    /* the number of active tasks that stopped because they were preempted by the scheduler */
    SchedPreemptedTasksCounter = "preemptedTasksCounter"

    /* the number of times the platform retried sending an end saga message */
    SchedRetriedEndSagaCounter = "schedRetriedEndSagaCounter"

    /* record the start of the scheduler server */
    SchedServerStartedGauge = "schedStartGauge"

    /* the number of tasks that were scheduled to be started on nodes with the last run of the scheduling algorithm */
    SchedScheduledTasksCounter = "scheduledTasksCounter"

    /* the number of times the server received a job kill request */
    SchedServerJobKillCounter = "jobKillRpmCounter"

    /* the amount of time it took to kill a job (from the server) */
    SchedServerJobKillLatency_ms = "jobKillLatency_ms"

    /* the number of job status requests the thrift server received */
    SchedServerJobStatusCounter = "jobStatusRpmCounter"

    /* the amount of time it took to process a job status request (from the server) */
    SchedServerJobStatusLatency_ms = "jobStatusLatency_ms"

    /* the number of job run requests the thrift server received */
    SchedServerRunJobCounter = "runJobRpmCounter"

    /* the amount of time it took to process a RunJob request (from the server) and return either an error or a job id */
    SchedServerRunJobLatency_ms = "runJobLatency_ms"

    /* the amount of time it takes to assign the tasks to nodes */
    SchedTaskAssignmentsLatency_ms = "schedTaskAssignmentsLatency_ms"

    /* the number of times the task runner had to retry the task start */
    SchedTaskStartRetries = "taskStartRetries"

    /* the length of time the server has been running */
    SchedUptime_ms = "schedUptimeGauge_ms"

    /* The number of jobs waiting to start in the inProgress list at the end of each pass through the
       scheduler's job handling loop. (No tasks in these jobs have been started.) */
    SchedWaitingJobsGauge = "schedWaitingJobsGauge"

    /* Amount of time it takes the scheduler to complete a full step() */
    SchedStepLatency_ms = "schedStepLatency_ms"

    /* Amount of time it takes the scheduler to add newly requested jobs to the list of jobs currently being handled by the scheduler */
    SchedAddJobsLatency_ms = "schedAddJobsLatency_ms"

    /* Amount of time it takes the scheduler to check newly requested jobs for validity */
    SchedCheckJobsLoopLatency_ms = "schedCheckJobsLoopLatency_ms"

    /* Amount of time it takes the scheduler to add newly verified jobs from the add-job channel to the list of jobs currently being handled by the scheduler */
    SchedAddJobsLoopLatency_ms = "schedAddJobsLoopLatency_ms"

    /* Amount of time it takes the scheduler to update the list of removed/added nodes in its worker cluster */
    SchedUpdateClusterLatency_ms = "schedUpdateClusterLatency_ms"

    /* Amount of time it takes the scheduler to process all messages in its mailbox and execute callbacks (if applicable) */
    SchedProcessMessagesLatency_ms = "schedProcessMessagesLatency_ms"

    /* Amount of time it takes the scheduler to check whether any of the in-progress jobs are completed */
    SchedCheckForCompletedLatency_ms = "schedCheckForCompletedLatency_ms"

    /* Amount of time it takes the scheduler to kill all jobs requested to be killed */
    SchedKillJobsLatency_ms = "schedKillJobsLatency_ms"

    /* Amount of time it takes the scheduler to figure out which tasks to schedule next and on which worker */
    SchedScheduleTasksLatency_ms = "schedScheduleTasksLatency_ms"

    /*--------------------- load based scheduler stats ---------------------------*/

    /* number of times Load Based Scheduler saw an unrecognized requestor */
    SchedLBSUnknownJobCounter = "schedLBSUnknownJobCounter"

    /* number of jobs ignored because the load % is 0 */
    SchedLBSIgnoredJobCounter = "schedLBSIgnoredJobCounter"

    /* number of tasks starting by job class (after the last run of lbs) */
    SchedJobClassTasksStarting = "schedStartingTasks_"

    /* number of tasks already running by job class (before starting tasks as per lbs) */
    SchedJobClassTasksRunning = "schedRunningTasks_"

    /* number of tasks still waiting by job class after the tasks identified by lbs have started */
    SchedJobClassTasksWaiting = "schedWaitingTasks_"

    /* job class % (set via scheduler api) */
    SchedJobClassDefinedPct = "schedClassTargetPct_"

    /* job class actual % (computed from running tasks) */
    SchedJobClassActualPct = "schedClassActualPct_"

    /* number of tasks being stopped for the class (due to rebalancing) */
    SchedStoppingTasks = "schedStoppingTasks_"

    /* scheduler internal data structure size monitoring */
    SchedLBSConfigLoadPercentsSize = "schedDS_size_ConfigLoadPercents"
    SchedLBSConfigRequestorToPctsSize = "schedDS_size_ConfigRequestorToClassMap"
    SchedLBSConfigDescLoadPctSize = "schedDS_size_ConfigDescLoadPercents"
    SchedLBSWorkingJobClassesSize = "schedDS_size_WorkingJobClasses"
    SchedLBSWorkingLoadPercentsSize = "schedDS_size_WorkingLoadPercents"
    SchedLBSWorkingRequestorToPctsSize = "schedDS_size_WorkingRequestorToClassMap"
    SchedLBSWorkingDescLoadPctSize = "schedDS_size_WorkingDescLoadPercents"
    SchedTaskStartTimeMapSize = "schedDS_size_taskStartTimeMap"
    SchedInProgressJobsSize = "schedDS_size_inProgressJobs"
    SchedRequestorMapSize = "schedDS_size_requestorMap"
    SchedRequestorHistorySize = "schedDS_size_requestorHistory"
    SchedTaskDurationsSize = "schedDS_size_taskDurations"
    SchedSagasSize = "schedDS_size_sagas"
    SchedRunnersSize = "schedDS_size_runners"

    /******************************** Worker metrics **************************************/
    /* The number of runs the worker currently has running */
    WorkerActiveRunsGauge = "activeRunsGauge"

    /* The disk size change for the indicated directory seen when running the task.
       The reported stat will be of the form commandDirUsage_kb_<PathSuffix from */
    CommandDirUsageKb = "commandDirUsage_kb"

    /* the number of times the worker downloaded a snapshot from bundlestore */
    WorkerDownloads = "workerDownloads"

    /* the number of times a worker's init failed. Should be at most 1 for each worker */
    WorkerDownloadInitFailure = "workerDownloadInitFailure"

    /* the amount of time spent downloading snapshots to the worker. This includes time for
       successful as well as erroring downloads */
    WorkerDownloadLatency_ms = "workerDownloadLatency_ms"

    /* The number of runs in the worker's statusAll() response that are not currently running
       TODO - this includes runs that are waiting to start - will not be accurate if we go to a
       worker that can run multiple commands */
    WorkerEndedCachedRunsGauge = "endedCachedRunsGauge"

    /* The number of runs that the worker tried to run and whose state is failed
       TODO - understand how/when this gets reset - it's based on the runs in the worker's
       StatusAll() response - how/when do old jobs drop out of StatusAll()? */
    WorkerFailedCachedRunsGauge = "failedCachedRunsGauge"

    /* The amount of time it took a worker to init */
    WorkerFinalInitLatency_ms = "workerFinishedInitLatency_ms"

    /* The number of workers who are currently exceeding the max init time */
    WorkerActiveInitLatency_ms = "workerActiveInitLatency_ms"

    /* the amount of the worker's memory currently consumed by the current command (and its subprocesses)
       TODO - verify with Ryan that this description is correct; scope is osexecer - change to worker? */
    WorkerMemory = "memory"

    /* A gauge used to indicate whether the worker is currently running a task or is idling */
    WorkerRunningTask = "runningTask"

    /* the number of abort requests received by the worker */
    WorkerServerAborts = "aborts"

    /* the number of clear requests received by the worker */
    WorkerServerClears = "clears"

    /* The number of QueryWorker requests received by the worker server */
    WorkerServerQueries = "workerQueries"

    /* the amount of time it takes the worker to put a run request on the worker's run queue */
    WorkerServerStartRunLatency_ms = "runLatency_ms"

    /* the number of run requests a worker has received */
    WorkerServerRuns = "runs"

    /* record when a worker service is starting */
    WorkerServerStartedGauge = "workerStartGauge"

    /* record when a worker service kills itself */
    WorkerServerKillGauge = "workerKillGauge"

    /* The time it takes to run the task (including snapshot handling) */
    WorkerTaskLatency_ms = "workerTaskLatency_ms"

    /* Time since the most recent run, status, abort, or erase request */
    WorkerTimeSinceLastContactGauge_ms = "timeSinceLastContactGauge_ms"

    /* the number of times the worker uploaded a snapshot to bundlestore */
    WorkerUploads = "workerUploads"

    /* the amount of time spent uploading snapshots to bundlestore. This includes time for
       successful as well as erroring uploads */
    WorkerUploadLatency_ms = "workerUploadLatency_ms"

    /* Time since the worker started */
    WorkerUptimeGauge_ms = "workerUptimeGauge_ms"

    /* The amount of time a worker node was idle between tasks */
    WorkerIdleLatency_ms = "workerIdleLatency_ms"

    /****************************** Git Metrics **********************************************/

    /* The number of failures trying to init a ref clone */
    GitClonerInitFailures = "clonerInitFailures"

    /* The amount of time it took to init a ref clone */
    GitClonerInitLatency_ms = "clonerInitLatency_ms"

    /* The number of times a gitdb stream backend had to resort to a git fetch */
    GitStreamUpdateFetches = "gitStreamUpdateFetches"

    /****************************** Execution Service ******************************************/

    /* Execute API metrics emitted by Scheduler */
    BzExecSuccessCounter = "bzExecSuccessCounter"
    BzExecFailureCounter = "bzExecFailureCounter"
    BzExecLatency_ms = "bzExecLatency_ms"

    /* Longrunning GetOperation API metrics emitted by Scheduler */
    BzGetOpSuccessCounter = "bzGetOpSuccessCounter"
    BzGetOpFailureCounter = "bzGetOpFailureCounter"
    BzGetOpLatency_ms = "bzGetOpLatency_ms"

    /* Longrunning CancelOperation API metrics emitted by Scheduler */
    BzCancelOpSuccessCounter = "bzCancelOpSuccessCounter"
    BzCancelOpFailureCounter = "bzCancelOpFailureCounter"
    BzCancelOpLatency_ms = "bzCancelOpLatency_ms"

    /****************************** Worker/Invoker Execution Timings ***************************/

    /* Execution metadata timing metrics emitted by Worker. These probably aren't updated enough
       to make use of the Histogram values, but using that type results in these values being
       automatically cleared each stat interval vs a gauge. */
    BzExecQueuedTimeHistogram_ms = "bzExecQueuedTimeHistogram_ms"
    BzExecInputFetchTimeHistogram_ms = "bzExecInputFetchTimeHistogram_ms"
    BzExecActionCacheCheckTimeHistogram_ms = "BzExecActionCacheCheckTimeHistogram_ms"
    BzExecActionFetchTimeHistogram_ms = "BzExecActionFetchTimeHistogram_ms"
    BzExecCommandFetchTimeHistogram_ms = "BzExecCommandFetchTimeHistogram_ms"
    BzExecExecerTimeHistogram_ms = "bzExecExecerTimeHistogram_ms"

    /****************************** CAS Service ******************************************/

    /* FindMissingBlobs API metrics emitted by Apiserver */
    BzFindBlobsSuccessCounter = "bzFindBlobsSuccessCounter"
    BzFindBlobsFailureCounter = "bzFindBlobsFailureCounter"
    BzFindBlobsLengthHistogram = "bzFindBlobsLengthHistogram"
    BzFindBlobsLatency_ms = "bzFindBlobsLatency_ms"

    /* CAS Read API metrics emitted by Apiserver */
    BzReadSuccessCounter = "bzReadSuccessCounter"
    BzReadFailureCounter = "bzReadFailureCounter"
    BzReadBytesHistogram = "bzReadBytesHistogram"
    BzReadLatency_ms = "bzReadLatency_ms"

    /* CAS Write API metrics emitted by Apiserver */
    BzWriteSuccessCounter = "bzWriteSuccessCounter"
    BzWriteFailureCounter = "bzWriteFailureCounter"
    BzWriteBytesHistogram = "bzWriteBytesHistogram"
    BzWriteLatency_ms = "bzWriteLatency_ms"

    /* CAS BatchUpdateBlobs API metrics emitted by Apiserver */
    BzBatchUpdateSuccessCounter = "bzBatchUpdateSuccessCounter"
    BzBatchUpdateFailureCounter = "bzBatchUpdateFailureCounter"
    BzBatchUpdateLengthHistogram = "bzBatchUpdateLengthHistogram"
    BzBatchUpdateLatency_ms = "bzBatchUpdateLatency_ms"

    /* CAS BatchReadBlobs API metrics emitted by Apiserver */
    BzBatchReadSuccessCounter = "bzBatchReadSuccessCounter"
    BzBatchReadFailureCounter = "bzBatchReadFailureCounter"
    BzBatchReadLengthHistogram = "bzBatchReadLengthHistogram"
    BzBatchReadLatency_ms = "bzBatchReadLatency_ms"

    /****************************** ActionCache Service ****************************************/

    /* ActionCache result metrics */
    BzCachedExecCounter = "bzCachedExecCounter"

    /* GetActionResult API metrics emitted by Apiserver */
    BzGetActionSuccessCounter = "bzGetActionSuccessCounter"
    BzGetActionFailureCounter = "bzGetActionFailureCounter"
    BzGetActionLatency_ms = "bzGetActionLatency_ms"

    /* UpdateActionResult API metrics emitted by Apiserver */
    BzUpdateActionSuccessCounter = "bzUpdateActionSuccessCounter"
    BzUpdateActionFailureCounter = "bzUpdateActionFailureCounter"
    BzUpdateActionLatency_ms = "bzUpdateActionLatency_ms"

    /****************************** Saga Metrics ****************************************/

    /* The amount of time spent looping through the buffered update channel to accumulate the
       updates in a batch, to be processed together */
    SagaUpdateStateLoopLatency_ms = "sagaUpdateStateLoopLatency_ms"

    /* The amount of time spent in (bulk) updating the saga state and storing the messages in sagalog */
    SagaUpdateStateLatency_ms = "sagaUpdateStateLatency_ms"

    /* The number of updates that were processed together by the updateSagaState loop */
    SagaNumUpdatesProcessed = "sagaNumUpdatesProcessed"

    /* The amount of time spent updating the saga state and sagalog when a task starts or ends */
    SagaStartOrEndTaskLatency_ms = "sagaStartOrEndTaskLatency_ms"
)
Variables ¶
var DefaultStartupGaugeSpikeLen time.Duration = 1 * time.Minute
var DoesNotExistTest = RuleChecker{/* contains filtered or unexported fields */}
var FloatEqTest = RuleChecker{/* contains filtered or unexported fields */}
var FloatGTTest = RuleChecker{/* contains filtered or unexported fields */}
var Int64EqTest = RuleChecker{/* contains filtered or unexported fields */}
var Int64GTTest = RuleChecker{/* contains filtered or unexported fields */}
var NewCounter func() Counter = newMetricCounter
Overridable instrument creation.
var NewGauge func() Gauge = newMetricGauge
var NewGaugeFloat func() GaugeFloat = newMetricGaugeFloat
var NewHistogram func() Histogram = newMetricHistogram
var NewLatency func() Latency = newLatency
var StatReportIntvl time.Duration = 500 * time.Millisecond
Functions ¶
func GetDiskUsageKB ¶
GetDiskUsageKB uses POSIX du to get the disk usage of a dir, for simplicity vs. a syscall or walking the dir contents.
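A hypothetical call site (the path is illustrative; stat is assumed to be an in-scope StatsReceiver, log the stdlib logger, and Gauge.Update is assumed to follow go-metrics):

if kb, err := stats.GetDiskUsageKB("/var/tmp/workspace"); err == nil {
    stat.Gauge(stats.CommandDirUsageKb).Update(int64(kb)) // report usage in KB
} else {
    log.Printf("du failed: %v", err)
}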
func PPrintStats ¶
func PPrintStats(tag string, statsRegistry StatsRegistry)
func ReportServerRestart ¶
func ReportServerRestart(stat StatsReceiver, statName string, startupGaugeSpikeLen time.Duration)
func StartUptimeReporting ¶
func StartUptimeReporting(stat StatsReceiver, statName string, serverStartGaugeName string, startupGaugeSpikeLen time.Duration)
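A sketch of how a server might wire these up at startup, reusing the scheduler stat names from the constants above; whether StartUptimeReporting blocks is not stated here, so it is run in its own goroutine as a conservative choice.

stat := stats.DefaultStatsReceiver().Scope("sched")

// Report the restart so dashboards can see the startup spike.
stats.ReportServerRestart(stat, stats.SchedServerStartedGauge, stats.DefaultStartupGaugeSpikeLen)

// Keep the uptime gauge updated for the life of the process.
go stats.StartUptimeReporting(stat, stats.SchedUptime_ms, stats.SchedServerStartedGauge, stats.DefaultStartupGaugeSpikeLen)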
Types ¶
type CapturedRegistry ¶
type CapturedRegistry struct {
// contains filtered or unexported fields
}
type DirsMonitor ¶
type DirsMonitor struct {
// contains filtered or unexported fields
}
DirsMonitor monitors disk usage for selected directories.
var NopDirsMonitor *DirsMonitor = NewDirsMonitor([]MonitorDir{})
func NewDirsMonitor ¶
func NewDirsMonitor(dirs []MonitorDir) *DirsMonitor
NewDirsMonitor returns a DirsMonitor.
func (*DirsMonitor) GetEndSizes ¶
func (dm *DirsMonitor) GetEndSizes()
GetEndSizes gets the ending sizes of the directories being monitored.
func (*DirsMonitor) GetStartSizes ¶
func (dm *DirsMonitor) GetStartSizes()
GetStartSizes gets the starting sizes of the directories being monitored.
func (*DirsMonitor) RecordSizeStats ¶
func (dm *DirsMonitor) RecordSizeStats(stat StatsReceiver)
RecordSizeStats records the disk size deltas to the stats receiver.
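A sketch of the intended lifecycle, assuming stat is an in-scope StatsReceiver and runTask is a placeholder for the work whose disk impact is being measured:

dm := stats.NewDirsMonitor([]stats.MonitorDir{
    {Directory: "/var/tmp/workspace", StatSuffix: "workspace"}, // illustrative values
})
dm.GetStartSizes()       // snapshot sizes before the work
runTask()                // hypothetical task
dm.GetEndSizes()         // snapshot sizes after the work
dm.RecordSizeStats(stat) // report the per-directory commandDirUsage_kb deltas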
type GaugeFloat ¶
type GaugeFloat interface {
    Capture() GaugeFloat
    Update(float64)
    Value() float64
}
GaugeFloat
type Histogram ¶
type Histogram interface {
    HistogramView
    Capture() Histogram
    Update(int64)
}
Histogram
type HistogramView ¶
type HistogramView interface {
    Mean() float64
    Count() int64
    Max() int64
    Min() int64
    Sum() int64
    Percentiles(ps []float64) []float64
}
Viewable histogram without updates or capture.
type Latency ¶
type Latency interface {
    Capture() Latency
    Time() Latency // returns self.
    Stop()
    GetPrecision() time.Duration
    Precision(time.Duration) Latency // returns self.
}
Latency. Default implementation uses Histogram as its base.
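The interface is built for the defer idiom: Time() starts the clock and returns the instrument, and the deferred Stop() records the elapsed time. A minimal sketch (stats import assumed):

func handleRun(stat stats.StatsReceiver) {
    // Time() is evaluated immediately; Stop() runs when handleRun returns.
    defer stat.Latency(stats.WorkerServerStartRunLatency_ms).Time().Stop()
    // ... handler body ...
}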
type MarshalerPretty ¶
To check if pretty printing is supported.
type MonitorDir ¶
type MonitorDir struct {
    Directory  string // the directory to monitor
    StatSuffix string // the suffix to use on the commandDirUsage_kb stat
    // contains filtered or unexported fields
}
MonitorDir pairs a directory to monitor with a short name (suffix) used when reporting the stat.
type Rule ¶
type Rule struct {
    Checker RuleChecker
    Value   interface{}
}
Rule defines the condition checker used to validate a measurement. Each Checker(a, b) implementation expects a to be the 'got' value and b to be the 'expected' value.
type RuleChecker ¶
type RuleChecker struct {
// contains filtered or unexported fields
}
Utilities for validating the stats registry contents.
Add new Checker functions here as needed.
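A sketch of a test built on these checkers and StatsOk (imports of testing and the stats package assumed). The elided final StatsOk parameter is assumed here to be a map from stat name to Rule, Counter.Inc is assumed to follow go-metrics, and Int64EqTest is assumed to expect an int64 expected value.

func TestRequestStats(t *testing.T) {
    // Build the registry ourselves so it can be handed to StatsOk below.
    registry := stats.NewFinagleStatsRegistry()
    stat, _ := stats.NewCustomStatsReceiver(func() stats.StatsRegistry { return registry }, 0)

    stat.Counter(stats.BundlestoreRequestCounter).Inc(1)

    rules := map[string]stats.Rule{ // assumed shape of the elided parameter
        stats.BundlestoreRequestCounter: {Checker: stats.Int64EqTest, Value: int64(1)},
    }
    if !stats.StatsOk("request stats", registry, t, rules) {
        t.Error("unexpected stats registry contents")
    }
}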
type StatsReceiver ¶
type StatsReceiver interface {
    // Return a stats receiver that will automatically namespace elements with
    // the given scope args.
    //
    //  statsReceiver.Scope("foo", "bar").Stat("baz")
    // is equivalent to
    //  statsReceiver.Stat("foo", "bar", "baz")
    //
    Scope(scope ...string) StatsReceiver

    // If StatsRegistry supports the latency instrument:
    //
    // Returns a copy that can in turn create a Latency instrument that will use the
    // given precision as its display precision when the stats are rendered as
    // JSON. For example:
    //
    //  statsReceiver.Precision(time.Millisecond).Stat("foo_ms")
    //
    // means that the 'foo_ms' stat will have its nanosecond data points displayed
    // as milliseconds when rendered. Note that this does _not_ affect the
    // captured data in any way, only its display.
    //
    // If the given duration is <= 1ns, we will default to ns.
    Precision(time.Duration) StatsReceiver

    // Provides an event counter
    Counter(name ...string) Counter

    // Provides a histogram of sampled stats over time. Times output in
    // nanoseconds by default, but can be adjusted by using the Precision()
    // function.
    Latency(name ...string) Latency

    // Add a gauge, which holds an int64 value that can be set arbitrarily.
    Gauge(name ...string) Gauge

    // Add a gauge, which holds a float64 value that can be set arbitrarily.
    GaugeFloat(name ...string) GaugeFloat

    // Provide a histogram of sampled stats
    Histogram(name ...string) Histogram

    // Removes the given named stats item if it exists
    Remove(name ...string)

    // Construct a JSON string by marshaling the registry.
    Render(pretty bool) []byte
}
A registry wrapper for metrics that will be collected about the runtime performance of an application.
A quick note about name elements: hierarchical names are stored using a '/' path separator. To avoid confusion, variadic name elements passed to any method will have '/' characters in their names replaced by the string "_SLASH_" before they are used internally. This is instead of failing, because sometimes counters are dynamically generated (i.e. with error names), and it is better to strip the path elements than to, for example, panic.
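Two practical consequences of the above, sketched with stat as an in-scope StatsReceiver (Counter.Inc assumed to follow go-metrics):

// Scoping returns a new receiver; chained and variadic scopes are equivalent.
apiStat := stat.Scope("bundlestore").Scope("server") // same keyspace as stat.Scope("bundlestore", "server")
apiStat.Counter(stats.BundlestoreCheckCounter).Inc(1)

// Slashes in dynamically built names are rewritten rather than rejected:
stat.Counter("errors", "disk/full").Inc(1) // stored as "errors/disk_SLASH_full"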
var CurrentStatsReceiver StatsReceiver = NilStatsReceiver()
Stats users can either reference this global receiver or construct their own.
func DefaultStatsReceiver ¶
func DefaultStatsReceiver() StatsReceiver
DefaultStatsReceiver is a small wrapper around a go-metrics-like registry. It uses defaultStatsRegistry and sets the latched duration to zero. Note: a <=0 latch means that the stats are reset on every call to Render().
func NewCustomStatsReceiver ¶
func NewCustomStatsReceiver(makeRegistry func() StatsRegistry, latched time.Duration) (stat StatsReceiver, cancelFn func())
Like DefaultStatsReceiver() but registry and latched are made explicit.
func NewLatchedStatsReceiver ¶
func NewLatchedStatsReceiver(latched time.Duration) (stat StatsReceiver, cancelFn func())
Like DefaultStatsReceiver() but the latched interval is made explicit. Starts a goroutine that periodically captures all instruments and clears select ones. Note: setting latched to <=0 disables latching, so rendering/resetting is on demand. Note: it is up to the main app to prevent calls to Render() after canceling the latched receiver.
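A sketch using the package's StatReportIntvl as the latch interval (fmt import assumed, Counter.Inc assumed to follow go-metrics):

stat, cancel := stats.NewLatchedStatsReceiver(stats.StatReportIntvl)

stat.Counter(stats.WorkerDownloads).Inc(1)
fmt.Println(string(stat.Render(true))) // renders the latest latched snapshot

cancel() // stop the latching goroutine; per the note above, don't call Render() after this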
func NilStatsReceiver ¶
func NilStatsReceiver(scope ...string) StatsReceiver
NilStatsReceiver returns a receiver that ignores all stats operations.
type StatsRegistry ¶
type StatsRegistry interface {
    // Gets an existing metric or registers the given one.
    // The interface can be the metric to register if not found in registry,
    // or a function returning the metric for lazy instantiation.
    GetOrRegister(string, interface{}) interface{}

    // Unregister the metric with the given name.
    Unregister(string)

    // Call the given function for each registered metric.
    Each(func(string, interface{}))
}
Similar to the go-metrics registry but with most methods removed.
Note: the default StatsRegistry (from rcrowley) doesn't support the Latency metric; only finagleStatsRegistry has logic to check for and marshal latency.
func NewFinagleStatsRegistry ¶
func NewFinagleStatsRegistry() StatsRegistry
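Since NewFinagleStatsRegistry matches the makeRegistry signature, it can be passed directly to NewCustomStatsReceiver when Latency marshaling is needed; a sketch:

stat, cancel := stats.NewCustomStatsReceiver(stats.NewFinagleStatsRegistry, 1*time.Minute)
defer cancel()
defer stat.Latency(stats.BundlestoreUploadLatency_ms).Time().Stop()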
type StatsTicker ¶
StatsTicker wraps the stdlib time.Ticker struct and allows for mocking in tests.
func NewStatsTicker ¶
func NewStatsTicker(dur time.Duration) StatsTicker
type StatsTime ¶
type StatsTime interface {
    Now() time.Time
    Since(t time.Time) time.Duration
    NewTicker(d time.Duration) StatsTicker
}
Defines the calls we make to the stdlib time package. Allows for overriding in tests.
var Time StatsTime = DefaultStatsTime()
For testing.
func DefaultStatsTime ¶
func DefaultStatsTime() StatsTime
Returns a StatsTime instance backed by the stdlib 'time' package
func DefaultTestTime ¶
func DefaultTestTime() StatsTime
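Since Time is a package-level variable, tests can swap in the test implementation and restore the default afterwards; how the test clock is then advanced is not covered by this doc. A sketch (imports of testing and the stats package assumed):

func TestWithStatsTime(t *testing.T) {
    saved := stats.Time
    stats.Time = stats.DefaultTestTime()
    defer func() { stats.Time = saved }()
    // ... exercise code that calls stats.Time.Now(), Since(), or NewTicker() ...
}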