intel_pmu

package
v1.28.2
Warning

This package is not in the latest version of its module.

Published: Oct 2, 2023 License: MIT Imports: 15 Imported by: 0

README

Intel Performance Monitoring Unit Plugin

This input plugin exposes Intel PMU (Performance Monitoring Unit) metrics available through the Linux perf subsystem.

PMU metrics give insight into the performance and health of an IA processor's internal components, including core and uncore units. As core counts grow and processor topology becomes more complex, insight into these metrics is vital to ensure the best CPU performance and utilization.

Performance counters are CPU hardware registers that count hardware events such as instructions executed, cache-misses suffered, or branches mispredicted. They form a basis for profiling applications to trace dynamic control flow and identify hotspots.

Global configuration options

In addition to the plugin-specific configuration settings, plugins support additional global and plugin configuration settings. These settings are used to modify metrics, tags, and fields, create aliases, configure ordering, etc. See the CONFIGURATION.md for more details.

Configuration

# Intel Performance Monitoring Unit plugin exposes Intel PMU metrics available through Linux Perf subsystem
# This plugin ONLY supports Linux on amd64
[[inputs.intel_pmu]]
  ## List of filesystem locations of JSON files that contain PMU event definitions.
  event_definitions = ["/var/cache/pmu/GenuineIntel-6-55-4-core.json", "/var/cache/pmu/GenuineIntel-6-55-4-uncore.json"]

  ## List of core events measurement entities. There can be more than one core_events section.
  [[inputs.intel_pmu.core_events]]
    ## List of events to be counted. Event names shall match names from event_definitions files.
    ## Single entry can contain name of the event (case insensitive) augmented with config options and perf modifiers.
    ## If absent, all core events from provided event_definitions are counted, skipping unresolvable ones.
    events = ["INST_RETIRED.ANY", "CPU_CLK_UNHALTED.THREAD_ANY:config1=0x4043200000000k"]

    ## Limits the counting of events to core numbers specified.
    ## If absent, events are counted on all cores.
    ## Single "0", multiple "0,1,2" and range "0-2" notation is supported for each array element.
    ##   example: cores = ["0,2", "4", "12-16"]
    cores = ["0"]

    ## Indicator that plugin shall attempt to run core_events.events as a single perf group.
    ## If absent or set to false, each event is counted individually. Defaults to false.
    ## This limits the number of events that can be measured to a maximum of available hardware counters per core.
    ## The maximum may vary depending on the type of event and the use of fixed counters.
    # perf_group = false

    ## Optionally set a custom tag value that will be added to every measurement within this events group.
    ## Can be applied to any group of events, unrelated to perf_group setting.
    # events_tag = ""

  ## List of uncore event measurement entities. There can be more than one uncore_events section.
  [[inputs.intel_pmu.uncore_events]]
    ## List of events to be counted. Event names shall match names from event_definitions files.
    ## Single entry can contain name of the event (case insensitive) augmented with config options and perf modifiers.
    ## If absent, all uncore events from provided event_definitions are counted, skipping unresolvable ones.
    events = ["UNC_CHA_CLOCKTICKS", "UNC_CHA_TOR_OCCUPANCY.IA_MISS"]

    ## Limits the counting of events to specified sockets.
    ## If absent, events are counted on all sockets.
    ## Single "0", multiple "0,1" and range "0-1" notation is supported for each array element.
    ##   example: sockets = ["0-2"]
    sockets = ["0"]

    ## Indicator that plugin shall provide an aggregated value for multiple units of same type distributed in an uncore.
    ## If absent or set to false, events for each unit are exposed as separate metrics. Defaults to false.
    # aggregate_uncore_units = false

    ## Optionally set a custom tag value that will be added to every measurement within this events group.
    # events_tag = ""

Modifiers

Perf modifiers adjust the event-specific perf attribute to fulfill particular requirements. Details about the perf attribute structure can be found in the perf_event_open syscall manual.

General schema of configuration's events list element:

EVENT_NAME(:(config|config1|config2)=(0x[0-9a-f]{1,16})(p|k|u|h|H|I|G|D))*

where:

Modifier   Underlying attribute             Description
config     perf_event_attr.config           type-specific configuration
config1    perf_event_attr.config1          extension of config
config2    perf_event_attr.config2          extension of config1
p          perf_event_attr.precise_ip       skid constraint
k          perf_event_attr.exclude_user     don't count user
u          perf_event_attr.exclude_kernel   don't count kernel
h / H      perf_event_attr.exclude_guest    don't count in guest
I          perf_event_attr.exclude_idle     don't count when idle
G          perf_event_attr.exclude_hv       don't count hypervisor
D          perf_event_attr.pinned           must always be on PMU
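
To make the schema concrete, the sketch below splits one events list element into its event name, config values and modifiers, mapping each modifier letter to the attribute named in the table above. parseEventSpec and eventSpec are hypothetical names; this is not the iaevents parser. The sketch also reads the modifier group as optional and repeatable after each hex value, which is an assumption about the intended grammar.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// modifierAttrs maps each perf modifier letter to its perf_event_attr field.
var modifierAttrs = map[rune]string{
	'p': "precise_ip",
	'k': "exclude_user",
	'u': "exclude_kernel",
	'h': "exclude_guest",
	'H': "exclude_guest",
	'I': "exclude_idle",
	'G': "exclude_hv",
	'D': "pinned",
}

// eventSpec is a hypothetical parsed form of one events list element.
type eventSpec struct {
	Name      string
	Configs   map[string]uint64 // keys: "config", "config1", "config2"
	Modifiers []string          // perf_event_attr fields set by modifiers
}

func parseEventSpec(s string) (eventSpec, error) {
	parts := strings.Split(s, ":")
	spec := eventSpec{Name: parts[0], Configs: map[string]uint64{}}
	if spec.Name == "" {
		return spec, fmt.Errorf("missing event name in %q", s)
	}
	for _, part := range parts[1:] {
		key, value, ok := strings.Cut(part, "=")
		if !ok || (key != "config" && key != "config1" && key != "config2") {
			return spec, fmt.Errorf("invalid option %q", part)
		}
		if !strings.HasPrefix(value, "0x") {
			return spec, fmt.Errorf("value of %s must start with 0x", key)
		}
		// Lowercase hex digits run until the first other character;
		// no modifier letter is a lowercase hex digit, so the split is unambiguous.
		hex := value[2:]
		end := strings.IndexFunc(hex, func(r rune) bool {
			return !(r >= '0' && r <= '9' || r >= 'a' && r <= 'f')
		})
		if end < 0 {
			end = len(hex)
		}
		if end == 0 || end > 16 {
			return spec, fmt.Errorf("value of %s must have 1 to 16 hex digits", key)
		}
		v, err := strconv.ParseUint(hex[:end], 16, 64)
		if err != nil {
			return spec, err
		}
		spec.Configs[key] = v
		for _, m := range hex[end:] {
			attr, known := modifierAttrs[m]
			if !known {
				return spec, fmt.Errorf("unknown modifier %q", m)
			}
			spec.Modifiers = append(spec.Modifiers, attr)
		}
	}
	return spec, nil
}

func main() {
	spec, err := parseEventSpec("CPU_CLK_UNHALTED.THREAD_ANY:config1=0x4043200000000k")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s config1=%#x modifiers=%v\n", spec.Name, spec.Configs["config1"], spec.Modifiers)
}
```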

Requirements

The plugin uses the iaevents library, a Go package that makes accessing the Linux kernel's perf interface easier.

The Intel PMU plugin is only intended for use on 64-bit Linux systems.

Event definition JSON files for specific architectures can be found on GitHub. A script that downloads the event definitions appropriate for your system (event_download.py) is available in pmu-tools. Keep these files in a safe place on your system.
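
The example paths in the configuration above (GenuineIntel-6-55-4-core.json) suggest that definition files are named after the CPU's vendor, family, model and stepping. The sketch below builds such an identifier from /proc/cpuinfo. It assumes, from those example paths only, that the model is written in hexadecimal (model 85 appears as 55); cpuIdentifier is a hypothetical helper, and event_download.py remains the authoritative way to fetch the right files.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// cpuIdentifier builds a "vendor-family-model-stepping" string such as
// "GenuineIntel-6-55-4" from /proc/cpuinfo content. The hex model format is
// an assumption inferred from the example event_definitions paths.
func cpuIdentifier(cpuinfo string) (string, error) {
	fields := map[string]string{}
	scanner := bufio.NewScanner(strings.NewReader(cpuinfo))
	for scanner.Scan() {
		line := scanner.Text()
		if strings.TrimSpace(line) == "" {
			break // only the first processor block is needed
		}
		if key, value, ok := strings.Cut(line, ":"); ok {
			fields[strings.TrimSpace(key)] = strings.TrimSpace(value)
		}
	}
	vendor := fields["vendor_id"]
	family, err1 := strconv.Atoi(fields["cpu family"])
	model, err2 := strconv.Atoi(fields["model"])
	stepping, err3 := strconv.Atoi(fields["stepping"])
	if vendor == "" || err1 != nil || err2 != nil || err3 != nil {
		return "", errors.New("incomplete cpuinfo")
	}
	return fmt.Sprintf("%s-%d-%X-%d", vendor, family, model, stepping), nil
}

func main() {
	data, err := os.ReadFile("/proc/cpuinfo")
	if err != nil {
		fmt.Println("cannot read /proc/cpuinfo:", err)
		return
	}
	id, err := cpuIdentifier(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(id + "-core.json") // candidate core event file name
}
```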

Measuring

The plugin can measure both core and uncore events. During plugin initialization, the event names provided by the user are matched against the event definitions in the JSON files and translated to perf attributes. Next, those events are activated to start counting. On every Telegraf interval, the plugin reads the current measurement for each previously activated event.

Each core event can be counted separately on every available CPU core. In contrast, an uncore event may be provided by many PMUs within the specified CPU package. The plugin lets you choose the core IDs (for core events) or socket IDs (for uncore events) on which counting is performed. Uncore events are activated separately on each of the socket's PMUs and can be exposed either as separate measurements or summed into a single measurement.
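
A minimal sketch of the summing option, assuming the aggregated measurement is a plain per-field sum over all units of one type and that the unit type is the PMU name without its "uncore_" prefix and numeric suffix (both assumptions, based on the unit_type examples in the Metrics section). reading, unitType and aggregateByType are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// reading holds the values reported for one event on one uncore PMU unit.
type reading struct {
	Unit                  string
	Raw, Enabled, Running uint64
}

// unitType drops the "uncore_" prefix and a trailing "_<number>" index, e.g.
// uncore_cbox_1 -> cbox and uncore_r2pcie -> r2pcie (assumed rule).
func unitType(unit string) string {
	t := strings.TrimPrefix(unit, "uncore_")
	if i := strings.LastIndex(t, "_"); i >= 0 {
		if _, err := strconv.Atoi(t[i+1:]); err == nil {
			t = t[:i]
		}
	}
	return t
}

// aggregateByType sums raw, enabled and running across all units of the
// same type.
func aggregateByType(units []reading) map[string]reading {
	totals := make(map[string]reading)
	for _, u := range units {
		t := unitType(u.Unit)
		agg := totals[t]
		agg.Unit = t
		agg.Raw += u.Raw
		agg.Enabled += u.Enabled
		agg.Running += u.Running
		totals[t] = agg
	}
	return totals
}

func main() {
	// First three cbox units from the "Uncore event not aggregated" example output below.
	units := []reading{
		{"uncore_cbox_0", 183996, 2870630747, 2870630747},
		{"uncore_cbox_1", 185703, 2870608194, 2870608194},
		{"uncore_cbox_2", 187331, 2870600211, 2870600211},
	}
	for typ, agg := range aggregateByType(units) {
		fmt.Printf("unit_type=%s raw=%d enabled=%d running=%d\n", typ, agg.Raw, agg.Enabled, agg.Running)
	}
}
```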

Each measurement is stored as three values: raw, enabled and running. Raw is the total count of the event; enabled and running are the total times the event was enabled and actually running. Normally the two times are equal. If more events are started than there are counter slots on the PMU, multiplexing occurs and events run only part of the time. For that case, the plugin provides a fourth value, scaled, calculated as raw * enabled / running.
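
Because raw * enabled can overflow a uint64, the sketch below computes the formula with arbitrary-precision integers and truncating division. Truncation reproduces the scaled value of the first "Time multiplexing" example line below (the exact quotient is about 4428970.96). The scaled function and its choice to return 0 when running is 0 are assumptions of this sketch, not the plugin's code.

```go
package main

import (
	"fmt"
	"math/big"
)

// scaled returns raw * enabled / running without uint64 overflow, using
// truncating integer division. Returns 0 if the event never ran (assumption)
// and saturates if the result does not fit in a uint64.
func scaled(raw, enabled, running uint64) uint64 {
	if running == 0 {
		return 0
	}
	product := new(big.Int).Mul(new(big.Int).SetUint64(raw), new(big.Int).SetUint64(enabled))
	q := product.Quo(product, new(big.Int).SetUint64(running))
	if !q.IsUint64() {
		return ^uint64(0)
	}
	return q.Uint64()
}

func main() {
	// raw, enabled and running from the first "Time multiplexing" example line.
	fmt.Println(scaled(2947727, 2201071844, 1464935978)) // 4428970
}
```

When enabled equals running (no multiplexing), scaled is simply raw, as in the "Event group" example lines.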

Events are measured for all running processes.

Core event groups

Perf allows assembling events as a group. A perf event group is scheduled onto the CPU as a unit: it will be put onto the CPU only if all of the events in the group can be put onto the CPU. This means that the values of the member events can be meaningfully compared — added, divided (to get ratios), and so on — with each other, since they have counted events for the same set of executed instructions (source).
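
As a small worked example of the arithmetic a group makes meaningful, the sketch below divides the raw counts of two members of the same group on the same CPU, using the cpu=0 values from the "Event group" example output below. groupRatio is a hypothetical helper; no interpretation of this particular ratio is implied.

```go
package main

import (
	"errors"
	"fmt"
)

// groupRatio divides the raw counts of two members of one perf group read on
// the same CPU; since the group is scheduled as a unit, both counts cover the
// same execution window.
func groupRatio(numerator, denominator uint64) (float64, error) {
	if denominator == 0 {
		return 0, errors.New("denominator event did not count")
	}
	return float64(numerator) / float64(denominator), nil
}

func main() {
	// cpu=0: CPU_CLK_UNHALTED.THREAD_P_ANY raw / CPU_CLK_THREAD_UNHALTED.REF_XCLK raw.
	r, err := groupRatio(72340716, 1171711)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%.2f\n", r) // 61.74
}
```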

NOTE: The plugin returns an error when asked to create a core event group larger than the number of available core PMU counters. The error from the perf syscall is shown as "invalid argument". To check how many PMU counters your Intel CPU supports, you can use the cpuid command.
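
The counter counts come from CPUID leaf 0x0A (architectural performance monitoring). If you have the raw EAX and EDX values for that leaf, for example from a raw cpuid dump, the sketch below decodes them using the bit layout documented in the Intel SDM. decodeLeafA is a hypothetical helper and the register values in main are made up for illustration.

```go
package main

import "fmt"

// pmuInfo holds the CPUID leaf 0x0A fields that describe core PMU counters.
type pmuInfo struct {
	Version       uint32 // EAX[7:0], architectural PerfMon version
	GPCounters    uint32 // EAX[15:8], general-purpose counters per logical processor
	GPWidth       uint32 // EAX[23:16], bit width of general-purpose counters
	FixedCounters uint32 // EDX[4:0], only valid when Version > 1
	FixedWidth    uint32 // EDX[12:5], only valid when Version > 1
}

func decodeLeafA(eax, edx uint32) pmuInfo {
	info := pmuInfo{
		Version:    eax & 0xff,
		GPCounters: (eax >> 8) & 0xff,
		GPWidth:    (eax >> 16) & 0xff,
	}
	if info.Version > 1 {
		info.FixedCounters = edx & 0x1f
		info.FixedWidth = (edx >> 5) & 0xff
	}
	return info
}

func main() {
	// Hypothetical register values: version 4, 4 GP counters of 48 bits,
	// 3 fixed counters of 48 bits.
	info := decodeLeafA(0x07300404, 0x00000603)
	fmt.Printf("%+v\n", info)
}
```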

Note about file descriptors

The plugin opens a number of file descriptors that depends on the number of monitored CPUs and the number of monitored counters. This can easily exceed the default per-process limit on open file descriptors, so depending on the configuration you may need to raise that limit, for example with the ulimit -n command.
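
A rough way to judge whether the limit is sufficient is to compare an estimate of the perf descriptors against the process's RLIMIT_NOFILE. The sizing rule in estimateFDs (one descriptor per core event per monitored CPU plus one per uncore event per monitored uncore unit) is an assumption for illustration, not a figure from the plugin, and it ignores descriptors the process needs for other purposes. The sketch is Linux-only.

```go
package main

import (
	"fmt"
	"syscall"
)

// estimateFDs approximates the number of perf descriptors under the assumed
// rule of one per event per monitored CPU or uncore unit.
func estimateFDs(coreEvents, cpus, uncoreEvents, uncoreUnits int) uint64 {
	return uint64(coreEvents*cpus + uncoreEvents*uncoreUnits)
}

func main() {
	// Hypothetical setup: 2 core events on 64 CPUs, 2 uncore events on 12 units.
	need := estimateFDs(2, 64, 2, 12)
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		panic(err)
	}
	fmt.Printf("estimated perf fds: %d, soft limit: %d, hard limit: %d\n", need, lim.Cur, lim.Max)
	if need > lim.Cur {
		fmt.Println("consider raising the limit, e.g. with ulimit -n")
	}
}
```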

Metrics

On each Telegraf interval, the Intel PMU plugin reports the following data:

Metric Fields

Field     Type     Description
enabled   uint64   time counter; time the associated perf event was enabled
running   uint64   time counter; time the event was actually counted
raw       uint64   value counter; event count during the time the event was actually counted
scaled    uint64   value counter; approximate count had the event been counted continuously, calculated as scaled = raw * enabled / running

Metric Tags - common

Tag     Description
host    hostname as read by Telegraf
event   name of the event

Metric Tags - core events

Tag          Description
cpu          CPU id as identified by the Linux OS (logical CPU id when HT is on, physical CPU id when HT is off)
events_tag   (optional) tag as defined in the "intel_pmu.core_events" configuration element

Metric Tags - uncore events

Tag          Description
socket       socket number as identified by the Linux OS (physical_package_id)
unit_type    category of the event-capable PMU the event was counted for, e.g. cbox for uncore_cbox_1, r2pcie for uncore_r2pcie
unit         name of the event-capable PMU the event was counted for, as listed in /sys/bus/event_source/devices/, e.g. uncore_cbox_1, uncore_imc_1; present for non-aggregated uncore events only
events_tag   (optional) tag as defined in the "intel_pmu.uncore_events" configuration element

Example Output

Event group:

pmu_metric,cpu=0,event=CPU_CLK_THREAD_UNHALTED.REF_XCLK,events_tag=unhalted,host=xyz enabled=2871237051i,running=2871237051i,raw=1171711i,scaled=1171711i 1621254096000000000
pmu_metric,cpu=0,event=CPU_CLK_UNHALTED.THREAD_P_ANY,events_tag=unhalted,host=xyz enabled=2871240713i,running=2871240713i,raw=72340716i,scaled=72340716i 1621254096000000000
pmu_metric,cpu=1,event=CPU_CLK_THREAD_UNHALTED.REF_XCLK,events_tag=unhalted,host=xyz enabled=2871118275i,running=2871118275i,raw=1646752i,scaled=1646752i 1621254096000000000
pmu_metric,cpu=1,event=CPU_CLK_UNHALTED.THREAD_P_ANY,events_tag=unhalted,host=xyz raw=108802421i,scaled=108802421i,enabled=2871120107i,running=2871120107i 1621254096000000000
pmu_metric,cpu=2,event=CPU_CLK_THREAD_UNHALTED.REF_XCLK,events_tag=unhalted,host=xyz enabled=2871143950i,running=2871143950i,raw=1316834i,scaled=1316834i 1621254096000000000
pmu_metric,cpu=2,event=CPU_CLK_UNHALTED.THREAD_P_ANY,events_tag=unhalted,host=xyz enabled=2871074681i,running=2871074681i,raw=68728436i,scaled=68728436i 1621254096000000000

Uncore event not aggregated:

pmu_metric,event=UNC_CBO_XSNP_RESPONSE.MISS_XCORE,host=xyz,socket=0,unit=uncore_cbox_0,unit_type=cbox enabled=2870630747i,running=2870630747i,raw=183996i,scaled=183996i 1621254096000000000
pmu_metric,event=UNC_CBO_XSNP_RESPONSE.MISS_XCORE,host=xyz,socket=0,unit=uncore_cbox_1,unit_type=cbox enabled=2870608194i,running=2870608194i,raw=185703i,scaled=185703i 1621254096000000000
pmu_metric,event=UNC_CBO_XSNP_RESPONSE.MISS_XCORE,host=xyz,socket=0,unit=uncore_cbox_2,unit_type=cbox enabled=2870600211i,running=2870600211i,raw=187331i,scaled=187331i 1621254096000000000
pmu_metric,event=UNC_CBO_XSNP_RESPONSE.MISS_XCORE,host=xyz,socket=0,unit=uncore_cbox_3,unit_type=cbox enabled=2870593914i,running=2870593914i,raw=184228i,scaled=184228i 1621254096000000000
pmu_metric,event=UNC_CBO_XSNP_RESPONSE.MISS_XCORE,host=xyz,socket=0,unit=uncore_cbox_4,unit_type=cbox scaled=195355i,enabled=2870558952i,running=2870558952i,raw=195355i 1621254096000000000
pmu_metric,event=UNC_CBO_XSNP_RESPONSE.MISS_XCORE,host=xyz,socket=0,unit=uncore_cbox_5,unit_type=cbox enabled=2870554131i,running=2870554131i,raw=197756i,scaled=197756i 1621254096000000000

Uncore event aggregated:

pmu_metric,event=UNC_CBO_XSNP_RESPONSE.MISS_XCORE,host=xyz,socket=0,unit_type=cbox enabled=13199712335i,running=13199712335i,raw=467485i,scaled=467485i 1621254412000000000

Time multiplexing:

pmu_metric,cpu=0,event=CPU_CLK_THREAD_UNHALTED.REF_XCLK,host=xyz raw=2947727i,scaled=4428970i,enabled=2201071844i,running=1464935978i 1621254412000000000
pmu_metric,cpu=0,event=CPU_CLK_UNHALTED.THREAD_P_ANY,host=xyz running=1465155618i,raw=302553190i,scaled=454511623i,enabled=2201035323i 1621254412000000000
pmu_metric,cpu=0,event=CPU_CLK_UNHALTED.REF_XCLK,host=xyz enabled=2200994057i,running=1466812391i,raw=3177535i,scaled=4767982i 1621254412000000000
pmu_metric,cpu=0,event=CPU_CLK_UNHALTED.REF_XCLK_ANY,host=xyz enabled=2200963921i,running=1470523496i,raw=3359272i,scaled=5027894i 1621254412000000000
pmu_metric,cpu=0,event=L1D_PEND_MISS.PENDING_CYCLES_ANY,host=xyz enabled=2200933946i,running=1470322480i,raw=23631950i,scaled=35374798i 1621254412000000000
pmu_metric,cpu=0,event=L1D_PEND_MISS.PENDING_CYCLES,host=xyz raw=18767833i,scaled=28169827i,enabled=2200888514i,running=1466317384i 1621254412000000000

Changelog

Version   Description
v1.0.0    Initial version
v1.1.0    Added support for the new perfmon event format. The old event format is still accepted (a warning is printed in the log).

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type CoreEventEntity

type CoreEventEntity struct {
	Events    []string `toml:"events"`
	Cores     []string `toml:"cores"`
	EventsTag string   `toml:"events_tag"`
	PerfGroup bool     `toml:"perf_group"`
	// contains filtered or unexported fields
}

CoreEventEntity represents config section for core events.

type IntelPMU

type IntelPMU struct {
	EventListPaths []string             `toml:"event_definitions"`
	CoreEntities   []*CoreEventEntity   `toml:"core_events"`
	UncoreEntities []*UncoreEventEntity `toml:"uncore_events"`

	Log telegraf.Logger `toml:"-"`
	// contains filtered or unexported fields
}

IntelPMU is the plugin type.

func (*IntelPMU) Gather

func (i *IntelPMU) Gather(acc telegraf.Accumulator) error

func (*IntelPMU) Init

func (i *IntelPMU) Init() error

func (*IntelPMU) SampleConfig

func (*IntelPMU) SampleConfig() string

func (IntelPMU) Start

Start is required for IntelPMU to implement the telegraf.ServiceInput interface. Necessary initialization and config checking are done in Init.

func (*IntelPMU) Stop

func (i *IntelPMU) Stop()

type MockTransformer

type MockTransformer struct {
	mock.Mock
}

MockTransformer is an autogenerated mock type for the Transformer type

func (*MockTransformer) Transform

func (_m *MockTransformer) Transform(reader iaevents.Reader, matcher iaevents.Matcher) ([]*iaevents.PerfEvent, error)

Transform provides a mock function with given fields: reader, matcher

type UncoreEventEntity

type UncoreEventEntity struct {
	Events    []string `toml:"events"`
	Sockets   []string `toml:"sockets"`
	Aggregate bool     `toml:"aggregate_uncore_units"`
	EventsTag string   `toml:"events_tag"`
	// contains filtered or unexported fields
}

UncoreEventEntity represents config section for uncore events.
