discovery

package
v0.53.0-rc.1
Published: Jun 11, 2024 License: Apache-2.0 Imports: 18 Imported by: 506

README

Service Discovery

This directory contains the service discovery (SD) component of Prometheus.

Design of a Prometheus SD

There are many requests to add new SDs to Prometheus. This section looks at what makes a good SD and covers some of the common implementation issues.

Does this make sense as an SD?

The first question to ask is whether it makes sense to add this particular SD. An SD mechanism should be reasonably well established, and at a minimum in use across multiple organizations. It should allow discovering machines and/or services running somewhere. When exactly an SD is popular enough to justify being added to Prometheus natively is an open question.

Note: As part of lifting the past moratorium on new SD implementations it was agreed that, in addition to the existing requirements, new service discovery implementations will be required to have a committed maintainer with push access (i.e., on -team).

It should not be a brand new SD mechanism, or a variant of an established mechanism. We want to integrate Prometheus with the SD that's already there in your infrastructure, not invent yet more ways to do service discovery. We also do not add mechanisms to work around users lacking service discovery and/or configuration management infrastructure.

SDs that merely discover other applications running the same software (e.g. talk to one Kafka or Cassandra server to find the others) are not service discovery. In that case the SD you should be looking at is whatever decides that a machine is going to be a Kafka server, likely a machine database or configuration management system.

If something is particularly custom or unusual, file_sd is the generic mechanism provided for users to hook in. Generally with Prometheus we offer a single generic mechanism for things with infinite variations, rather than trying to support everything natively (see also, alertmanager webhook, remote read, remote write, node exporter textfile collector). For example anything that would involve talking to a relational database should use file_sd instead.

For configuration management systems like Chef, while they do have a database/API that'd in principle make sense to talk to for service discovery, the idiomatic approach is to use Chef's templating facilities to write out a file for use with file_sd.

Mapping from SD to Prometheus

The general principle with SD is to extract all the potentially useful information we can out of the SD, and let the user choose what they need of it using relabelling. This information is generally termed metadata.

Metadata is exposed as a set of key/value pairs (labels) per target. The keys are prefixed with __meta_<sdname>_<key>, and there should also be an __address__ label with the host:port of the target (preferably an IP address to avoid DNS lookups). No other labelnames should be exposed.
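The naming rule above can be sketched as follows. This is a minimal illustration, not code from Prometheus: the function metaLabels, the SD name "example" and the "zone" key are all hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// metaLabels builds the label set for one target of a hypothetical SD called
// sdName. It emits only __address__ and __meta_<sdname>_<key> labels.
func metaLabels(sdName, ip string, port int, meta map[string]string) map[string]string {
	labels := map[string]string{
		// host:port, using an IP address to avoid DNS lookups.
		"__address__": net.JoinHostPort(ip, strconv.Itoa(port)),
	}
	for k, v := range meta {
		labels["__meta_"+sdName+"_"+k] = v
	}
	return labels
}

func main() {
	labels := metaLabels("example", "10.0.0.5", 9100, map[string]string{
		"zone": "eu-west-1a", // hypothetical metadata key
	})
	fmt.Println(labels["__address__"])
	fmt.Println(labels["__meta_example_zone"])
}
```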

It is very common for initial pull requests for new SDs to include hardcoded assumptions that make sense for the author's setup. SD should be generic; any customisation should be handled via relabelling. There should be basically no business logic, filtering, or transformations of the data from the SD beyond that which is needed to fit it into the metadata data model.

Arrays (e.g. a list of tags) should be converted to a single label with the array values joined with a comma. Also prefix and suffix the value with a comma. So for example the array [a, b, c] would become ,a,b,c,. As relabelling regexes are fully anchored, this makes it easier to write correct regexes against (.*,a,.* works no matter where a appears in the list). The canonical example of this is __meta_consul_tags.
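As a rough sketch of the comma convention described above, the following joins a list and then checks membership with a fully anchored pattern, similar in spirit to what a relabelling regex would do. The helper names joinTags and hasTag are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// joinTags joins the values with commas and wraps the result in commas,
// so [a b c] becomes ",a,b,c,".
func joinTags(tags []string) string {
	return "," + strings.Join(tags, ",") + ","
}

// hasTag mimics a fully anchored relabelling regex of the form .*,<tag>,.*
func hasTag(joined, tag string) bool {
	re := regexp.MustCompile("^(?:.*," + regexp.QuoteMeta(tag) + ",.*)$")
	return re.MatchString(joined)
}

func main() {
	joined := joinTags([]string{"a", "b", "c"})
	fmt.Println(joined)
	fmt.Println(hasTag(joined, "a"), hasTag(joined, "c"), hasTag(joined, "ab"))
}
```

Because every value is surrounded by commas, the same pattern matches the first, middle and last element without special cases, and it does not match a value that merely contains the tag as a substring.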

Maps, hashes and other forms of key/value pairs should be all prefixed and exposed as labels. For example for EC2 tags, there would be __meta_ec2_tag_Description=mydescription for the Description tag. Labelnames may only contain [_a-zA-Z0-9]; sanitize keys by replacing any other character with an underscore as needed.
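A minimal sketch of this mapping, assuming a plain string map as input. The helpers sanitizeLabelName and tagLabels are written here for illustration and are not the functions Prometheus itself uses.

```go
package main

import (
	"fmt"
	"regexp"
)

// invalidLabelChar matches every character that is not allowed in a label name.
var invalidLabelChar = regexp.MustCompile(`[^a-zA-Z0-9_]`)

// sanitizeLabelName replaces each disallowed character with an underscore.
func sanitizeLabelName(name string) string {
	return invalidLabelChar.ReplaceAllString(name, "_")
}

// tagLabels exposes each key/value pair as a prefixed metadata label.
func tagLabels(prefix string, tags map[string]string) map[string]string {
	out := make(map[string]string, len(tags))
	for k, v := range tags {
		out[prefix+sanitizeLabelName(k)] = v
	}
	return out
}

func main() {
	labels := tagLabels("__meta_ec2_tag_", map[string]string{
		"Description": "mydescription",
		"cost-center": "42", // hypothetical tag with a disallowed character
	})
	fmt.Println(labels["__meta_ec2_tag_Description"])
	fmt.Println(labels["__meta_ec2_tag_cost_center"])
}
```

Note that two different keys can collapse to the same label name after sanitizing; this sketch simply lets the later one win.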

For targets with multiple potential ports, you can a) expose them as a list, b) if they're named expose them as a map or c) expose them each as their own target. Kubernetes SD takes the target per port approach. a) and b) can be combined.
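Option c), one target per port, could look roughly like this. The type namedPort and the label names under __meta_example_ are hypothetical and only illustrate the shape of the result.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// namedPort is a hypothetical description of one port offered by a machine.
type namedPort struct {
	Name   string
	Number int
}

// targetsPerPort returns one label set per port, each with its own __address__.
func targetsPerPort(ip string, ports []namedPort) []map[string]string {
	targets := make([]map[string]string, 0, len(ports))
	for _, p := range ports {
		targets = append(targets, map[string]string{
			"__address__":                net.JoinHostPort(ip, strconv.Itoa(p.Number)),
			"__meta_example_port_name":   p.Name,
			"__meta_example_port_number": strconv.Itoa(p.Number),
		})
	}
	return targets
}

func main() {
	for _, t := range targetsPerPort("10.0.0.5", []namedPort{{"http", 8080}, {"metrics", 9100}}) {
		fmt.Println(t["__address__"], t["__meta_example_port_name"])
	}
}
```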

For machine-like SDs (OpenStack, EC2, Kubernetes to some extent) there may be multiple network interfaces for a target. Thus far reporting the details of only the first/primary network interface has sufficed.

Other implementation considerations

SDs are intended to dump all possible targets. For example the optimal use of EC2 service discovery would be to take the entire region's worth of EC2 instances it provides and do everything needed in one scrape_config. For large deployments where you are only interested in a small proportion of the returned targets, this may cause performance issues. If this occurs it is acceptable to also offer filtering via whatever mechanisms the SD exposes. For EC2 that would be the Filter option on DescribeInstances. Keep in mind that this is a performance optimisation; it should be possible to do the same filtering using relabelling alone. As with SD generally, we do not invent new ways to filter targets (that is what relabelling is for), merely offer up whatever functionality the SD itself offers.

It is a general rule with Prometheus that all configuration comes from the configuration file. While the libraries you use to talk to the SD may also offer other mechanisms for providing configuration/authentication under the covers (EC2's use of environment variables being a prime example), using your SD mechanism should not require this. Put another way, your SD implementation should not read environment variables or files to obtain configuration.

Some SD mechanisms have rate limits that make them challenging to use. As an example we have unfortunately had to reject Amazon ECS service discovery due to the rate limits being so low that it would not be usable for anything beyond small setups.

If a system offers multiple distinct types of SD, select which is in use with a configuration option rather than returning them all from one mega SD that requires relabelling to select just the one you want. So far we have only seen this with Kubernetes. When a single SD with a selector vs. multiple distinct SDs makes sense is an open question.

If there is a failure while talking to the SD or processing its response, abort rather than returning partial data. It is better to work from stale targets than partial or incorrect metadata.
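The abort-on-failure rule can be sketched with a paginated fetch. Everything here is hypothetical (the fetchPage type, the refresh helper and the error value); the point is only that a failure on any page discards what was collected so far, so the caller keeps its previous, stale result.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnavailable is a hypothetical error returned by the SD API.
var errUnavailable = errors.New("SD API unavailable")

// fetchPage is a hypothetical stand-in for one paginated call to an SD API.
type fetchPage func(page int) (targets []string, more bool, err error)

// refresh collects all pages. On any error it returns nil and the error,
// never the pages gathered before the failure.
func refresh(fetch fetchPage) ([]string, error) {
	var all []string
	for page := 0; ; page++ {
		targets, more, err := fetch(page)
		if err != nil {
			return nil, fmt.Errorf("fetching page %d: %w", page, err)
		}
		all = append(all, targets...)
		if !more {
			return all, nil
		}
	}
}

func main() {
	current := []string{"10.0.0.1:9100"} // last known good result

	failing := func(page int) ([]string, bool, error) {
		if page == 1 {
			return nil, false, errUnavailable
		}
		return []string{"10.0.0.2:9100"}, true, nil
	}

	if next, err := refresh(failing); err != nil {
		fmt.Println("refresh failed, keeping stale targets:", err)
	} else {
		current = next
	}
	fmt.Println(current)
}
```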

The information obtained from service discovery is not considered sensitive security-wise. Do not return secrets in metadata, as anyone with access to the Prometheus server will be able to see them.

Writing an SD mechanism

The SD interface

A Service Discovery (SD) mechanism has to discover targets and provide them to Prometheus. We expect similar targets to be grouped together, in the form of a target group. The SD mechanism sends the targets down to Prometheus as a list of target groups.

An SD mechanism has to implement the Discoverer Interface:

type Discoverer interface {
	Run(ctx context.Context, up chan<- []*targetgroup.Group)
}

Prometheus will call the Run() method on a provider to initialize the discovery mechanism. The mechanism will then send all the target groups into the channel. After that, the mechanism watches for changes. For each update it can send all target groups, or only changed and new target groups, down the channel. The Manager handles both cases.

For example if we had a discovery mechanism and it retrieves the following groups:

[]targetgroup.Group{
	{
		Targets: []model.LabelSet{
			{
				"__instance__": "10.11.150.1:7870",
				"hostname":     "demo-target-1",
				"test":         "simple-test",
			},
			{
				"__instance__": "10.11.150.4:7870",
				"hostname":     "demo-target-2",
				"test":         "simple-test",
			},
		},
		Labels: model.LabelSet{
			"job": "mysql",
		},
		Source: "file1",
	},
	{
		Targets: []model.LabelSet{
			{
				"__instance__": "10.11.122.11:6001",
				"hostname":     "demo-postgres-1",
				"test":         "simple-test",
			},
			{
				"__instance__": "10.11.122.15:6001",
				"hostname":     "demo-postgres-2",
				"test":         "simple-test",
			},
		},
		Labels: model.LabelSet{
			"job": "postgres",
		},
		Source: "file2",
	},
}

Here there are two target groups: one with source file1 and another with source file2. The grouping is implementation specific and could even be one target per group. However, every target group sent by an SD instance must have a Source that is unique across all the target groups of that SD instance.
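The uniqueness requirement on Source is easy to check in a test of your SD. The sketch below uses a simplified, hypothetical Group type rather than targetgroup.Group, and the helper duplicateSources is not part of Prometheus.

```go
package main

import (
	"fmt"
	"sort"
)

// Group is a simplified, hypothetical stand-in for targetgroup.Group.
type Group struct {
	Source  string
	Targets []map[string]string
}

// duplicateSources returns, in sorted order, every Source used by more than
// one group.
func duplicateSources(groups []*Group) []string {
	seen := make(map[string]int, len(groups))
	for _, g := range groups {
		seen[g.Source]++
	}
	var dups []string
	for source, n := range seen {
		if n > 1 {
			dups = append(dups, source)
		}
	}
	sort.Strings(dups)
	return dups
}

func main() {
	groups := []*Group{{Source: "file1"}, {Source: "file2"}, {Source: "file2"}}
	fmt.Println(duplicateSources(groups))
}
```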

In this case, both target groups are sent down the channel the first time Run() is called. For an update, we need to send the whole changed target group down the channel. For example, if the target with hostname: demo-postgres-2 goes away, we send:

&targetgroup.Group{
	Targets: []model.LabelSet{
		{
			"__instance__": "10.11.122.11:6001",
			"hostname":     "demo-postgres-1",
			"test":         "simple-test",
		},
	},
	Labels: model.LabelSet{
		"job": "postgres",
	},
	Source: "file2",
}

down the channel.

If all the targets in a group go away, we need to send the target group with empty Targets down the channel. For example, if all targets with job: postgres go away, we send:

&targetgroup.Group{
	Targets: nil,
	Source:  "file2",
}

down the channel.
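To tie the pieces together, here is a rough, self-contained sketch of a Run loop that follows the protocol described above: send everything first, then send changed groups, return when the context is cancelled, and leave the update channel open. It uses a simplified, hypothetical Group type instead of targetgroup.Group, and the names sketchDiscoverer and collect are made up for this illustration.

```go
package main

import (
	"context"
	"fmt"
)

// Group is a simplified, hypothetical stand-in for targetgroup.Group.
type Group struct {
	Targets []map[string]string
	Source  string
}

// sketchDiscoverer replays an initial set of groups followed by updates.
type sketchDiscoverer struct {
	initial []*Group
	updates <-chan *Group // one changed group per message
}

// Run sends the full set once, then each changed group, until ctx is
// cancelled. It never closes the up channel.
func (d *sketchDiscoverer) Run(ctx context.Context, up chan<- []*Group) {
	select {
	case up <- d.initial:
	case <-ctx.Done():
		return
	}
	for {
		select {
		case g, ok := <-d.updates:
			if !ok {
				<-ctx.Done() // no more updates; wait for cancellation
				return
			}
			select {
			case up <- []*Group{g}:
			case <-ctx.Done():
				return
			}
		case <-ctx.Done():
			return
		}
	}
}

// collect runs d and returns the first n batches it sends.
func collect(d *sketchDiscoverer, n int) [][]*Group {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	up := make(chan []*Group)
	go d.Run(ctx, up)
	var batches [][]*Group
	for i := 0; i < n; i++ {
		batches = append(batches, <-up)
	}
	return batches
}

func main() {
	updates := make(chan *Group, 1)
	// All targets of file2 went away: send the group with empty Targets.
	updates <- &Group{Source: "file2"}

	d := &sketchDiscoverer{
		initial: []*Group{
			{Source: "file1", Targets: []map[string]string{{"__address__": "10.11.150.1:7870"}}},
			{Source: "file2", Targets: []map[string]string{{"__address__": "10.11.122.11:6001"}}},
		},
		updates: updates,
	}
	for _, batch := range collect(d, 2) {
		for _, g := range batch {
			fmt.Printf("%s: %d target(s)\n", g.Source, len(g.Targets))
		}
	}
}
```

Every send is wrapped in a select on ctx.Done() so that the discoverer cannot block forever if the consumer has already gone away.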

The Config interface

Now that your service discovery mechanism is ready to discover targets, you must help Prometheus discover it. This is done by implementing the discovery.Config interface and registering it with discovery.RegisterConfig in an init function of your package.

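type Config interface {
	// Name returns the name of the discovery mechanism.
	Name() string

	// NewDiscoverer returns a Discoverer for the Config
	// with the given DiscovererOptions.
	NewDiscoverer(DiscovererOptions) (Discoverer, error)

	// NewDiscovererMetrics returns the metrics used by the service discovery.
	NewDiscovererMetrics(prometheus.Registerer, RefreshMetricsInstantiator) DiscovererMetrics
}

type DiscovererOptions struct {
	Logger log.Logger

	Metrics DiscovererMetrics

	// Extra HTTP client options to expose to Discoverers. This field may be
	// ignored; Discoverer implementations must opt-in to reading it.
	HTTPClientOptions []config.HTTPClientOption
}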

The value returned by Name() should be short, descriptive, lowercase, and unique. It is used to tag the provided Logger and as part of the YAML key for your SD mechanism's list of configs in scrape_config and alertmanager_config (e.g. ${NAME}_sd_configs).
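The naming rules can be illustrated with a toy registry. This is not how discovery.RegisterConfig is implemented; the Config interface below is trimmed to Name() only, and the registry type, exampleSDConfig and the error behaviour are assumptions made for the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// Config is a trimmed-down, hypothetical stand-in for discovery.Config.
type Config interface {
	Name() string
}

// exampleSDConfig is a hypothetical SD configuration.
type exampleSDConfig struct {
	Server string
}

func (exampleSDConfig) Name() string { return "example" }

// registry maps SD names to their config types.
type registry map[string]Config

// register enforces a lowercase, unique name and returns the YAML key under
// which the SD's configs would be listed.
func (r registry) register(c Config) (string, error) {
	name := c.Name()
	if name == "" || name != strings.ToLower(name) {
		return "", fmt.Errorf("SD name %q must be non-empty and lowercase", name)
	}
	if _, dup := r[name]; dup {
		return "", fmt.Errorf("SD name %q is already registered", name)
	}
	r[name] = c
	return name + "_sd_configs", nil
}

func main() {
	r := registry{}
	key, err := r.register(exampleSDConfig{})
	fmt.Println(key, err)
	_, err = r.register(exampleSDConfig{})
	fmt.Println(err)
}
```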

New Service Discovery Check List

Here are some non-obvious parts of adding service discoveries that need to be verified:

  • Validate that discovery configs can be DeepEqualled by adding them to config/testdata/conf.good.yml and to the associated tests.

  • If the config contains file paths directly or indirectly (e.g. with a TLSConfig or HTTPClientConfig field), then it must implement config.DirectorySetter.

  • Import your SD package from prometheus/discovery/install. The install package is imported from main to register all builtin SD mechanisms.

  • List the service discovery in both <scrape_config> and <alertmanager_config> in docs/configuration/configuration.md.
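For the file path item in the check list, a SetDirectory method could be sketched as below. It assumes that config.DirectorySetter consists of a single SetDirectory(dir string) method, in line with the Configs.SetDirectory signature documented in this package. The config type, its CAFile field and the joinDir helper are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// joinDir joins a relative path with dir; empty and absolute paths are
// returned unchanged.
func joinDir(dir, path string) string {
	if path == "" || filepath.IsAbs(path) {
		return path
	}
	return filepath.Join(dir, path)
}

// exampleSDConfig is a hypothetical SD configuration holding a file path.
type exampleSDConfig struct {
	Server string
	CAFile string // hypothetical path to a CA certificate
}

// SetDirectory resolves relative file paths against dir, typically the
// directory of the configuration file.
func (c *exampleSDConfig) SetDirectory(dir string) {
	c.CAFile = joinDir(dir, c.CAFile)
}

func main() {
	c := &exampleSDConfig{Server: "sd.example.org", CAFile: "certs/ca.pem"}
	c.SetDirectory("/etc/prometheus")
	fmt.Println(c.CAFile)
}
```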

Examples of Service Discovery pull requests

The examples given might become out of date but should give a good impression about the areas touched by a new service discovery.

Documentation

Constants

const (
	KubernetesMetricsNamespace = "prometheus_sd_kubernetes"
)

Variables

This section is empty.

Functions

func CreateAndRegisterSDMetrics added in v0.50.0

func CreateAndRegisterSDMetrics(reg prometheus.Registerer) (map[string]DiscovererMetrics, error)

Registers the metrics needed for SD mechanisms. Does not register the metrics for the Discovery Manager. TODO(ptodev): Add ability to unregister the metrics?

func HTTPClientOptions

func HTTPClientOptions(opts ...config.HTTPClientOption) func(*Manager)

HTTPClientOptions sets the list of HTTP client options to expose to Discoverers. It is up to Discoverers to choose to use the options provided.

func MarshalYAMLWithInlineConfigs

func MarshalYAMLWithInlineConfigs(in interface{}) (interface{}, error)

MarshalYAMLWithInlineConfigs helps implement yaml.Marshal for structs that have a Configs field that should be inlined.

func Name

func Name(n string) func(*Manager)

Name sets the name of the manager.

func RegisterConfig

func RegisterConfig(config Config)

RegisterConfig registers the given Config type for YAML marshaling and unmarshaling.

func RegisterK8sClientMetricsWithPrometheus added in v0.50.0

func RegisterK8sClientMetricsWithPrometheus(registerer prometheus.Registerer) error

func RegisterSDMetrics added in v0.50.0

func RegisterSDMetrics(registerer prometheus.Registerer, rmm RefreshMetricsManager) (map[string]DiscovererMetrics, error)

RegisterSDMetrics registers the metrics used by service discovery mechanisms. RegisterSDMetrics should be called only once during the lifetime of the Prometheus process. There is no need for the Prometheus process to unregister the metrics.

func UnmarshalYAMLWithInlineConfigs

func UnmarshalYAMLWithInlineConfigs(out interface{}, unmarshal func(interface{}) error) error

UnmarshalYAMLWithInlineConfigs helps implement yaml.Unmarshal for structs that have a Configs field that should be inlined.

Types

type Config

type Config interface {
	// Name returns the name of the discovery mechanism.
	Name() string

	// NewDiscoverer returns a Discoverer for the Config
	// with the given DiscovererOptions.
	NewDiscoverer(DiscovererOptions) (Discoverer, error)

	// NewDiscovererMetrics returns the metrics used by the service discovery.
	NewDiscovererMetrics(prometheus.Registerer, RefreshMetricsInstantiator) DiscovererMetrics
}

A Config provides the configuration and constructor for a Discoverer.

type Configs

type Configs []Config

Configs is a slice of Config values that uses custom YAML marshaling and unmarshaling to represent itself as a mapping of the Config values grouped by their types.

func (Configs) MarshalYAML

func (c Configs) MarshalYAML() (interface{}, error)

MarshalYAML implements yaml.Marshaler.

func (*Configs) SetDirectory

func (c *Configs) SetDirectory(dir string)

SetDirectory joins any relative file paths with dir.

func (*Configs) UnmarshalYAML

func (c *Configs) UnmarshalYAML(unmarshal func(interface{}) error) error

UnmarshalYAML implements yaml.Unmarshaler.

type Discoverer

type Discoverer interface {
	// Run hands a channel to the discovery provider (Consul, DNS, etc.) through which
	// it can send updated target groups. It must return when the context is canceled.
	// It should not close the update channel on returning.
	Run(ctx context.Context, up chan<- []*targetgroup.Group)
}

Discoverer provides information about target groups. It maintains a set of sources from which TargetGroups can originate. Whenever a discovery provider detects a potential change, it sends the TargetGroup through its channel.

Discoverer does not know if an actual change happened. It does guarantee that it sends the new TargetGroup whenever a change happens.

Discoverers should initially send a full set of all discoverable TargetGroups.

type DiscovererMetrics added in v0.50.0

type DiscovererMetrics interface {
	Register() error
	Unregister()
}

Internal metrics of service discovery mechanisms.

type DiscovererOptions

type DiscovererOptions struct {
	Logger log.Logger

	Metrics DiscovererMetrics

	// Extra HTTP client options to expose to Discoverers. This field may be
	// ignored; Discoverer implementations must opt-in to reading it.
	HTTPClientOptions []config.HTTPClientOption
}

DiscovererOptions provides options for a Discoverer.

type Manager

type Manager struct {
	// contains filtered or unexported fields
}

Manager maintains a set of discovery providers and sends each update to a map channel. Targets are grouped by the target set name.

func NewManager

func NewManager(ctx context.Context, logger log.Logger, registerer prometheus.Registerer, sdMetrics map[string]DiscovererMetrics, options ...func(*Manager)) *Manager

NewManager is the Discovery Manager constructor.

func (*Manager) ApplyConfig

func (m *Manager) ApplyConfig(cfg map[string]Configs) error

ApplyConfig checks whether discovery providers with the supplied config are already running and keeps them as is. The remaining providers are then stopped, and any newly required providers are started using the provided config.

func (*Manager) Providers added in v0.37.0

func (m *Manager) Providers() []*Provider

Providers returns the currently configured SD providers.

func (*Manager) Run

func (m *Manager) Run() error

Run starts the background processing.

func (*Manager) StartCustomProvider

func (m *Manager) StartCustomProvider(ctx context.Context, name string, worker Discoverer)

StartCustomProvider is used for sdtool. Only use this if you know what you're doing.

func (*Manager) SyncCh

func (m *Manager) SyncCh() <-chan map[string][]*targetgroup.Group

SyncCh returns a read only channel used by all the clients to receive target updates.

func (*Manager) UnregisterMetrics added in v0.52.0

func (m *Manager) UnregisterMetrics()

UnregisterMetrics unregisters manager metrics. It does not unregister service discovery or refresh metrics, whose lifecycle is managed independent of the discovery Manager.

type MetricRegisterer added in v0.50.0

type MetricRegisterer interface {
	RegisterMetrics() error
	UnregisterMetrics()
}

A utility to be used by implementations of discovery.Discoverer which need to manage the lifetime of their metrics.

func NewMetricRegisterer added in v0.50.0

func NewMetricRegisterer(reg prometheus.Registerer, metrics []prometheus.Collector) MetricRegisterer

Creates an instance of a MetricRegisterer. Typically called inside the implementation of the NewDiscoverer() method.

type Metrics added in v0.50.0

type Metrics struct {
	FailedConfigs     prometheus.Gauge
	DiscoveredTargets *prometheus.GaugeVec
	ReceivedUpdates   prometheus.Counter
	DelayedUpdates    prometheus.Counter
	SentUpdates       prometheus.Counter
}

Metrics to be used with a discovery manager.

func NewManagerMetrics added in v0.50.0

func NewManagerMetrics(registerer prometheus.Registerer, sdManagerName string) (*Metrics, error)

func (*Metrics) Unregister added in v0.52.0

func (m *Metrics) Unregister(registerer prometheus.Registerer)

Unregister unregisters all metrics.

type NoopDiscovererMetrics added in v0.50.0

type NoopDiscovererMetrics struct{}

NoopDiscovererMetrics is a dummy metrics struct for SD mechanisms that do not have any metrics.

func (*NoopDiscovererMetrics) Register added in v0.50.0

func (*NoopDiscovererMetrics) Register() error

Register implements discovery.DiscovererMetrics.

func (*NoopDiscovererMetrics) Unregister added in v0.50.0

func (*NoopDiscovererMetrics) Unregister()

Unregister implements discovery.DiscovererMetrics.

type Provider added in v0.37.0

type Provider struct {
	// contains filtered or unexported fields
}

Provider holds a Discoverer instance, its configuration, cancel func and its subscribers.

func (*Provider) Config added in v0.37.0

func (p *Provider) Config() interface{}

func (*Provider) Discoverer added in v0.37.0

func (p *Provider) Discoverer() Discoverer

Discoverer returns the Discoverer of the provider.

func (*Provider) IsStarted added in v0.37.0

func (p *Provider) IsStarted() bool

IsStarted returns true if the Discoverer is started.

type RefreshMetrics added in v0.50.0

type RefreshMetrics struct {
	Failures prometheus.Counter
	Duration prometheus.Observer
}

Metrics used by the "refresh" package. We define them here in the "discovery" package in order to avoid a cyclic dependency between "discovery" and "refresh".

type RefreshMetricsInstantiator added in v0.50.0

type RefreshMetricsInstantiator interface {
	Instantiate(mech string) *RefreshMetrics
}

Instantiate the metrics used by the "refresh" package.

type RefreshMetricsManager added in v0.50.0

type RefreshMetricsManager interface {
	DiscovererMetrics
	RefreshMetricsInstantiator
}

An interface for registering, unregistering, and instantiating metrics for the "refresh" package. Refresh metrics are registered and unregistered outside of the service discovery mechanism. This is so that the same metrics can be reused across different service discovery mechanisms. To manage refresh metrics inside the SD mechanism, we'd need to use const labels which are specific to that SD. However, doing so would also expose too many unused metrics on the Prometheus /metrics endpoint.

func NewRefreshMetrics added in v0.50.0

func NewRefreshMetrics(reg prometheus.Registerer) RefreshMetricsManager

type RefreshMetricsVecs added in v0.50.0

type RefreshMetricsVecs struct {
	// contains filtered or unexported fields
}

Metric vectors for the "refresh" package. We define them here in the "discovery" package in order to avoid a cyclic dependency between "discovery" and "refresh".

func (*RefreshMetricsVecs) Instantiate added in v0.50.0

func (m *RefreshMetricsVecs) Instantiate(mech string) *RefreshMetrics

Instantiate returns metrics out of metric vectors.

func (*RefreshMetricsVecs) Register added in v0.50.0

func (m *RefreshMetricsVecs) Register() error

Register implements discovery.DiscovererMetrics.

func (*RefreshMetricsVecs) Unregister added in v0.50.0

func (m *RefreshMetricsVecs) Unregister()

Unregister implements discovery.DiscovererMetrics.

type StaticConfig

type StaticConfig []*targetgroup.Group

A StaticConfig is a Config that provides a static list of targets.

func (StaticConfig) Name

func (StaticConfig) Name() string

Name returns the name of the service discovery mechanism.

func (StaticConfig) NewDiscoverer

func (c StaticConfig) NewDiscoverer(DiscovererOptions) (Discoverer, error)

NewDiscoverer returns a Discoverer for the Config.

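func (StaticConfig) NewDiscovererMetrics added in v0.50.0

func (StaticConfig) NewDiscovererMetrics(prometheus.Registerer, RefreshMetricsInstantiator) DiscovererMetrics

No metrics are needed for this service discovery mechanism.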

type StaticProvider

type StaticProvider struct {
	TargetGroups []*targetgroup.Group
}

StaticProvider holds a list of target groups that never change.

func (*StaticProvider) Run

func (sd *StaticProvider) Run(ctx context.Context, ch chan<- []*targetgroup.Group)

Run implements the Discoverer interface.

Directories

Path Synopsis
Package install has the side-effect of registering all builtin service discovery config types.