Documentation ¶
Overview ¶
Package ec2cluster implements support for maintaining elastic clusters of Reflow instances on EC2.
The EC2 instances created launch reflowlet agent processes that are given the user's profile token so that they can set up HTTPS servers that can perform mutual authentication to the reflow driver process and other reflowlets (for transferring objects) and also access external services like caching.
The VM instances are configured to terminate if they are idle on EC2's billing hour boundary. They also terminate on any fatal reflowlet error.
Index ¶
- Constants
- func GetAMI(sess *session.Session) (string, error)
- func GetSpotPlacementScores(ctx context.Context, api ec2iface.EC2API, region, instanceType string) (map[string]int, error)
- func InstanceType(need reflow.Resources, spot bool, maxPrice float64) (string, reflow.Resources)
- func NewSpotProber(capacityFunc capacityFunc, maxProbeDepth int, ttl time.Duration) *spotProber
- func OnDemandPrice(typ, region string) (hourlyPriceUsd float64)
- type CloudFile
- type CloudUnit
- type Cluster
- func (c *Cluster) Allocate(ctx context.Context, req reflow.Requirements, labels pool.Labels) (alloc pool.Alloc, err error)
- func (c *Cluster) Available(need reflow.Resources, maxPrice float64) (InstanceSpec, bool)
- func (c *Cluster) CanAllocate(r reflow.Resources) (bool, error)
- func (c *Cluster) CheapestInstancePriceUSD() float64
- func (c *Cluster) Config() interface{}
- func (c *Cluster) ExportStats()
- func (c *Cluster) GetName() string
- func (*Cluster) Help() string
- func (c *Cluster) Init(tls tls.Certs, sess *session.Session, labels pool.Labels, ...) error
- func (c *Cluster) InstancePriceUSD(typ string) float64
- func (c *Cluster) Launch(ctx context.Context, spec InstanceSpec) ManagedInstance
- func (c *Cluster) MaxAlloc() reflow.Resources
- func (c *Cluster) Notify(waiting, pending reflow.Resources)
- func (c *Cluster) Probe(ctx context.Context, instanceType string) (reflow.Resources, time.Duration, error)
- func (c *Cluster) QueryTags() map[string]string
- func (c *Cluster) Refresh(ctx context.Context) (map[string]string, error)
- func (c *Cluster) Region() string
- func (c *Cluster) Setup(sess *session.Session) error
- func (c *Cluster) Start(ctx context.Context, wg *sync.WaitGroup)
- func (c *Cluster) Verify() error
- type InstanceSizeToDiskSpace
- type InstanceSpec
- type InstanceTypeStat
- type ManagedCluster
- type ManagedInstance
- type Manager
- type OverallStats
Constants ¶
const ExpVarCluster = "ec2cluster"
ExpVarCluster is the expvar endpoint for ec2cluster information.
const ReflowletCloudwatchFlushMs = 5000
Variables ¶
This section is empty.
Functions ¶
func GetSpotPlacementScores ¶
func GetSpotPlacementScores(ctx context.Context, api ec2iface.EC2API, region, instanceType string) (map[string]int, error)
GetSpotPlacementScores returns spot placement scores for the given instance type in the given region. GetSpotPlacementScores returns a map of each Availability Zone name (within the given region) to the score. Note that the region is stripped from the AZ names (for eg: if region is "us-west-2", then AZ name "us-west-2a" is trimmed to "a").
func InstanceType ¶
InstanceType returns the instance type (and the amount of resources it provides) which is most appropriate for the needed resources. `spot` determines whether we should consider instance types that are available as spot instances or not.
func NewSpotProber ¶
NewSpotProber returns a spot prober which uses the given capacityFunc to determine capacity, the given maxProbeDepth as the depth to start probing at and uses the ttl as expiration of previously cached probing result.
func OnDemandPrice ¶
OnDemandPrice returns the on-demand hourly price of the given instance type in the given region.
Types ¶
type CloudFile ¶
type CloudFile struct { Path string `yaml:"path,omitempty"` Permissions string `yaml:"permissions,omitempty"` Owner string `yaml:"owner,omitempty"` Content string `yaml:"content,omitempty"` Encoding string `yaml:"encoding,omitempty"` }
CloudFile is a component of the cloudConfig configuration for CoreOS. It represents a file that will be written to the filesystem.
type CloudUnit ¶
type CloudUnit struct { Name string `yaml:"name,omitempty"` Command string `yaml:"command,omitempty"` Enable bool `yaml:"enable,omitempty"` Content string `yaml:"content,omitempty"` }
CloudUnit is a component of the cloudConfig configuration for CoreOS. It represents a CoreOS unit.
type Cluster ¶
type Cluster struct { pool.Mux `yaml:"-"` // HTTPClient is used to communicate to the reflowlet servers // running on the individual instances. In Cluster, this is done for // liveness/health checking. HTTPClient *http.Client `yaml:"-"` // Logger for cluster events. Log *log.Logger `yaml:"-"` // EC2 is the EC2 API instance through which EC2 calls are made. EC2 ec2iface.EC2API `yaml:"-"` // Authenticator authenticates the ECR repository that stores the // Reflowlet container. Authenticator ecrauth.Interface `yaml:"-"` // InstanceTags is the set of EC2 tags attached to instances created by this Cluster. InstanceTags map[string]string `yaml:"-"` // Labels is the set of labels that should be added as EC2 tags (for informational purpose only). Labels pool.Labels `yaml:"-"` // Spot is set to true when a spot instance is desired. Spot bool `yaml:"spot,omitempty"` // InstanceProfile is the EC2 instance profile to use for the cluster instances. InstanceProfile string `yaml:"instanceprofile,omitempty"` // SecurityGroup is the EC2 security group to use for cluster instances. SecurityGroup string `yaml:"securitygroup,omitempty"` // Subnets is the list of EC2 subnets ids based on which an appropriate subnet (for each AZ) will be determined. // That is, when Subnets is specified, the cluster will use ec2.DescribeSubnets API to determine AZ for each subnet. // When requesting a spot instance in a particular AZ, the appropriate subnet will be used. // If this list contains duplicate subnets for any AZ, behavior (of which subnet is used) is non-deterministic. Subnets []string `yaml:"subnets,omitempty"` // BootstrapImage is the URL of the image used for instance bootstrap. BootstrapImage string `yaml:"-"` // BootstrapExpiry is the maximum duration the bootstrap will wait for a reflowlet image after which it dies. BootstrapExpiry time.Duration `yaml:"-"` // ReflowVersion is the version of reflow binary compatible with this cluster. ReflowVersion string `yaml:"-"` // MaxPendingInstances is the maximum number of pending instances permitted. MaxPendingInstances int `yaml:"maxpendinginstances"` // MaxHourlyCostUSD is the maximum hourly cost of concurrent instances permitted (in USD). // A best effort is made to not go above this but races induced by multiple managers can increase the size // of the cluster beyond this limit. The limit is applied on maximum bid price and hence is an upper bound // on the actual incurred cost (which in practice would be much less). MaxHourlyCostUSD float64 `yaml:"maxhourlycostusd"` // DiskType is the EBS disk type to use. DiskType string `yaml:"disktype"` // InstanceSizeToDiskSpace is a mapping of EC2 instance size (e.g. xlarge) to starting disk space in GiB. InstanceSizeToDiskSpace InstanceSizeToDiskSpace `yaml:"instancesizetodiskspace"` // DiskSlices is the number of EBS volumes that are used. When DiskSlices > 1, // they are arranged in a RAID0 array to increase throughput. DiskSlices int `yaml:"diskslices"` // AMI is the VM image used to launch new instances. AMI string `yaml:"ami"` // Configuration for this Reflow instantiation. Used to provide configs to // EC2 instances. Configuration infra.Config `yaml:"-"` // AWS session Session *session.Session `yaml:"-"` // TaskDB implementation (if any) where rows are updated for newly created pools. TaskDB taskdb.TaskDB `yaml:"-"` // Public SSH keys. SshKeys []string `yaml:"sshkeys"` // AWS key name for launching instances. KeyName string `yaml:"keyname"` // Immortal determines whether instances should be made immortal. Immortal bool `yaml:"immortal,omitempty"` // NodeExporterMetricsPort determines whether to run a prometheus node_exporter daemon // on each Reflowlet. Setting a value runs the node_exporter daemon and configures it to // output prometheus metrics on the given port. Passing a non-zero value also adds an // additional route to the general Reflowlet server, such that metrics are made available // via proxy over the existing HTTPS connection and the following Reflow command: // $ reflow http https://${EC2_INST_PUBLIC_DNS}:9000/v1/node/metrics // If the user wishes to use other scrapers to fetch metrics from the Reflowlet over HTTP, // they may additionally choose to expose the port via the AWS settings for their Reflow // cluster. NodeExporterMetricsPort int `yaml:"nodeexportermetricsport,omitempty"` // CloudConfig is merged into the instance's cloudConfig before launching. CloudConfig cloudConfig `yaml:"cloudconfig"` // SpotProbeDepth is the probing depth for spot instance capacity checks. SpotProbeDepth int `yaml:"spotprobedepth,omitempty"` // Status is used to report cluster and instance status. Status *status.Group `yaml:"-"` // InstanceTypes defines the set of allowable EC2 instance types for // this cluster. If empty, all verified instance types are permitted. InstanceTypes []string `yaml:"instancetypes,omitempty"` // Name is the name of the cluster config, which defaults to defaultClusterName. // Multiple clusters can be launched/maintained simultaneously by using different names. Name string `yaml:"name,omitempty"` // contains filtered or unexported fields }
A Cluster implements a runner.Cluster backed by EC2. The cluster expands with demand. Instances are configured so that they shut down when they are idle on a billing boundary.
No local state is stored; state is inferred from labels managed by EC2. Cluster supports safely sharing state across many processes. In this case, the processes coordinate to maintain a shared cluster, where instances can be used by any of the constituent processes. In the case of Reflow, this means that multiple runs (single or batch) share the same cluster efficiently.
func (*Cluster) Allocate ¶
func (c *Cluster) Allocate(ctx context.Context, req reflow.Requirements, labels pool.Labels) (alloc pool.Alloc, err error)
Allocate reserves an alloc with within the resource requirement boundaries form this cluster. If an existing instance can serve the request, it is returned immediately; otherwise new instance(s) are spun up to handle the allocation.
func (*Cluster) Available ¶
Available returns the cheapest available instance specification that has at least the required resources.
func (*Cluster) CanAllocate ¶
CanAllocate returns whether this cluster can allocate the given amount of resources.
func (*Cluster) CheapestInstancePriceUSD ¶
func (*Cluster) ExportStats ¶
func (c *Cluster) ExportStats()
ExportStats exports the cluster stats to expvar.
func (*Cluster) Init ¶
func (c *Cluster) Init(tls tls.Certs, sess *session.Session, labels pool.Labels, bootstrapimage *infra2.BootstrapImage, reflowVersion *infra2.ReflowVersion, id *infra2.User, logger *log.Logger, ssh infra2.Ssh, mclient metrics.Client) error
Init implements infra.Provider
func (*Cluster) InstancePriceUSD ¶
func (*Cluster) Launch ¶
func (c *Cluster) Launch(ctx context.Context, spec InstanceSpec) ManagedInstance
Launch launches an EC2 instance based on the given spec and returns a ManagedInstance.
func (*Cluster) MaxAlloc ¶
MaxAlloc returns the max resources which can be obtained in a single alloc from this cluster.
func (*Cluster) Probe ¶
func (c *Cluster) Probe(ctx context.Context, instanceType string) (reflow.Resources, time.Duration, error)
Probe attempts to instantiate an EC2 instance of the given type and returns the available resources on it (as per its offers), a duration and an error. In case of a nil error: - the duration represents how long it took for a usable Reflowlet to come up on that instance type. - the resources represents how much actual resources are available/usable on that instance type. Note: Of course the above are based on a single data point. A non-nil error means that the reflowlet failed to come up on this instance type. The error could be due to context deadline, in case we gave up waiting for it to come up.
func (*Cluster) QueryTags ¶
QueryTags returns the list of tags to use to query for instances belonging to this cluster. This includes all InstanceTags that are set on any instance brought up by this cluster, and a "reflowlet:version" tag (set on the instance by the reflowlet once it comes up) to match the ReflowVersion of this cluster.
func (*Cluster) Start ¶
Start initializes the cluster and it should be called before any `pool.Pool` operations are performed on the cluster. Start uses the provided context to maintain the cluster and upon cancellation, the cluster will shutdown. Start can be called multiple times, but only parameters passed to the first call (which started the cluster) are relevant. Start takes a WaitGroup whose counter is updated to reflect background goroutines.
type InstanceSizeToDiskSpace ¶
InstanceSizeToDiskSpace is a mapping of EC2 instance size (e.g. xlarge) to starting disk space in GiB.
type InstanceSpec ¶
InstanceSpec is a specification representing an instance configuration.
func (InstanceSpec) Instance ¶
func (i InstanceSpec) Instance(id string) ManagedInstance
Instance creates a ManagedInstance for this specification with the given id.
type InstanceTypeStat ¶
type InstanceTypeStat struct { // InstanceType is an AWS EC2 instance type, i.e. r3.8xlarge InstanceType string // Count is the number of instances with the specified type. Count int }
InstanceTypeStat is a tuple that stores the count of instances of a specified type within an AWS EC2 cluster.
type ManagedCluster ¶
type ManagedCluster interface { // Launch launches an instance with the given specification. Launch(ctx context.Context, spec InstanceSpec) ManagedInstance // Refresh refreshes the managed cluster and returns a mapping of instance ID to instance type. Refresh(ctx context.Context) (map[string]string, error) // Available returns any available instance specification that can satisfy the need. // The returned InstanceSpec should be subsequently be 'Launch'able. Available(need reflow.Resources, maxPrice float64) (InstanceSpec, bool) // Notify notifies the managed cluster of the currently waiting and pending // amount of resources. Notify(waiting, pending reflow.Resources) // InstancePriceUSD returns the maximum hourly price bid in USD for the given instance type. InstancePriceUSD(typ string) float64 // CheapestInstancePriceUSD returns the minimum hourly price bid in USD for all known instance types. CheapestInstancePriceUSD() float64 }
ManagedCluster is a cluster which can be managed.
type ManagedInstance ¶
type ManagedInstance struct { InstanceSpec ID string }
ManagedInstance represents a concrete instance with a given specification and ID.
func (ManagedInstance) Valid ¶
func (m ManagedInstance) Valid() bool
Valid returns whether this is a valid instance (with a non-empty ID)
type Manager ¶
type Manager struct {
// contains filtered or unexported fields
}
Manager manages the cluster by fulfilling requests to allocate instances asynchronously, while delegating the actual launching of a new instance to the ManagedCluster. Once created, a manager needs to be initialized.
func NewManager ¶
func NewManager(c ManagedCluster, maxHourlyCostUSD float64, maxPendingInstances int, log *log.Logger) *Manager
NewManager creates a manager for the given managed cluster with the specified parameters.
func (*Manager) Allocate ¶
func (m *Manager) Allocate(ctx context.Context, req reflow.Requirements) <-chan struct{}
Allocate requests the Manager to allocate an instance for the given requirements Returns a channel the caller can wait on for confirmation request has been fulfilled.
func (*Manager) SetTimeouts ¶
SetTimeouts sets the various timeout durations (primarily used for integration testing).
type OverallStats ¶
type OverallStats struct { // InstanceIds is a slice of the instanceIds that are active within the ec2cluster maintained // by the current process. InstanceIds []string // TotalsByType is a slice of InstanceTypeStat tuples that define aggregations of the instances // in InstanceIds by instance type. TotalsByType []InstanceTypeStat }
OverallStats is a set of variables that describe instances within an ec2cluster and various aggregations of those instances (i.e. by instance type).
Source Files ¶
Directories ¶
Path | Synopsis |
---|---|
Package volume implements support for maintaining (EBS) volumes on an EC2 instance by watching the disk usage of the underlying disk device and resizing the EBS volumes whenever necessary based on provided parameters.
|
Package volume implements support for maintaining (EBS) volumes on an EC2 instance by watching the disk usage of the underlying disk device and resizing the EBS volumes whenever necessary based on provided parameters. |