Documentation ¶
Overview ¶
Package algorithm contains a running-weighted-average calculator for ratelimits, and some associated types to prevent accidental confusion between the various floats/ints/etc involved.
This package is intentionally unaware of RPC or any desired RPS limits, it just tracks the observed rates of requests and returns proportions, so other code can determine RPS values.
This is both to simplify testing and to keep its internals as fast as possible, as it is relatively CPU heavy and needs to be quick to prevent a snowballing backlog of concurrent updates and requests for aggregated data.
How this fits in the global ratelimiting system ¶
Going back to the github.com/uber/cadence/common/quotas/global package diagram: each aggregating host will have one or more instances of this weight-calculator, and it will receive a shard worth of request data.
Though this package has no direct connection to RPC structures, the request data is expected to closely match the argument to RequestWeighted.Update, and the response data like the return value of RequestWeighted.HostWeights.
This makes aggregating hosts intentionally very simple outside of this package: they essentially just forward the request and response, multiplying by dynamicconfig intended-ratelimit values (which occurs outside the mutex, to minimize contention).
Expected use ¶
Limiting hosts collect metrics and submit it to an aggregating host via RequestWeighted.Update, through [some kind of RPC setup].
Once updated, the aggregating host can get the RequestWeighted.HostWeights for that host's ratelimits (== the updated limits), multiply those 0..1 weights by the dynamicconfig configured RPS, and return them to the limiting host that triggered the request.
If there are unused RPS remaining, the aggregating host *may* increase the RPS it returns by [some amount], to allow the limiting host to pre-emptively allow more requests than is "fair" before the next update. Even if it does not, an increase in attempted usage will increase that limiter's weight on the next update cycle, so this is mostly intended as a tool for reducing incorrectly-rejected requests when a ratelimit's usage is well below its allowed limit.
Dealing with expired data ¶
As user calls change, or the aggregating-host ring changes, Limit keys may become effectively unused in an instance. During normal use, any accessed Limit will clean up "expired" data if it is found, and there is essentially no ongoing "upkeep" cost for an un-accessed Limit (aside from memory).
Hopefully this will be sufficient to keep memory use and performance reasonable.
If it is not, a trivial goroutine to periodically call RequestWeighted.GC will clear *all* old data. Every minute or so should be more than sufficient. This method returns some simple metrics about how much data exists / was removed, so it can be reported to help us estimate how necessary it is in practice.
Dealing with excessive contention ¶
In large clusters, there will be likely be many update-requests, many Limit keys, and more data to process to return responses (more hosts per limit). At some point this could become a bottleneck, preventing timely updates.
On even our largest internal clusters, I do not believe we will run that risk. At least, not with the planned frontend-only limits. `pprof` should be able to easily validate this and let us better estimate headroom for adding more in the future.
If too much contention does occur, there are 3 relatively simple mitigations:
- turn it completely off, go back to host-local limits
- slow down the update frequency
- use N instances and shard keys across them
1 is pretty obvious and has clear behavior, though it means degrading behavior for our imbalanced-load clusters.
2 is essentially the only short-term and dynamically-apply-able option that retains any of the behavior we want. This will impact how quickly the algorithm converges, so you may also want to adjust the new-data weight to be higher, though "how much" depends on what kind of behavior you want / can tolerate.
3 offers a linear contention improvement and should be basically trivial to build: create N instances instead of 1, and shard the Limit keys to each instance. Since this is mostly CPU bound and each one would be fully independent, making `GOMAXPROCS` instances is an obvious first choice, and this does not need to be dynamically reconfigured at runtime so there should be no need to build a "smart" re-balancing / re-sharding system of any kind.
Tests contain a single-threaded benchmark and laptop-local results for estimating contention, and for judging if changes will help or hinder, but real-world use and different CPUs will of course be somewhat different. Personally I expect "few mutexes, GOMAXPROCS instances" is roughly ideal for CPU throughput with the current setup, but an entirely different internal structure might exceed it.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Config ¶
type Config struct { // How much each update should be weighted vs prior data. // Must be between 0 and 1, recommend starting with 0.5 (4 updates until data has <10% influence) NewDataWeight dynamicconfig.FloatPropertyFn // Expected time between updates. Should match the cluster's config for how often limiters check in, // i.e. this should probably be the same dynamic config value, updated at / near the same time. UpdateInterval dynamicconfig.DurationPropertyFn // How long to wait before considering a host-limit's RPS usage "probably inactive", rather than // simply delayed. // // Should always be larger than UpdateInterval, as less is meaningless. Values are reduced based on // missed UpdateInterval multiples, not DecayAfter. // Unsure about a good default (try 2x UpdateInterval?), but larger numbers mean smoother behavior // but longer delays on adjusting to hosts that have disappeared or stopped receiving requests. DecayAfter dynamicconfig.DurationPropertyFn // How much time can pass without receiving any update before completely deleting data. // // Due to ever-reducing weight after DecayAfter, this is intended to reduce calculation costs, // not influence behavior / weights returned. Even extremely-low-weighted hosts will still be retained // as long as they keep reporting "in use" (e.g. 1 rps used out of >100,000 is fine and will be tracked). // // "Good" values depend on a lot of details, but >=10*UpdateInterval seems reasonably safe for a // NewDataWeight of 0.5, as the latest data will be reduced to only 0.1% and may not be worth keeping. GcAfter dynamicconfig.DurationPropertyFn }
type HostWeight ¶
type HostWeight float64
HostWeight is a float between 0 and 1, showing how much of a ratelimit should be allocated to a host.
type Identity ¶
type Identity string
Identity is an arbitrary (stable) identifier for a "limiting" host.
type Metrics ¶
type Metrics struct { // HostLimits is the number of per-host limits that remain after cleanup HostLimits int // Limits is the number of cross-host limits that remain after cleanup Limits int // RemovedHostLimits is the number of per-host limits that were removed as part of this GC pass RemovedHostLimits int // RemovedLimits is the number of cross-host limits that were removed as part of this GC pass RemovedLimits int }
Metrics reports overall counts discovered as part of garbage collection.
This is not necessarily worth maintaining verbatim, but while it's easy to collect it might give us some insight into overall behavior.
type PerSecond ¶
type PerSecond float64
PerSecond represents already-scaled-to-per-second RPS values
type RequestWeighted ¶
type RequestWeighted interface { // Update load-data for this host's requests, given a known elapsed time spent accumulating this load info. // // Elapsed time must be non-zero, but is not otherwise constrained. Update(params UpdateParams) error // HostWeights returns the per-[Limit] weights for all requested + known keys for this Identity, // as well as the Limit's overall used RPS (to decide RPS to allow for new hosts). HostWeights(host Identity, limits []Limit) (weights map[Limit]HostWeight, usedRPS map[Limit]PerSecond, err error) // GC can be called periodically to pre-emptively prune old ratelimits. // // Limit keys that are accessed normally will automatically garbage-collect themselves and old host data, // and an unused limit only costs memory, so this may prove to be unnecessary. GC() (Metrics, error) }
RequestWeighted returns aggregated RPS and distribution data for its Limit keys, based on how many requests each Identity has received.
func New ¶
func New(cfg Config) (RequestWeighted, error)
New returns a concurrency-safe host-weight aggregator.
This instance is effectively single-threaded, but a small sharding wrapper should allow better concurrent throughput if needed (bound by CPU cores, as it's moderately CPU-costly).