Documentation ¶
Overview ¶
Package quota provides an implementation for server quotas which are backed by Redis.
Rationale ¶
Quotas are a way to restrict shared resource consumption in order to provide fairness and prevent abuse. The quota library implements a way to configure and track resource limits for users for application-specific resources.
We intend this library to be a 'good enough' implementation that can serve the needs of many (if not all) LUCI services and provide additional common benefits (logging, metrics, administration ACLs/API/UI), so that each individual service doesn't need to reinvent these mechanisms.
The current implementation is based on Redis and is fully synchronous. There's a possibility in the future that we could extend the implementation to other datastores or to allow the application to make a tradeoff between accuracy and latency.
Data Model ¶
There are 2 different types of entities managed by the quota library: Policies (grouped into a PolicyConfig) and Accounts. The library provides a variety of Operations which all work in terms of these entities.
Data Model - Entity identities ¶
All entities have an identity which is composed of the following 'atoms'. Some of these atoms need structure which is meaningful to the application. The quota library has a convention for such atoms called "ASIs" (Application-specific identifiers); see the "Application-specific identifiers (ASIs)" section for what they are and why they exist. Note that all of these identifiers end up as Redis keys (or hash keys) one way or the other, so all the usual caveats around absurd key lengths apply here. However, Redis allows keys up to 512MB, so have fun...
Common identifier atoms:
- app_id - The app_id allows multiple logical applications to share the same Redis instance. This should reflect the service that the account or policy belongs to. For example, this would allow a single deployment to have quota accounts/policies for the applications "cv" and "rdb" in the same binary.
- realm - For administration purposes, Accounts and PolicyConfigs belong to a realm (though likely not the same one). Typically, PolicyConfigs will belong to a project's @project realm. Accounts will belong to realms which make sense in the context of the application. `realm` here is a global realm (i.e. `project:something`).
- resource_type (ASI) - A given Policy or Account can only deal in a single resource_type. This value only needs to make sense to the application.
- namespace (ASI) - Namespace allows the Application to segment a given realm into multiple sub-domains. For example, Buildbucket could use the namespace to indicate that a given Account is being used for a single builder within a bucket. This only needs to make sense to the application.
- name (ASI) - Name is the name of the entity. This only needs to make sense to the application.
Data Model - PolicyConfig ¶
ID: app_id ~ realm ~ version
A PolicyConfig is an immutable group (Redis Hash) of Policies. Typically this will be in a @project realm of some LUCI project, as current users will likely derive a PolicyConfig from some other LUCI project configuration.
The realm indicates which realm this PolicyConfig is administered under, but it doesn't need to (and likely will not) match the realm for Accounts using the Policies within it.
In the PolicyConfig ID, the `version` field is a content hash (starting with `$`), or manually supplied ("#" followed by an ASI). Once written, PolicyConfigs cannot be modified (but they can be purged). It's recommended to use the content hash versioning scheme (this will also do implicit deduplication when configs change without policy changes). However, some applications may find it more convenient to tie the PolicyConfig version to an external version identifier (like a git commit id of the overall configs), so manually versioning the PolicyConfigs is an option.
Purging PolicyConfigs results in the deletion of a PolicyConfig and should only be used for PolicyConfigs that the application knows are no longer in use. However, in the event that a PolicyConfig is purged while Accounts still reference it:
- Operations on those Accounts without supplying a new Policy reference will continue to use the snapshot of the policy stored in the Account. We could potentially make this produce a warning or error, however.
- Operations on Accounts that supply a new Policy reference must have that Policy exist, as usual, and it will replace the referenced/snapshotted policy in the Account.
Data Model - Policy ¶
Key (within a PolicyConfig): namespace ~ name ~ resource_type
A Policy is an immutable member of a PolicyConfig, and stores a numeric Default and Limit, an Options bit field, a Refill triple, and a Lifetime.
- Default - The value to set a previously non-existent Account to when first accessing it.
- Limit - The maximum value an Account can have.
- Options - Bit field indicating various options. Currently the only option is `ABSOLUTE_RESOURCE` which indicates that this policy constrains a resource which is managed exclusively by the application (for example, represents the current number of in-flight builds, etc.). This will disable the `quota.accounts.write` permission for accounts managed with this Policy.
- Lifetime - The number of seconds to wait before garbage collecting an Account after its last update. This is implemented with a Redis TTL which is refreshed on the Account each time it's written.
Refill is a numeric triple (see the "Refill Behavior" section for details of how refill works):
- Units - The number of units to add.
- Interval - The number of seconds in between fill events. Intervals are synchronized to UTC midnight + Offset. See the "Refill Behavior" section for a discussion on how Refill is implemented. Note that there is no cron or "stampede" from synchronizing refill events in this way. This must evenly divide 24 hours (86400 seconds).
- Offset - The number of seconds to offset UTC midnight to the 0th daily interval.
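To make the Refill constraints concrete, here is a small validation sketch. `Refill` is a hypothetical Go mirror of the triple above (not the quotapb message); the only rule taken from the text is that Interval must evenly divide 86400 seconds. Treating a zero Interval as "never refills" and bounding Offset to one day are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

const secondsPerDay = 86400

// Refill is a hypothetical mirror of the Units/Interval/Offset triple.
type Refill struct {
	Units    int64
	Interval int64 // seconds between refill events
	Offset   int64 // seconds after UTC midnight of the 0th daily interval
}

// validate enforces that Interval evenly divides one day. Treating a zero
// Interval as "never refills" and bounding Offset to [0, 86400) are
// assumptions of this sketch.
func (r Refill) validate() error {
	if r.Interval == 0 {
		return nil
	}
	if r.Interval < 0 {
		return errors.New("interval must not be negative")
	}
	if secondsPerDay%r.Interval != 0 {
		return fmt.Errorf("interval %ds does not evenly divide %ds", r.Interval, secondsPerDay)
	}
	if r.Offset < 0 || r.Offset >= secondsPerDay {
		return fmt.Errorf("offset %ds is outside [0, %d)", r.Offset, secondsPerDay)
	}
	return nil
}

func main() {
	for _, r := range []Refill{
		{Units: 17, Interval: 6 * 3600},                    // every 6 hours
		{Units: 1, Interval: 13 * 3600},                    // every 13 hours: rejected
		{Units: 10, Interval: secondsPerDay, Offset: 3600}, // daily at 01:00 UTC
	} {
		fmt.Printf("%+v -> %v\n", r, r.validate())
	}
}
```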
Data Model - Account ¶
ID: app_id ~ realm ~ namespace ~ name ~ resource_type
Accounts hold the balance of a specific owning identity for a specific resource. They contain:
- Balance - Current number of units held.
- LastUpdate - Time when this Account was last updated.
- LastRefill - Time when this Account was last refilled (always <= LastUpdate).
- LastPolicyChange - Time when the currently applied Policy was first set.
- PolicyConfig - Redis key for the versioned PolicyConfig last used for this Account.
- PolicyKey - Hash key (namespace ~ name ~ resource_type) in the PolicyConfig for the Policy last used for this Account.
- PolicyRaw - Raw encoded snapshot of the last-used policy for this Account. This is necessary to allow the quota library to interact with an Account under its last-applied policy without needing to re-read the original policy (which is technically difficult to do in Redis scripts, because all Redis keys must be supplied to a script before it executes).
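The IDs above are all sequences of atoms separated by "~". As a rough illustration, the hypothetical `joinAtoms` helper below builds an Account ID and a Policy key from made-up atom values, refusing atoms that contain the reserved separator. The real Redis key layout may add prefixes or other framing that this sketch omits.

```go
package main

import (
	"fmt"
	"strings"
)

// joinAtoms joins identifier atoms with "~", the separator the quota library
// reserves for partitioning ASIs. Hypothetical helper; the actual Redis keys
// may include additional framing.
func joinAtoms(atoms ...string) (string, error) {
	for _, a := range atoms {
		if strings.Contains(a, "~") {
			return "", fmt.Errorf("atom %q contains reserved separator \"~\"", a)
		}
	}
	return strings.Join(atoms, "~"), nil
}

func main() {
	// Account ID: app_id ~ realm ~ namespace ~ name ~ resource_type
	acct, _ := joinAtoms("cv", "chromium:try", "builders", "linux-rel", "builds")
	// Policy key within a PolicyConfig: namespace ~ name ~ resource_type
	pol, _ := joinAtoms("builders", "default", "builds")
	fmt.Println(acct)
	fmt.Println(pol)
}
```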
Operations ¶
Operations combine a Policy with an Account, plus a delta.
Operations have:
- account - The ID of the account to apply to.
- policy - (optional) The PolicyConfig ID + Policy key to set on this Account.
- delta - An offset from the value specified by `relative_to`.
- relative_to - Enum with values CURRENT_BALANCE, ZERO, DEFAULT, and LIMIT.
- options - Operation options:
  - IGNORE_POLICY_BOUNDS - This allows `$relative_to + delta` to bring the balance outside of the Policy's (0, limit) range.
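The `relative_to`/`delta` pair resolves to a target balance in a straightforward way. The sketch below uses a hypothetical Go enum (`RelCurrentBalance` and friends are illustrative names, not the quotapb ones) and shows that, for example, `relative_to=LIMIT, delta=0` resets an Account to full regardless of its current balance.

```go
package main

import "fmt"

// RelativeTo is a hypothetical Go mirror of the relative_to enum.
type RelativeTo int

const (
	RelCurrentBalance RelativeTo = iota
	RelZero
	RelDefault
	RelLimit
)

// targetBalance resolves `relative_to + delta` into the balance an Operation
// would produce. current comes from the Account; def and limit come from its
// Policy.
func targetBalance(rel RelativeTo, delta, current, def, limit int64) (int64, error) {
	var base int64
	switch rel {
	case RelCurrentBalance:
		base = current
	case RelZero:
		base = 0
	case RelDefault:
		base = def
	case RelLimit:
		base = limit
	default:
		return 0, fmt.Errorf("unknown relative_to value %d", rel)
	}
	return base + delta, nil
}

func main() {
	// Reset an Account to its Policy limit.
	full, _ := targetBalance(RelLimit, 0, 3, 5, 10)
	// Debit one unit from the current balance.
	debited, _ := targetBalance(RelCurrentBalance, -1, 3, 5, 10)
	fmt.Println(full, debited)
}
```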
An Operation is applied by:
- Creating the Account with the provided Policy's Default if it is missing; otherwise, applying any refill to the existing Account balance under the Account's existing Policy.
- If the Operation includes a Policy, setting that Policy on the Account.
- Calculating the new balance and checking if it is within the current/new Policy bounds.
- Saving the new Account balance, policy, and resetting the Account TTL.
Operations can fail in one of three ways:
- FAIL_OUT_OF_BOUNDS - The Operation would have brought the Account out of (0, Policy.Limit), and options=IGNORE_POLICY_BOUNDS was unset.
- FAIL_UNKNOWN_POLICY - The Operation included a policy which wasn't loaded.
- FAIL_MISSING_ACCOUNT - The Operation referred to an Account which doesn't exist, but didn't supply a Policy, so the Operation couldn't create the Account.
NOTE: For Accounts whose balance is ALREADY out of bounds, Operations which bring the balance closer to in-bounds ARE allowed. For example, a delta of CURRENT_BALANCE+1 would be allowed for an Account whose balance was -10, and a delta of CURRENT_BALANCE-10 would be allowed for an Account whose balance was 19 with a limit of 10.
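The bounds rule above, including the allowance for Accounts that are already out of bounds, can be expressed as a single predicate. This is a sketch, not the library's Lua code: `withinPolicy` is a hypothetical name, and it assumes "in bounds" means 0 <= balance <= limit (inclusive) and that a move must not overshoot to the other side of the range.

```go
package main

import "fmt"

// withinPolicy reports whether moving an Account from oldBal to newBal is
// permitted under a Policy with the given limit. Assumption: in-bounds means
// 0 <= balance <= limit (inclusive).
func withinPolicy(oldBal, newBal, limit int64, ignorePolicyBounds bool) bool {
	switch {
	case ignorePolicyBounds:
		return true
	case newBal >= 0 && newBal <= limit:
		return true
	case newBal > limit:
		// Over the limit is only OK if the balance did not grow.
		return newBal <= oldBal
	default: // newBal < 0
		// Below zero is only OK if the balance did not shrink.
		return newBal >= oldBal
	}
}

func main() {
	fmt.Println(withinPolicy(-10, -9, 10, false)) // true: under zero, moving up
	fmt.Println(withinPolicy(19, 15, 10, false))  // true: over limit, moving down
	fmt.Println(withinPolicy(5, 11, 10, false))   // false: would exceed limit
	fmt.Println(withinPolicy(5, 11, 10, true))    // true: IGNORE_POLICY_BOUNDS
}
```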
There is also a Get operation which ONLY reads the data, returning the full Account data and also the projected value (e.g. after refills). This operation does NOT change the Account at all (i.e. last_refill, TTL, etc. are all left as-is).
Application-specific identifiers (ASIs) ¶
The quota library has several application-specific identifiers (ASIs). These ASIs end up ~verbatim in Redis as row keys. This means that your storage costs and lookup performance will be proportional to their length.
The quota library reserves the character "~" for partitioning ASIs when synthesizing a full Redis key.
Additionally, two characters will be treated specially as a convention:
- "|" is available to separate sections within an ASI.
- "{", if the first character in an ASI section, indicates that the remainder of that section is encoded with ascii85 (an encoding which conveniently excludes "~", "|", and "{"). Functions in this library which attempt to do this interpretation will return the raw string instead of failing (e.g. if you had `{z` in a section, it would be returned as `{z` rather than as an error).
The quota library provides functions to encode/decode a series of arbitrary section strings to/from a single ASI string.
The quota library may use "|" as a way to group related keys together when displaying a large collection of quota Account or Policy data. Think of it similarly to how GCS treats "/": it's a visual delimiter, but the underlying service doesn't care whether you use it. Similarly, sections starting with '{' will be decoded in certain contexts (like the UI), but if decoding fails the original string is returned. If your application doesn't care about this functionality at all, it's free to use any string it likes as an ASI, as long as it doesn't contain `~`.
Refill Behavior ¶
Refills in the quota library are intended to mimic the behavior of a cron job which runs every second, scanning all Accounts, seeing if their Interval is past and refilling them.
However, such an implementation would be terribly slow. Instead, the quota library remembers the Policy details for each Account, and whenever an Operation interacts with the Account, it applies the refills that would have occurred over the real elapsed time under the previous Policy.
Refills are synchronized to UTC plus an offset. This means if you specify 17 units with an interval of "21600" (i.e. 6 hours), and an offset of 0, then each 6 hours after UTC midnight, 17 units would be added to the account. If the account was created at, say, 0740 UTC, then the next refill event would occur at 1200 UTC.
Offset allows you to 'rotate' this cycle so that a given policy's "midnight" occurs at a different time of day. (NOTE: Theoretically this offset could be per-Account rather than per-Policy. If this becomes a necessary use case, it wouldn't be hard to add, but for now we're keeping it simple).
Please also refer to "Implementation notes - Refill Interval" and "Implementation notes - Refill Synchronization" for a discussion on why we picked this Refill system vs. a simpler units/second alternative and why we tie refills to the wall clock time.
Behavior when switching Policies ¶
Over time, it is likely that a single Account will go through multiple different Policies which apply to it, or where those Policies change parameters over time.
Account names should always be stable, comprising a who/what/where of a resource. When policies shift for an Account, the quota library will maintain the previous balance of the Account, except that no Refill will take place if the Account is over its limit. Additionally, no matter how far out of spec an Account is, it will always be permitted to make an over-limit account smaller, or an under-zero account larger.
So, say an Account had a Policy with a limit of 20 and a balance of 18, and switched to a Policy with a limit of 15. It would maintain its balance of 18 until debited, but any positive refill would have no effect while it remains over the new limit.
Access control and Administration ¶
The quota library implements an administration service API. This is an auxiliary API to read/write the values manipulated by the quota library, to be used for debugging or manual intervention (rather than directly poking the underlying Redis data).
The `self` binding context attribute has the value "1" if the Account ID's identity field matches the current auth identity, "0" otherwise.
Access via this service is granted via realm permissions:
- quota.accounts.read - Allows reading single accounts within a realm. Binding context: {app_id, resource_type, namespace, self}
- quota.accounts.list - Allows listing accounts within a realm. Binding context: {app_id, resource_type, namespace}
- quota.accounts.write - Allows modifying accounts. Note that this only applies to accounts whose Policy does not have the ABSOLUTE_RESOURCE option. Binding context: {app_id, resource_type, namespace, self}
- quota.policies.read - Allows reading policy contents. Binding context: {app_id}
- quota.policies.write - Allows writing new content-addressed policy configs. Binding context: {app_id}
- quota.policies.overrideVersion - If granted in conjunction with `quota.policies.write`, allows writing new manually-versioned policy configs. Binding context: {app_id}. Note that manually-versioned policy configs are not verifiable by the quota library and could allow users with this permission to 'poison' a quota policy version.
- quota.policies.purge - Allows purging PolicyConfigs. Binding context: {app_id}.
Permission checks require one of:
- hasPermission(perm, operation_realm) OR
- hasPermission(perm, "@internal:<service-app-id>")
That is, internal permissions can be granted to service deployment Admins. Additionally, permissions granted in this realm ignore the ABSOLUTE_RESOURCE flag on accounts, because it's presumed that service deployment Admins understand the nuances of manually adjusting such Accounts.
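The decision logic above can be sketched as follows. `HasPermission` is a hypothetical stand-in for a realms permission check (a real service would use its auth library), `canAdminister` is an illustrative name, and binding-context attributes are ignored for brevity. The key point is that for `quota.accounts.write` on an ABSOLUTE_RESOURCE Account, only the `@internal` grant counts.

```go
package main

import "fmt"

// HasPermission is a hypothetical stand-in for a realms permission check.
type HasPermission func(perm, realm string) bool

// canAdminister grants perm if it is held in the operation's realm OR in
// "@internal:<service-app-id>". For quota.accounts.write on an
// ABSOLUTE_RESOURCE account, only the internal grant counts.
func canAdminister(has HasPermission, perm, opRealm, serviceAppID string, absoluteResource bool) bool {
	internal := has(perm, "@internal:"+serviceAppID)
	if perm == "quota.accounts.write" && absoluteResource {
		return internal
	}
	return internal || has(perm, opRealm)
}

func main() {
	grants := map[string]bool{"quota.accounts.write|chromium:try": true}
	has := func(perm, realm string) bool { return grants[perm+"|"+realm] }
	fmt.Println(canAdminister(has, "quota.accounts.write", "chromium:try", "my-service", false))
	fmt.Println(canAdminister(has, "quota.accounts.write", "chromium:try", "my-service", true))
}
```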
NOTE: These access controls ONLY apply to requests via the Administration service API. Interactions with quotas via the Go API do not perform any access checks, because it is assumed that the application has already done appropriate access checks before computing the Accounts/Policies to interact with.
Implementation notes - Refill Interval ¶
Initially the Quota library implemented a "units/second" refill system. This made the implementation nice due to its simplicity, but had two noticeable drawbacks:
- Low quantity quotas (e.g. builds per day) were difficult to express naturally (for example, the application would have to have accounts in fractional builds, like 100,000 == one build).
- Even if the application expressed account values in this way, the result is an effectively "analog" replenishment system, which makes it easy to make mistakes when setting quotas.
Consider the case where you want to restrict users to "10 builds per day". You first make the accounts hold hundred-thousandths of a build (100,000 units == one build), and then set a policy with (limit=1000000, refill_each_sec=11). Ignoring the fact that the refill should actually be something like 11.574, we've basically achieved what we want, right? A user can only run 10 builds (a bit less) per day.
Not quite. Consider that the user can wait until their quota is full (10 builds) and then they:
- Run 10 builds in hour 0
- Run one more build every ~2.5 hours (whenever a full build's worth of units has refilled) for the rest of the day.
Oops... our 10/day quota actually allows the user to burst up to 19/day. Mondays are gonna be spicy.
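A second-by-second simulation of the scenario above confirms the arithmetic. It uses the numbers from the example (100,000 units per build, limit 1,000,000, 11 units/second); `buildsInOneDay` is an illustrative name. Under a daily-interval Policy synchronized to UTC midnight, the next refill would only land at the following midnight, so the same UTC day would stay at 10.

```go
package main

import "fmt"

// Numbers from the example above: accounts hold hundred-thousandths of a
// build and refill continuously.
const (
	unitsPerBuild = 100_000   // one build
	limit         = 1_000_000 // ten builds
	refillPerSec  = 11        // ~10 builds per day (exact value would be 11.574)
	secondsPerDay = 86400
)

// buildsInOneDay simulates one UTC day under the "units/second" scheme. The
// user starts with a full account, spends all of it at t=0, then starts a
// build as soon as one build's worth of units has trickled back in.
func buildsInOneDay() int {
	balance := int64(limit)
	builds := 0
	for t := 0; t < secondsPerDay; t++ {
		if t > 0 {
			balance += refillPerSec
			if balance > limit {
				balance = limit
			}
		}
		for balance >= unitsPerBuild {
			balance -= unitsPerBuild
			builds++
		}
	}
	return builds
}

func main() {
	fmt.Println("builds in one day:", buildsInOneDay()) // builds in one day: 19
}
```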
Another aspect of the current implementation is that the Interval MUST evenly divide one day. This gives the Interval a daily cycle and reduces the possible edge cases when switching policies for an Account where the Policies have different refill periods. Otherwise, oddball intervals (like 13h) would drift relative to the daily cycle (the first 13h refill of each day lands 2 hours later than the day before), and when we eventually switch policies, the Account would lose an unpredictable amount of refill time.
Implementation notes - Refill Synchronization ¶
Quota refills are tricky; originally we started the clock at account creation time, but realized this would lead to two issues:
- Every quota account would refresh at seemingly-random times, which makes debugging more difficult. This would not be beneficial for 'load distribution' in a system either (that should explicitly use short-term quotas or some other rate-limiting technique instead).
- Policy changes for a given account would lead to behaviors that are very difficult to reason about.
In the case of policy changes, the only sensible thing to do while maintaining the interval based refill events would be to reset the refill timer when changing policies on an account. However, for Refill policies with long intervals, this could lead to artifacts where users are inexplicably starved for quota. Consider a situation where a user is allowed 10 builds per day. They exhaust their quota at hour 23 of the day and complain to a trooper who then moves them to a higher-tier policy group with 20 builds per day.
However, when hour 24 rolls around, the user's account not only doesn't get 20 builds added to it, it doesn't even get the original 10. Instead the user has to wait an ADDITIONAL 24h before their quota replenishes.
Synchronizing refill events significantly improves the predictability of the system here.
Implementation notes - Deduplication ¶
The quota library has a simple deduplication scheme which is intended to prevent accidentally applying Operations multiple times (for example, applying an Op(-10) operation twice when you only wanted to apply it once could be pretty bad).
When any actor interacts with the Quota library (either via the Go interface or the Administration API), they provide a request ID. The quota library then calculates whether ALL of the Operations in the request can proceed with the current Account state and, if so, applies ALL of the Operations atomically*. It then records the RequestID in Redis with a TTL (defaulting to 2 hours), along with a hash of the requested Operations and the resulting Account balances after applying all of the Operations. If a subsequent request comes in with the same RequestID, the hash of the Operations is checked, and if it matches the stored value, the original result is returned without error.
(* I put the scary asterisk on atomically, because _as far as I can tell_, EVAL scripts in Redis are either fully applied, or not applied at all. However the statements in the docs aren't as strong as I'd like to this effect. The docs do state that EVAL (or FUNCTIONs) is our best bet.)
Supplying a different set of Operations with the same RequestID is an error, and the request will be rejected.
Where this departs from "normal" deduplication is that _negative_ (error) results are NOT recorded; that is, if you attempt to debit an account "A" by 1 unit, but the balance is currently 0, this will return an "underflow" error, but the RequestID will not be consumed (so retrying this exact same request later may succeed, if the balance of "A" has risen to at least 1).
We speculate that this mode is more intuitive, since many of the places we expect applications to interact with the quota library are attempting to make rapid, otherwise stateless, decisions about what to do next, where generating the RequestID deterministically in the context of that decision is convenient. If we stored the rejection via the RequestID, these stateless invocations would likely need to store the fact that a RequestID was consumed, or to pick randomized RequestIDs (which then gets you in trouble when multiple processes are attempting to make the same decision and would only fail out on a transaction after communicating intent to the quota service).
Implementation notes - Redis encoding ¶
This library makes use of `msgpack` to encode both Accounts and Policies in Redis. Unfortunately, because we need to implement quota manipulation in `lua`, regular protobuf wasn't an option for these.
See go.chromium.org/luci/common/proto/msgpackpb for documentation on this encoding form.
This encoding form intends to preserve protobuf's backwards compatibility semantics, which (hopefully) will make forward schema migrations easy to implement without requiring total cache eviction.
Implementation notes - Debugging lua code ¶
I don't have any great strategy for this, but I did add a `DUMP` global function which is available in both `internal/luatest` as well as `quotatestmonkeypatch`. This will dump (print) all arguments, and will serialize any tables given to it with `cjson.encode`, which is usually good enough for quick debugging.
Index ¶
- Variables
- func ApplyOps(ctx context.Context, requestID string, requestTTL *durationpb.Duration, ...) (*quotapb.ApplyOpsResponse, error)
- func GetAccounts(ctx context.Context, accounts []*quotapb.AccountID) (*quotapb.GetAccountsResponse, error)
- func NewModule(opts *ModuleOptions) module.Module
- func NewModuleFromFlags() module.Module
- type Application
- func (a *Application) AccountID(realm, namespace, name, resourceType string) *quotapb.AccountID
- func (a *Application) LoadPoliciesAuto(ctx context.Context, realm string, cfg *quotapb.PolicyConfig) (cid *quotapb.PolicyConfigID, err error)
- func (a *Application) LoadPoliciesManual(ctx context.Context, realm string, version string, cfg *quotapb.PolicyConfig) (*quotapb.PolicyConfigID, error)
- type ApplicationOptions
- type ModuleOptions
Constants ¶
This section is empty.
Variables ¶
var AssembleASI = quotakeys.AssembleASI
AssembleASI will return an Application-Specified-Identifier (ASI) with the given sections.
Sections are joined verbatim with a "|" separator, unless a section contains a "|" or "~", or begins with "{". In that case the section is encoded with ascii85 and inserted into the final string with a "{" prefix character.
var DecodeASI = quotakeys.DecodeASI
DecodeASI will return the sections within an Application-Specified-Identifier (ASI), decoding any which appear to be ascii85-encoded (i.e. those prefixed with a "{").
If a section has the "{" prefix, but doesn't correctly decode, it's returned verbatim.
var ErrQuotaApply = errors.New("quota.Apply had errors")
ErrQuotaApply is returned by ApplyOps when the updates were not applied.
See the returned ApplyOpsResponse for details.
var ModuleName = module.RegisterName("go.chromium.org/luci/server/quota")
ModuleName is the globally-unique name for this module. Useful for registering this module as a dependency of other modules.
var UpdateAccountsScript = lua.UpdateAccountsScript
UpdateAccountsScript is a reference to the lua script used by the quota library. This is only a public symbol in order to patch it with the quotatestmonkeypatch library.
Functions ¶
func ApplyOps ¶
func ApplyOps(ctx context.Context, requestID string, requestTTL *durationpb.Duration, ops []*quotapb.Op) (*quotapb.ApplyOpsResponse, error)
ApplyOps combines several quota operations into one atomic action with a single requestID.
The requestID won't be consumed until this returns success, and once it's successful, it will continue to return success without any quota changes for requestTTL. If requestTTL is not set, the TTL defaults to 2 hours. The requestID is tied to auth.CurrentIdentity. If requestID is empty, this operation is not idempotent.
Policies must already be loaded with LoadPoliciesAuto or LoadPoliciesManual.
func GetAccounts ¶
func GetAccounts(ctx context.Context, accounts []*quotapb.AccountID) (*quotapb.GetAccountsResponse, error)
GetAccounts fetches the list of requested accounts. If the account does not exist, GetAccountsResponse.Account[i].Account is left unset. TODO(aravindvasudev): Implement logic to compute Account.ProjectedBalance.
func NewModule ¶
func NewModule(opts *ModuleOptions) module.Module
NewModule returns a module.Module for the quota library initialized from the given *ModuleOptions.
func NewModuleFromFlags ¶
func NewModuleFromFlags() module.Module
NewModuleFromFlags returns a module.Module for the quota library which can be initialized from command line flags.
Types ¶
type Application ¶
type Application struct {
// contains filtered or unexported fields
}
func Register ¶
func Register(appID string, ao *ApplicationOptions) *Application
func (*Application) AccountID ¶
func (a *Application) AccountID(realm, namespace, name, resourceType string) *quotapb.AccountID
AccountID is a convenience method to make an AccountID tied to this application.
Will panic if resourceType is not registered for this Application.
func (*Application) LoadPoliciesAuto ¶
func (a *Application) LoadPoliciesAuto(ctx context.Context, realm string, cfg *quotapb.PolicyConfig) (cid *quotapb.PolicyConfigID, err error)
LoadPoliciesAuto ensures that the given policy config is ingested with a content-hash version for the given Application in `realm`.
If a policy config already exists for `(cfg.id, realm, version)`, this returns immediately without checking its content.
Returns the calculated version hash.
func (*Application) LoadPoliciesManual ¶
func (a *Application) LoadPoliciesManual(ctx context.Context, realm string, version string, cfg *quotapb.PolicyConfig) (*quotapb.PolicyConfigID, error)
LoadPoliciesManual ensures that the given policy config is uploaded at `version` for the given Application in `realm`.
If a policy config already exists for `(cfg.id, realm, version)`, this returns immediately without checking its content. It is the application's responsibility to ensure that (namespace, version) always refers to the same `cfg` contents.
Version must not contain "$" or "~".
type ApplicationOptions ¶
type ApplicationOptions struct {
// ResourceTypes enumerates all the ResourceTypes this application can use.
//
// Policies and Accounts both specify a single resource type, and these must
// match. I.e. you cannot use a policy for `qps` to manage a `storage_bytes`
// account.
ResourceTypes []string
}
type ModuleOptions ¶
type ModuleOptions struct { }
ModuleOptions is a set of configuration options for the quota module.
func (*ModuleOptions) Register ¶
func (o *ModuleOptions) Register(f *flag.FlagSet)
Register adds command line flags for these module options to the given *flag.FlagSet. Mutates module options by initializing defaults.
Source Files ¶
Directories ¶
Path | Synopsis |
---|---|
examples | |
examples/ratelimit | Package main contains a binary demonstrating how to use the server/quota module to implement rate limiting for requests. |
internal | |
internal/datatool | Datatool is a program which allows you to encode/decode quotapb protobuf messages to/from a variety of codecs. |
internal/lua | Package lua is generated by go.chromium.org/luci/tools/cmd/assets. |
internal/quotakeys | Package quotakeys has utility functions for generating internal quota Redis keys. |
quotapb | Package quotapb exports proto definitions required by the quota library. |
quotatestmonkeypatch | Package quotatestmonkeypatch should be imported for its side-effects in tests. |