ristretto

package module
v0.0.0-...-d1403e3 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 13, 2024 License: Apache-2.0 Imports: 10 Imported by: 1

README

Ristretto

Go Doc ci-ristretto-tests ci-ristretto-lint Coverage Status Go Report Card

Ristretto is a fast, concurrent cache library built with a focus on performance and correctness.

The motivation to build Ristretto comes from the need for a contention-free cache in Dgraph.

Features

  • High Hit Ratios - with our unique admission/eviction policy pairing, Ristretto's performance is best in class.
    • Eviction: SampledLFU - on par with exact LRU and better performance on Search and Database traces.
    • Admission: TinyLFU - extra performance with little memory overhead (12 bits per counter).
  • Fast Throughput - we use a variety of techniques for managing contention and the result is excellent throughput.
  • Cost-Based Eviction - any large new item deemed valuable can evict multiple smaller items (cost could be anything).
  • Fully Concurrent - you can use as many goroutines as you want with little throughput degradation.
  • Metrics - optional performance metrics for throughput, hit ratios, and other stats.
  • Simple API - just figure out your ideal Config values and you're off and running.

Status

Ristretto is production-ready. See Projects using Ristretto.

Table of Contents

Usage

Example
package main

import (
	"fmt"

	"github.com/dgraph-io/ristretto"
)

func main() {
	cache, err := ristretto.NewCache(&ristretto.Config{
		NumCounters: 1e7,     // number of keys to track frequency of (10M).
		MaxCost:     1 << 30, // maximum cost of cache (1GB).
		BufferItems: 64,      // number of keys per Get buffer.
	})
	if err != nil {
		panic(err)
	}

	// set a value with a cost of 1
	cache.Set("key", "value", 1)

	// wait for value to pass through buffers
	cache.Wait()

	// get value from cache
	value, found := cache.Get("key")
	if !found {
		panic("missing value")
	}
	fmt.Println(value)

	// del value from cache
	cache.Del("key")
}
Config

The Config struct is passed to NewCache when creating Ristretto instances (see the example above).

NumCounters int64

NumCounters is the number of 4-bit access counters to keep for admission and eviction. We've seen good performance in setting this to 10x the number of items you expect to keep in the cache when full.

For example, if you expect each item to have a cost of 1 and MaxCost is 100, set NumCounters to 1,000. Or, if you use variable cost values but expect the cache to hold around 10,000 items when full, set NumCounters to 100,000. The important thing is the number of unique items in the full cache, not necessarily the MaxCost value.

MaxCost int64

MaxCost is how eviction decisions are made. For example, if MaxCost is 100 and a new item with a cost of 1 increases total cache cost to 101, 1 item will be evicted.

MaxCost can also be used to denote the max size in bytes. For example, if MaxCost is 1,000,000 (1MB) and the cache is full with 1,000 1KB items, a new item (that's accepted) would cause 5 1KB items to be evicted.

MaxCost could be anything as long as it matches how you're using the cost values when calling Set.

BufferItems int64

BufferItems is the size of the Get buffers. The best value we've found for this is 64.

If for some reason you see Get performance decreasing with lots of contention (you shouldn't), try increasing this value in increments of 64. This is a fine-tuning mechanism and you probably won't have to touch this.

Metrics bool

Metrics is true when you want real-time logging of a variety of stats. The reason this is a Config flag is because there's a 10% throughput performance overhead.

OnEvict func(hashes [2]uint64, value interface{}, cost int64)

OnEvict is called for every eviction.

KeyToHash func(key interface{}) [2]uint64

KeyToHash is the hashing algorithm used for every key. If this is nil, Ristretto has a variety of defaults depending on the underlying interface type.

Note that if you want 128bit hashes you should use the full [2]uint64, otherwise just fill the uint64 at the 0 position and it will behave like any 64bit hash.

Cost func(value interface{}) int64

Cost is an optional function you can pass to the Config in order to evaluate item cost at runtime, and only for the Set calls that aren't dropped (this is useful if calculating item cost is particularly expensive and you don't want to waste time on items that will be dropped anyways).

To signal to Ristretto that you'd like to use this Cost function:

  1. Set the Cost field to a non-nil function.
  2. When calling Set for new items or item updates, use a cost of 0.

Benchmarks

The benchmarks can be found in https://github.com/dgraph-io/benchmarks/tree/master/cachebench/ristretto.

Hit Ratios

This trace is described as "disk read accesses initiated by a large commercial search engine in response to various web search requests."

Database

This trace is described as "a database server running at a commercial site running an ERP application on top of a commercial database."

Looping

This trace demonstrates a looping access pattern.

CODASYL

This trace is described as "references to a CODASYL database for a one hour period."

Throughput

All throughput benchmarks were ran on an Intel Core i7-8700K (3.7GHz) with 16gb of RAM.

Mixed

Read

Write

Projects Using Ristretto

Below is a list of known projects that use Ristretto:

  • Badger - Embeddable key-value DB in Go
  • Dgraph - Horizontally scalable and distributed GraphQL database with a graph backend
  • Vitess - Database clustering system for horizontal scaling of MySQL
  • SpiceDB - Horizontally scalable permissions database

FAQ

How are you achieving this performance? What shortcuts are you taking?

We go into detail in the Ristretto blog post, but in short: our throughput performance can be attributed to a mix of batching and eventual consistency. Our hit ratio performance is mostly due to an excellent admission policy and SampledLFU eviction policy.

As for "shortcuts," the only thing Ristretto does that could be construed as one is dropping some Set calls. That means a Set call for a new item (updates are guaranteed) isn't guaranteed to make it into the cache. The new item could be dropped at two points: when passing through the Set buffer or when passing through the admission policy. However, this doesn't affect hit ratios much at all as we expect the most popular items to be Set multiple times and eventually make it in the cache.

Is Ristretto distributed?

No, it's just like any other Go library that you can import into your project and use in a single process.

Documentation

Overview

Ristretto is a fast, fixed size, in-memory cache with a dual focus on throughput and hit ratio performance. You can easily add Ristretto to an existing system and keep the most valuable data where you need it.

This package includes multiple probabalistic data structures needed for admission/eviction metadata. Most are Counting Bloom Filter variations, but a caching-specific feature that is also required is a "freshness" mechanism, which basically serves as a "lifetime" process. This freshness mechanism was described in the original TinyLFU paper 1, but other mechanisms may be better suited for certain data distributions.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Cache

type Cache[K Key, V any] struct {

	// Metrics contains a running log of important statistics like hits, misses,
	// and dropped items.
	Metrics *Metrics
	// contains filtered or unexported fields
}

Cache is a thread-safe implementation of a hashmap with a TinyLFU admission policy and a Sampled LFU eviction policy. You can use the same Cache instance from as many goroutines as you want.

func NewCache

func NewCache[K Key, V any](config *Config[K, V]) (*Cache[K, V], error)

NewCache returns a new Cache instance and any configuration errors, if any.

func (*Cache[K, V]) Clear

func (c *Cache[K, V]) Clear()

Clear empties the hashmap and zeroes all cachePolicy counters. Note that this is not an atomic operation (but that shouldn't be a problem as it's assumed that Set/Get calls won't be occurring until after this).

func (*Cache[K, V]) Close

func (c *Cache[K, V]) Close()

Close stops all goroutines and closes all channels.

func (*Cache[K, V]) Del

func (c *Cache[K, V]) Del(key K)

Del deletes the key-value item from the cache if it exists.

func (*Cache[K, V]) Get

func (c *Cache[K, V]) Get(key K) (V, bool)

Get returns the value (if any) and a boolean representing whether the value was found or not. The value can be nil and the boolean can be true at the same time. Get will not return expired items.

func (*Cache[K, V]) GetTTL

func (c *Cache[K, V]) GetTTL(key K) (time.Duration, bool)

GetTTL returns the TTL for the specified key and a bool that is true if the item was found and is not expired.

func (*Cache[K, V]) MaxCost

func (c *Cache[K, V]) MaxCost() int64

MaxCost returns the max cost of the cache.

func (*Cache[K, V]) ResetTTL

func (c *Cache[K, V]) ResetTTL(key K, ttl time.Duration) bool

ResetTTL updates the TTL for the specified key and returns a bool that is true if the item was found and had an expiration set previously.

func (*Cache[K, V]) Set

func (c *Cache[K, V]) Set(key K, value V, cost int64) bool

Set attempts to add the key-value item to the cache. If it returns false, then the Set was dropped and the key-value item isn't added to the cache. If it returns true, there's still a chance it could be dropped by the policy if its determined that the key-value item isn't worth keeping, but otherwise the item will be added and other items will be evicted in order to make room.

To dynamically evaluate the items cost using the Config.Coster function, set the cost parameter to 0 and Coster will be ran when needed in order to find the items true cost.

func (*Cache[K, V]) SetWithTTL

func (c *Cache[K, V]) SetWithTTL(key K, value V, cost int64, ttl time.Duration) bool

SetWithTTL works like Set but adds a key-value pair to the cache that will expire after the specified TTL (time to live) has passed. A zero value means the value never expires, which is identical to calling Set. A negative value is a no-op and the value is discarded.

func (*Cache[K, V]) UpdateMaxCost

func (c *Cache[K, V]) UpdateMaxCost(maxCost int64)

UpdateMaxCost updates the maxCost of an existing cache.

func (*Cache[K, V]) Wait

func (c *Cache[K, V]) Wait()

Wait blocks until all buffered writes have been applied. This ensures a call to Set() will be visible to future calls to Get().

type Config

type Config[K Key, V any] struct {
	// NumCounters determines the number of counters (keys) to keep that hold
	// access frequency information. It's generally a good idea to have more
	// counters than the max cache capacity, as this will improve eviction
	// accuracy and subsequent hit ratios.
	//
	// For example, if you expect your cache to hold 1,000,000 items when full,
	// NumCounters should be 10,000,000 (10x). Each counter takes up roughly
	// 3 bytes (4 bits for each counter * 4 copies plus about a byte per
	// counter for the bloom filter). Note that the number of counters is
	// internally rounded up to the nearest power of 2, so the space usage
	// may be a little larger than 3 bytes * NumCounters.
	NumCounters int64
	// MaxCost can be considered as the cache capacity, in whatever units you
	// choose to use.
	//
	// For example, if you want the cache to have a max capacity of 100MB, you
	// would set MaxCost to 100,000,000 and pass an item's number of bytes as
	// the `cost` parameter for calls to Set. If new items are accepted, the
	// eviction process will take care of making room for the new item and not
	// overflowing the MaxCost value.
	MaxCost int64
	// BufferItems determines the size of Get buffers.
	//
	// Unless you have a rare use case, using `64` as the BufferItems value
	// results in good performance.
	BufferItems int64
	// Metrics determines whether cache statistics are kept during the cache's
	// lifetime. There *is* some overhead to keeping statistics, so you should
	// only set this flag to true when testing or throughput performance isn't a
	// major factor.
	Metrics bool
	// OnEvict is called for every eviction and passes the hashed key, value,
	// and cost to the function.
	OnEvict func(item *Item[V])
	// OnReject is called for every rejection done via the policy.
	OnReject func(item *Item[V])
	// OnExit is called whenever a value is removed from cache. This can be
	// used to do manual memory deallocation. Would also be called on eviction
	// and rejection of the value.
	OnExit func(val V)
	// KeyToHash function is used to customize the key hashing algorithm.
	// Each key will be hashed using the provided function. If keyToHash value
	// is not set, the default keyToHash function is used.
	KeyToHash func(key K) (uint64, uint64)
	// Cost evaluates a value and outputs a corresponding cost. This function
	// is ran after Set is called for a new item or an item update with a cost
	// param of 0.
	Cost func(value V) int64
	// IgnoreInternalCost set to true indicates to the cache that the cost of
	// internally storing the value should be ignored. This is useful when the
	// cost passed to set is not using bytes as units. Keep in mind that setting
	// this to true will increase the memory usage.
	IgnoreInternalCost bool
	// TtlTickerDurationInSec set the value of time ticker for cleanup keys on ttl
	TtlTickerDurationInSec int64
}

Config is passed to NewCache for creating new Cache instances.

type Item

type Item[V any] struct {
	Key        uint64
	Conflict   uint64
	Value      V
	Cost       int64
	Expiration time.Time
	// contains filtered or unexported fields
}

Item is passed to setBuf so items can eventually be added to the cache.

type Key

type Key = z.Key

type Metrics

type Metrics struct {
	// contains filtered or unexported fields
}

Metrics is a snapshot of performance statistics for the lifetime of a cache instance.

func (*Metrics) Clear

func (p *Metrics) Clear()

Clear resets all the metrics.

func (*Metrics) CostAdded

func (p *Metrics) CostAdded() uint64

CostAdded is the sum of costs that have been added (successful Set calls).

func (*Metrics) CostEvicted

func (p *Metrics) CostEvicted() uint64

CostEvicted is the sum of all costs that have been evicted.

func (*Metrics) GetsDropped

func (p *Metrics) GetsDropped() uint64

GetsDropped is the number of Get counter increments that are dropped internally.

func (*Metrics) GetsKept

func (p *Metrics) GetsKept() uint64

GetsKept is the number of Get counter increments that are kept.

func (*Metrics) Hits

func (p *Metrics) Hits() uint64

Hits is the number of Get calls where a value was found for the corresponding key.

func (*Metrics) KeysAdded

func (p *Metrics) KeysAdded() uint64

KeysAdded is the total number of Set calls where a new key-value item was added.

func (*Metrics) KeysEvicted

func (p *Metrics) KeysEvicted() uint64

KeysEvicted is the total number of keys evicted.

func (*Metrics) KeysUpdated

func (p *Metrics) KeysUpdated() uint64

KeysUpdated is the total number of Set calls where the value was updated.

func (*Metrics) LifeExpectancySeconds

func (p *Metrics) LifeExpectancySeconds() *z.HistogramData

func (*Metrics) Misses

func (p *Metrics) Misses() uint64

Misses is the number of Get calls where a value was not found for the corresponding key.

func (*Metrics) Ratio

func (p *Metrics) Ratio() float64

Ratio is the number of Hits over all accesses (Hits + Misses). This is the percentage of successful Get calls.

func (*Metrics) SetsDropped

func (p *Metrics) SetsDropped() uint64

SetsDropped is the number of Set calls that don't make it into internal buffers (due to contention or some other reason).

func (*Metrics) SetsRejected

func (p *Metrics) SetsRejected() uint64

SetsRejected is the number of Set calls rejected by the policy (TinyLFU).

func (*Metrics) String

func (p *Metrics) String() string

String returns a string representation of the metrics.

Directories

Path Synopsis
contrib
z

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL