Documentation ¶
Overview ¶
Package hybrid provides a hybrid FSDB implementation.
A hybrid FSDB is backed by a local FSDB and a remote bucket. All data are written locally first, then a background thread uploads them to the remote bucket and deletes the local copies. Read operations check the local FSDB first, and fetch from the remote bucket if the data is not present locally. When a remote read happens, the data is saved locally until the next upload loop.
Data stored on the remote bucket is gzipped at the best compression level.
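As a sketch of that storage format, this is what gzipping at the best compression level looks like with the standard library (gzipBest and gunzip are hypothetical helper names for illustration, not part of this package):

```go
package main

import (
	"bytes"
	"compress/gzip"
	"io"
)

// gzipBest compresses data with gzip at the best compression level,
// matching how hybrid stores data on the remote bucket.
func gzipBest(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	w, err := gzip.NewWriterLevel(&buf, gzip.BestCompression)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzip decompresses gzipped data, as a remote read would.
func gunzip(data []byte) ([]byte, error) {
	r, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}
```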
Concurrency ¶
If you turn off the optional row lock (it is on by default), there are two cases in which we might lose data due to race conditions, but both are very unlikely.
The first case is a remote read. The read process is:

1. Check local FSDB.
2. Read fully from the remote bucket.
3. Check local FSDB again, to prevent overwriting local data with stale remote data.
4. If there's still no local data in Step 3, write the remote data locally.
5. Return local data.

If another write happens between Step 3 and Step 4, it might be overwritten by stale remote data.
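The steps above can be sketched with a toy in-memory stand-in for the local FSDB. This is a hypothetical illustration of the locked variant of the flow (holding the lock across Steps 3-4); localStore, readThrough, and errNotFound are made-up names, not this package's API:

```go
package main

import (
	"errors"
	"sync"
)

var errNotFound = errors.New("not found")

// localStore is a toy in-memory stand-in for the local FSDB.
type localStore struct {
	mu   sync.Mutex
	data map[string][]byte
}

func (s *localStore) read(key string) ([]byte, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if v, ok := s.data[key]; ok {
		return v, nil
	}
	return nil, errNotFound
}

// readThrough illustrates Steps 1-5: check local, fall back to a remote
// fetch, then re-check local before caching the remote data, so that a
// concurrent local write is not overwritten by stale remote data.
func readThrough(local *localStore, key string, fetchRemote func(string) ([]byte, error)) ([]byte, error) {
	// Step 1: check local FSDB.
	if v, err := local.read(key); err == nil {
		return v, nil
	}
	// Step 2: read fully from the remote bucket.
	remote, err := fetchRemote(key)
	if err != nil {
		return nil, err
	}
	// Hold the lock across Steps 3-4 (the "row lock on" behavior).
	local.mu.Lock()
	defer local.mu.Unlock()
	// Step 3: check local again.
	if v, ok := local.data[key]; ok {
		// Step 5: newer local data exists; return it.
		return v, nil
	}
	// Step 4: no local data; cache the remote data locally.
	local.data[key] = remote
	// Step 5: return local data.
	return remote, nil
}
```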
The other case is during upload. The upload process for each key is:

1. Read local data and calculate its crc32c.
2. Gzip the local data and upload it to the remote bucket.
3. Calculate the local data's crc32c again.
4. If the crc32c from Step 1 and Step 3 match, delete the local data.

If another write happens between Step 3 and Step 4, it might be deleted in Step 4, leaving only stale data in the system.
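The per-key upload flow above can be sketched with the standard library's crc32 Castagnoli (crc32c) support. uploadOne and its callback parameters are hypothetical stand-ins for the real local FSDB and bucket operations (gzip is omitted here for brevity):

```go
package main

import (
	"hash/crc32"
)

// castagnoli is the crc32c polynomial table.
var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// uploadOne illustrates the per-key upload flow: snapshot the checksum,
// upload, then re-check the checksum before deleting the local copy.
func uploadOne(readLocal func() ([]byte, error), upload func([]byte) error, deleteLocal func() error) error {
	// Step 1: read local data and calculate its crc32c.
	data, err := readLocal()
	if err != nil {
		return err
	}
	before := crc32.Checksum(data, castagnoli)
	// Step 2: upload to the remote bucket.
	if err := upload(data); err != nil {
		return err
	}
	// Step 3: calculate the local data's crc32c again.
	data, err = readLocal()
	if err != nil {
		return err
	}
	after := crc32.Checksum(data, castagnoli)
	// Step 4: delete local data only if it was unchanged during the upload.
	if before == after {
		return deleteLocal()
	}
	return nil
}
```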
Turning on the optional row lock ensures the data loss scenarios discussed above cannot happen, but it also degrades performance slightly. The lock is only held during parts of the operations (the whole local write operation, remote reads from Step 3 onward, and uploads from Step 3 onward).
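A row lock can be sketched as a per-key mutex, so operations on different keys do not block each other. This is a hypothetical illustration with made-up names (rowLock, newRowLock, lock); the package's actual implementation may differ:

```go
package main

import "sync"

// rowLock hands out one mutex per key, lazily created.
type rowLock struct {
	mu    sync.Mutex // guards the map itself
	locks map[string]*sync.Mutex
}

func newRowLock() *rowLock {
	return &rowLock{locks: make(map[string]*sync.Mutex)}
}

// lock acquires the mutex for key and returns it so the caller can
// Unlock it when the critical section is done.
func (r *rowLock) lock(key string) *sync.Mutex {
	r.mu.Lock()
	m, ok := r.locks[key]
	if !ok {
		m = &sync.Mutex{}
		r.locks[key] = m
	}
	r.mu.Unlock()
	m.Lock()
	return m
}
```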
There are no other locks used in the code, except a few atomic numbers in the upload loop for logging purposes.
Example ¶
package main

import (
	"context"
	"io/ioutil"
	"os"
	"strings"

	"github.com/fishy/fsdb"
	"github.com/fishy/fsdb/bucket"
	"github.com/fishy/fsdb/hybrid"
	"github.com/fishy/fsdb/local"
)

func main() {
	root, _ := ioutil.TempDir("", "fsdb_")
	defer os.RemoveAll(root)

	var bucket bucket.Bucket // TODO: open bucket from an implementation

	ctx, cancel := context.WithCancel(context.Background())
	db := hybrid.Open(
		ctx,
		local.Open(local.NewDefaultOptions(root)),
		bucket,
		hybrid.NewDefaultOptions(),
	)
	defer cancel() // Stop the upload loop, not really necessary

	key := fsdb.Key("key")

	if err := db.Write(ctx, key, strings.NewReader("Hello, world!")); err != nil {
		// TODO: handle error
	}

	reader, err := db.Read(ctx, key)
	if err != nil {
		// TODO: handle error
	}
	defer reader.Close()
	// TODO: read from reader

	if err := db.Delete(ctx, key); err != nil {
		// TODO: handle error
	}
}
Constants ¶
const (
	DefaultUploadDelay     time.Duration = time.Minute * 5
	DefaultUploadThreadNum               = 5
	DefaultUseLock                       = true
)
Default options values.
Variables ¶
var DefaultSkipFunc = UploadAll
DefaultSkipFunc is the default skip function used.
Functions ¶
func DefaultNameFunc ¶
DefaultNameFunc is the default name function used.
The format is:
fsdb/data/<sha-512/224 of key>.gz
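The documented format can be sketched with the standard library's sha512 package. remoteName is a hypothetical helper illustrating the format, not the package's actual function:

```go
package main

import (
	"crypto/sha512"
	"fmt"
)

// remoteName builds a bucket object name in the documented default format:
// fsdb/data/<sha-512/224 of key>.gz
func remoteName(key []byte) string {
	sum := sha512.Sum512_224(key) // 28 bytes -> 56 hex characters
	return fmt.Sprintf("fsdb/data/%x.gz", sum)
}
```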
func Open ¶
Open creates a hybrid FSDB, which is backed by a local FSDB and a remote bucket.
There's no need to close, but you could cancel the context to stop the upload loop.
Read reads from local first, then reads from the remote bucket if the data does not exist locally. In that case, the data is cached locally until the next upload loop.
Write writes locally. A background scan loop uploads everything from local to remote, then deletes the local copy after the upload succeeds.
Delete deletes from both local and remote, and returns combined errors, if any.
github.com/fishy/gcsbucket and github.com/fishy/s3bucket provide bucket.Bucket implementations for Google Cloud Storage and AWS S3, respectively, and github.com/fishy/blobbucket provides a bucket.Bucket implementation based on the Go Cloud blob interface.
Types ¶
type Options ¶
type Options interface {
	// GetUploadDelay returns the delay between two upload scan loops.
	GetUploadDelay() time.Duration

	// GetUploadThreadNum returns the number of threads used in upload scan loops.
	//
	// The higher the number, the faster the uploads,
	// but it also means heavier disk I/O load.
	GetUploadThreadNum() int

	// GetUseLock returns whether we should use a row lock.
	//
	// Using a row lock guarantees that we do not overwrite newer data with
	// stale data, but it also degrades the performance of all operations.
	//
	// Refer to the package documentation for more details.
	GetUseLock() bool

	// GetLogger returns the logger to be used in hybrid FSDB.
	//
	// If it returns nil, nothing will be logged.
	GetLogger() *log.Logger

	// GetRemoteName returns the name for the data file on the remote bucket.
	GetRemoteName(key fsdb.Key) string

	// SkipKey returns true if the key should not be uploaded to the remote
	// bucket (retained locally), or false if the key should be uploaded.
	SkipKey(key fsdb.Key) bool

	// SetSkipFunc sets the function used by SkipKey.
	//
	// It's possible that this function needs to read from the hybrid FSDB,
	// so it's allowed to be changed on a read-only Options.
	SetSkipFunc(f func(fsdb.Key) bool)
}
Options defines a read-only view of options used in hybrid FSDB.
type OptionsBuilder ¶
type OptionsBuilder interface {
	Options

	// Build builds the read-only view of the options.
	Build() Options

	// SetUploadDelay sets the delay between two upload scan loops.
	SetUploadDelay(delay time.Duration) OptionsBuilder

	// SetUploadThreadNum sets the number of threads used in upload scan loops.
	SetUploadThreadNum(threads int) OptionsBuilder

	// SetUseLock sets whether to use a row lock.
	SetUseLock(lock bool) OptionsBuilder

	// SetLogger sets the logger used in hybrid FSDB.
	SetLogger(logger *log.Logger) OptionsBuilder

	// SetRemoteNameFunc sets the function for GetRemoteName.
	SetRemoteNameFunc(f func(fsdb.Key) string) OptionsBuilder
}
OptionsBuilder defines a read-write view of options used in hybrid FSDB.
func NewDefaultOptions ¶
func NewDefaultOptions() OptionsBuilder
NewDefaultOptions creates the default options.