durostore

package module
v1.0.2
Published: Jul 15, 2022 License: MIT Imports: 8 Imported by: 0

README

Durostore (github.com/antonio-alexander/go-durostore)

Durostore is a persistent file storage library that serializes data to disk and reads it back randomly (without having to load all of it into memory). It is optimized for FIFO operations and for controlling how much memory durostore allocates.

Durostore enables random reads and sequential writes in the obvious way: it writes fixed-size entries to an index file, which is loaded into memory in its entirety and contains enough information to randomly read the variable-size data from a data file.

Durostore:

  • Is safe for concurrent usage
  • Implements file locking using github.com/gofrs/flock
  • Provides the ability to read randomly using indexes
  • Provides the ability to write one or more items simultaneously

Use cases

Durostore is useful when applying some variation of the producer/consumer design pattern and you want a queue/FIFO that can persist between runs, or a queue that spans memory spaces (using the disk as a medium).

Queue

There is a go-queue implementation that wraps durostore to try to get the best of both worlds; see the documentation at github.com/antonio-alexander/go-durostore/queue.

Configuration

Durostore can be configured using the New() function or the Load() method. Keep in mind that New() will panic if the configuration is invalid (or otherwise generates an error). Below is a brief description of the configuration items:

  • Directory: the directory where durostore reads/writes files
  • Max Files: the maximum number of files to write to
  • Max File Size: the maximum size of each individual file
  • Max Chunk: the maximum amount of data to read into memory
  • File Prefix: the prefix for all files generated by durostore
  • File Locking: whether or not to use file locking
  • Launch Indexer: whether or not to launch the indexer goroutine
  • Indexer Rate: how often to re-read the indexes
//Configuration describes the different options to configure
// an instance of durostore
type Configuration struct {
    Directory     string        `json:"directory" yaml:"directory"`           //directory to store files in
    MaxFiles      int           `json:"max_files" yaml:"max_files"`           //maximum number of files to generate
    MaxFileSize   int64         `json:"max_file_size" yaml:"max_file_size"`   //max size of each data file
    MaxChunk      int64         `json:"max_chunk" yaml:"max_chunk"`           //max chunk of data to read into memory at once
    FilePrefix    string        `json:"file_prefix" yaml:"file_prefix"`       //the prefix for files
    FileLocking   bool          `json:"file_locking" yaml:"file_locking"`     //whether or not to use file locking
    LaunchIndexer bool          `json:"launch_indexer" yaml:"launch_indexer"` //whether or not to launch the indexer
    IndexerRate   time.Duration `json:"indexer_rate" yaml:"indexer_rate"`     //how often to read indexes
}

This is an example configuration:

{
  "directory": "./temp",
  "max_files": 10,
  "max_file_size": 52428800,
  "max_chunk": 52428800,
  "file_prefix": "duro",
  "file_locking": true
}

As configured, it will create .dat and .idx files within ./temp: a maximum of 10 files, each file being 50MB (52428800 bytes), with at most 50MB held in memory at a time. The file prefix of duro creates the following files (on initial write):

  • duro_lock.lock
  • duro_001.idx
  • duro_001.dat
  • duro_002.idx
  • duro_002.dat

Quickstart

An instance of durostore can be created with the following code snippet:

import (
    "fmt"

    "github.com/antonio-alexander/go-durostore"
)

func main() {
    //this will create a pointer and attempt to load it
    // with the given configuration
    store := durostore.New(durostore.Configuration{
        Directory:   "./temp",
        MaxFiles:    10,
        MaxFileSize: durostore.MegaByteToByte(10),
        MaxChunk:    durostore.MegaByteToByte(50),
        FilePrefix:  "duro",
    })
    //alternatively, you could do the following
    // (note the assignment: using := again would re-declare store)
    store = durostore.New()
    if err := store.Load(durostore.Configuration{
        Directory:   "./temp",
        MaxFiles:    10,
        MaxFileSize: durostore.MegaByteToByte(10),
        MaxChunk:    durostore.MegaByteToByte(50),
        FilePrefix:  "duro",
    }); err != nil {
        fmt.Println(err)
    }
}

Durostore provides the following interfaces to interact with the store: the ability to sequentially write, as well as randomly read and delete data.


//Owner is a set of functions that should only be used
// by the caller of the NewDurostore function
type Owner interface {
    //Close can be used to set the internal pointers to nil and prepare the underlying pointer
    // for garbage cleanup, the API doesn't guarantee that it can be re-used
    Close() (err error)
}

//Manage like Owner is an interface that should be only used by the caller of the NewDurostore
// function. It can be used to start/stop the business logic
type Manage interface {
    //Load can be used to configure and initialize the business logic for a store, it will return a
    // channel for errors since some of the enqueue interfaces can't communicate errors
    Load(config Configuration) (err error)

    //PruneDirectory can be used to completely delete all data on disk while maintaining all in-memory
    // data, this is useful in situations where there's data loss, a single option index can be provided
    // if none is provided -1 is assumed and all files are deleted
    PruneDirectory(indices ...uint64) (err error)
}

//Reader is a subset of the Durostore interface that allows the ability to
// read from the store
type Reader interface {
    //Read will randomly read data at the given index (a uint64)
    Read(indices ...uint64) (bytes []byte, err error)
}

type Writer interface {
    //Write will sequentially write an item to the store and increment the
    // internal index by one, it accepts a slice of bytes or one or more
    // pointers/structs that implement the encoding.BinaryMarshaler interface
    Write(items ...interface{}) (index uint64, err error)

    //Delete will remove the index (but not delete the actual data) of a given
    // index if it exists
    Delete(indices ...uint64) (err error)
}

type Info interface {
    //Length will return the total number of items available in the store
    Length() (size int)

    //Size will return the total size (in bytes)
    Size() (size int64)
}

Implementation

Although the API is straightforward in its use, it is complicated by its use of variadics and empty interfaces. To write data, you provide either a slice of bytes or a pointer that implements encoding.BinaryMarshaler. The read function simply returns a slice of bytes that you can then decode into useful data. The index variadics have different abilities depending on the function:

  • Read() can only be used to read a single index; if no index is provided, the internal read index is used, which points to the oldest data that hasn't been deleted.
  • Delete() can be used to delete one or more indexes; if no index is provided, the internal read index is used, which points to the oldest data that hasn't been deleted.

See example code below:


import (
    "encoding/json"

    "github.com/antonio-alexander/go-durostore"
)

type Data struct {
    Int    int    `json:"Int"`
    String string `json:"String"`
}

func (d *Data) MarshalBinary() ([]byte, error) {
    return json.MarshalIndent(d, "", " ")
}

func (d *Data) UnmarshalBinary(bytes []byte) error {
    return json.Unmarshal(bytes, d)
}

func Read(reader durostore.Reader, index ...uint64) *Data {
    data := &Data{}
    bytes, err := reader.Read(index...)
    if err != nil {
        return nil
    }
    err = data.UnmarshalBinary(bytes)
    if err != nil {
        return nil
    }
    return data
}

The Read function is worth noting since it shows how to wrap the API's Read function so that you can read data and decode it into a pointer (using its UnmarshalBinary function).

Memory management

The initial use case for this package was a low-level solution for data persistence that would allow control of memory and CPU usage (with respect to garbage collection). It does this through a handful of ideas/rules that are enforced in the logic:

  • when data is written, it's not stored in memory but committed to disk (there are better solutions if you want a kind of running cache)
  • when the store is initially loaded, data is read in sequentially from the lowest index up to the chunk size (or the highest index if no chunk size is configured)
  • data is actively stored as pointers to slices of bytes (to simplify garbage collection and reduce the footprint of the data map)
  • all indexes are read into memory
  • when data is deleted, the index and data are removed from the maps
  • when data is read, a new byte slice is created and data is copied from the internal pointer to that slice, to avoid mingling slices and hindering garbage collection
  • garbage collection of the internal maps can be "forced" by using the Load() function (since it re-creates all the data and maps)
  • when you delete all the data in memory and there's still data left on disk (i.e., there are still indexes), it'll load in data up to the chunk size

Persistence storage management

Writing is optimized, meaning that you can write swaths of data at once (while reading is not). The only ways to "manage" your I/O operations when it comes to persistent storage are:

  • group your writes and deletes into single operations (each call actually attempts to write files)
  • do whatever you need to do with the OS to get it to commit to disk in a reasonable amount of time (so you're less likely to lose data)
  • manage how much data you read into memory
  • prefer more, smaller files over fewer, larger files (due to inode size etc.); this can also simplify recovery
  • deletes are practically soft until all data in a file has been deleted; smaller files let you reduce your on-disk footprint more often (this happens automatically)

Asynchronous usage

durostore is safe for concurrent usage within the same memory space; just keep in mind that writes are sequential, so even though there's a "lock", the index order will be determined by who locks first. durostore isn't written such that you need to know the indexes, but you could hypothetically create a database of indexes (if you wanted).

File locking IS included within durostore if configured to do so. File locking allows some magical implementations where two applications could use the same directory/file prefix to communicate (e.g., producer/consumer): the producer would only write, while the consumer would periodically read the data (some of the magic for this use case is still missing).

Catastrophic failure/recovery

Generally, catastrophic failures should only occur if there's no recovery logic within your use of durostore: for example, if you attempted to write data, the write failed, and you didn't account for the failure within your business logic. If for some reason a file becomes corrupted and you can't read that data using durostore, you can recover the data if:

  • your data is homogeneous and you have a proper UnmarshalBinary function
  • your data is heterogeneous and you implement some kind of wrapper to identify which function to use to unmarshal the data

Finally, if you just want to clear the data post-configuration, you can do so with the following code snippet (this is DESTRUCTIVE):

    d := durostore.New(durostore.Configuration{
        Directory:   "./",
        MaxFiles:    2,
        MaxFileSize: 1024,
        MaxChunk:    -1,
        FilePrefix:  "duro",
    })
    if err := d.PruneDirectory(); err != nil {
        fmt.Println(err)
    }

The PruneDirectory() function is variadic: you can provide an index, and if none is provided, it will empty the configured directory of anything with the configured file prefix.

Read Wrappers

Reading is a bit of a pain with durostore since data is read in as bytes that you then have to convert. The example below wraps the read function so that, after reading the bare bytes, it converts them using the UnmarshalBinary function.

type Data struct {
    Int    int    `json:"Int"`
    String string `json:"String"`
}

func (d *Data) UnmarshalBinary(bytes []byte) error {
    return json.Unmarshal(bytes, d)
}

func Read(reader durostore.Reader, index ...uint64) *Data {
    data := &Data{}
    bytes, err := reader.Read(index...)
    if err != nil {
        return nil
    }
    err = data.UnmarshalBinary(bytes)
    if err != nil {
        return nil
    }
    return data
}

Alternatively, you may find that you require something a bit more versatile. In situations where you need to support multiple data types, you'll need to create a wrapper data type that is first unmarshalled to output a type identifier (a string) and a slice of bytes. The type identifier can be used to determine what data type to unmarshal the bytes into.

This is also a great use case for protocol buffers.

Documentation

Index

Constants

const (
	ErrUnsupportedTypef     string = "unsupported type: %T"
	ErrIndexNotFoundf       string = "index: %d not found"
	ErrNotDirectory         string = "file: %s is not a directory"
	ErrNotConfigured        string = "durostore not configured"
	ErrUnableMakeDir        string = "unable to make directory"
	ErrNoItemsToWrite       string = "no items to write"
	ErrTooManyIndicesToRead string = "more than one index provided for read"
	ErrMaxChunkData         string = "max chunk in-memory data reached"
	ErrMaxStoreSize         string = internal.ErrMaxStoreSize
)

Variables

This section is empty.

Functions

func MegaByteToByte

func MegaByteToByte(n int64) int64

func New

func New(parameters ...interface{}) interface {
	Owner
	Reader
	Writer
	Info
}

New can be used to generate a populated store pointer that implements Owner/Manager/Durostore

Types

type Configuration

type Configuration struct {
	Directory   string `json:"directory" yaml:"directory"`         //directory to store files in
	MaxFiles    int    `json:"max_files" yaml:"max_files"`         //maximum number of files to generate
	MaxFileSize int64  `json:"max_file_size" yaml:"max_file_size"` //max size of each data file
	MaxChunk    int64  `json:"max_chunk" yaml:"max_chunk"`         //max chunk of data to read into memory at once
	FilePrefix  string `json:"file_prefix" yaml:"file_prefix"`     //the prefix for files
	FileLocking bool   `json:"file_locking" yaml:"file_locking"`   //whether or not to use file locking
}

Configuration describes the different options to configure an instance of durostore

type Example added in v1.0.2

type Example struct {
	Int    int     `json:"int,omitempty"`
	Float  float64 `json:"float,omitempty"`
	String string  `json:"string,omitempty"`
}

func ExampleRead added in v1.0.2

func ExampleRead(reader Reader, index ...uint64) *Example

func (*Example) MarshalBinary added in v1.0.2

func (e *Example) MarshalBinary() ([]byte, error)

func (*Example) UnmarshalBinary added in v1.0.2

func (e *Example) UnmarshalBinary(bytes []byte) error

type Info

type Info interface {
	//Length will return the total number of items available in the store
	Length() (size int)

	//Size will return the total size (in bytes)
	Size() (size int64)
}

type Owner

type Owner interface {
	//Close can be used to set the internal pointers to nil and prepare the underlying pointer
	// for garbage cleanup, the API doesn't guarantee that it can be re-used
	Close() (err error)

	//Load can be used to configure and initialize the business logic for a store, it will return a
	// channel for errors since some of the enqueue interfaces can't communicate errors
	Load(config Configuration) (err error)

	//PruneDirectory can be used to completely delete all data on disk while maintaining all in-memory
	// data, this is useful in situations where there's data loss, a single option index can be provided
	// if none is provided -1 is assumed and all files are deleted
	PruneDirectory(indices ...uint64) (err error)

	//UpdateIndexes will attempt to read all available indexes and update the in-memory indexes
	UpdateIndexes(indexSize ...int64) (size int64, err error)
}

Owner is a set of functions that should only be used by the caller of the NewDurostore function

type Reader

type Reader interface {
	//Read will randomly read data at the given index (a uint64)
	Read(indices ...uint64) (bytes []byte, err error)
}

Reader is a subset of the Durostore interface that allows the ability to read from the store

type Writer

type Writer interface {
	//Write will sequentially write an item to the store and increment the
	// internal index by one, it accepts a slice of bytes or one or more
	// pointers/structs that implement the encoding.BinaryMarshaler interface
	Write(items ...interface{}) (index uint64, err error)

	//Delete will remove the index (but not delete the actual data) of a given
	// index if it exists
	Delete(indices ...uint64) (err error)
}
