datastore

package module
Published: Jul 3, 2019 License: MIT Imports: 8 Imported by: 0

README

Datastore

Datastore is a simple embedded database for Go programs. It is suitable for cases where you don't want to use a complex datastore like Postgres, don't have the performance needs of LSM or LMDB, and prefer a serialization format that is more Go-native than JSON.

Datastore is designed to be simple. It allows you to use Go's native types by implementing a simple interface (no code generation needed). It reads and writes arbitrary data to a single file. It is safe for concurrent use and includes a handful of basic safety features to help prevent footgun situations.

Motivation

I wrote datastore to provide disk-based persistence for a feed scraping tool: it keeps track of subscriptions, feed updates, and discovered items, and records which items have already been retrieved so they aren't fetched multiple times.

Datastore tracks metadata while the feed content itself is written to disk in raw form. I did not want to set up Postgres just to run a simple CLI, and, while I have used BoltDB in the past, I found it a bit cumbersome for this project, which has complex types and relations but does not need super-duper performance in the database layer.

Considerations

It may be useful to think of datastore not as a database, but as an encoder. In fact, Gob is doing most of the heavy lifting for us. As a direct consequence, your entire database must fit into memory, and must be flushed to disk in order to save it. There are no transactions -- just a "snapshot" when you call Flush.

The abstraction datastore provides makes your program much easier to write: type reflection, encoding, and file I/O are all handled behind a simple interface, freeing you to focus on business logic instead.

If your goal is to build a high performance networked server, or your data has rapid, unbounded growth, or you need ACID features like transactions, datastore is not the best choice. However, for less demanding projects like CLIs or networked services with smaller datasets, datastore will work just fine.

Examples

Open a Database
ds, err := datastore.OpenOrCreate("filename"+datastore.Extension, "myappv1")
...

The second parameter to OpenOrCreate is a sanity check that you're opening the right datastore. Think of it like an API version.

Implement the Document Interface
type MyType struct {
	DocumentID uint64 // the backing field may have any exported name, but must not collide with the ID() method
	...
}

func (m *MyType) ID() uint64 {
	return m.DocumentID
}

func (m *MyType) SetID(id uint64) {
	m.DocumentID = id
}
Insert a Document
myType := &MyType{}
collection := ds.InType(myType)
collection.Upsert(myType)

Documents are stored in type-specific collections.

Retrieve a Document
document := collection.FindOne(func(doc datastore.Document) bool {
	if myType, ok := doc.(*MyType); ok {
		return myType.Name == "this is the one!"
	}
	return false
})
myType, ok := document.(*MyType)
...
Flush Database to Disk

Warning: Changes are only saved when you flush them to disk. Don't forget to do this!

ds.Flush()
In-memory Datastore / Testing Collections

For testing or ephemeral in-memory usage you can create a Datastore without the Open or Create calls:

ds := &datastore.Datastore{}
collection := ds.In("temp")

Flush will return an error, so there is no way to persist this Datastore. However, it gives you a working Collection for tests, and aside from Flush the rest of the API behaves the same as a persistent Datastore.
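For example, a minimal test sketch. The Pet type is hypothetical (any type implementing Document will do), the module import path is omitted, and it assumes Upsert assigns the autoincrement key on insert, as described in the package overview:

func TestTempCollection(t *testing.T) {
	ds := &datastore.Datastore{} // in-memory only; Flush will return an error
	pets := ds.In("pets")        // the Collection is created on first use

	pet := &Pet{Name: "Chomper"} // hypothetical type implementing datastore.Document
	if err := pets.Upsert(pet); err != nil {
		t.Fatal(err)
	}
	if pets.FindKey(pet.ID()) == nil {
		t.Fatal("expected to find the document we just inserted")
	}
}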

Developing

datastore is a library. Tests are written in the datastore_test package (not datastore) to ensure that the public API works as expected and that the tests do not rely on access to unexported members.

Make sure to always run tests with the -race flag to ensure safe concurrent behavior.

There is a fixture file in testdata that is generated using go generate. Certain changes to the tests may require you to regenerate this file. After regenerating it you should check it in.

Documentation

Overview

Package datastore provides a simple, embedded database for Go applications that want to store complex types on disk using a binary encoding.

datastore is suitable for cases where you don't want to use a complex database like Postgres, don't have the performance needs of LSM or LMDB, and want a serialization format that is more Go-native than JSON.

Data stored in a Datastore must implement the Document interface and is organized into Collections. Each Collection holds one type of Document. Each Document is indexed by a uint64 that behaves as an autoincrement primary key field. Datastore supports basic CRUD and Find operations on each Collection.

Creating a collection:

ds, err := datastore.OpenOrCreate("mypets.datastore", "mypets.v1")
pets := ds.In("pets")

Adding a new document:

pet := &Pet{}
pets.Upsert(pet)

When retrieving a document from a collection you must use a Go type assertion:

pets.FindOne(func(doc datastore.Document) bool {
	if pet, ok := doc.(*Pet); ok {
		return pet.Name == "Chomper"
	}
	return false
})

Under the hood datastore encodes structs into a gzipped Gob stream and writes them to a file when you call Flush. Aside from Open and Flush, all other operations are performed in memory, so your dataset (plus some overhead) must not exceed available memory. In addition to decoding stored data, transitory data structures are re-created during the Open call.

As mentioned, collections are created based on the type of the data stored in them. There are currently no special facilities provided for migrating data, so take care when renaming types. Gob is designed to handle addition and deletion of fields but renaming a type will likely cause data to be ignored the next time the datastore is opened. You may safely Open a datastore that contains incompatible types but calling Flush will destroy any incompatible type data.

datastore is designed to be safe for concurrent use by a single process (with multiple goroutines). Datastore uses an atomic write during Flush, but otherwise does not attempt to be crash-safe. datastore is designed to be small, simple, and safe but is not designed for high performance -- for high performance or high capacity embedded data stores, see any number of embeddable LSM or LMDB derivatives.

Index

Constants

const Extension = ".datastore"

Extension is used as a common file extension to facilitate identifying a particular file as a datastore (like .sqlite). This is a recommendation and is not required. Note that datastore files will always have a gzip comment header "datastore" that may be used to identify them.
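For example, a minimal sketch that checks the gzip comment header without decoding the body. The helper name looksLikeDatastore is illustrative:

func looksLikeDatastore(path string) (bool, error) {
	f, err := os.Open(path) // package "os"
	if err != nil {
		return false, err
	}
	defer f.Close()

	zr, err := gzip.NewReader(f) // package "compress/gzip"
	if err != nil {
		return false, nil // not gzip data, so not a datastore file
	}
	defer zr.Close()

	return zr.Comment == "datastore", nil
}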

Variables

var ErrInvalidSignature = errors.New("datastore signature does not match")
var ErrInvalidType = errors.New("type does not match collection")

Functions

func ReadSignature

func ReadSignature(path string) (signature string, err error)

ReadSignature is used to read the signature from a Datastore on disk without decoding the entire file. The file will be opened in non-exclusive read mode. Note that if the file is already locked this function may return an error.
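A usage sketch. Whether the returned value includes the "datastore:" prefix added by Signature is an assumption here, so the comparison is left as a comment:

sig, err := datastore.ReadSignature("mypets" + datastore.Extension)
if err != nil {
	log.Fatal(err) // package "log"
}
log.Printf("signature on disk: %q", sig)
// If the stored value carries the "datastore:" prefix added by Signature,
// compare against datastore.Signature("myapp.v1") rather than a raw string.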

func Signature

func Signature(signature string) string

Signature (the function) prepends "datastore:" to whatever value you supply. It is called internally by Open and Create so you generally won't need to call this function yourself.

Signature (the value) is used as a schema identifier to prevent programs from reading or writing incompatible encodings to a Datastore file on disk.

Each program should use a unique identifier for its Datastore signature. The recommended format is program_name.schema_version, but it may be any string of ISO-8859-1 characters. Including the schema version will allow your program to change the format of the datastore over time without breaking compatibility with older versions of the program or silently corrupting your data if the wrong program version is used.

datastore's Open call enforces an exact match and datastore does not do any fuzzy version matching. Remember that this is a version for your *data*, not your program. See the Gob package for details on how the types themselves are allowed to change over time without requiring a change in signature.

Signature is stored in the gzip header for the Datastore so you can scan for and read signatures without having to decode the entire structure. You can use ReadSignature to inspect this header without attempting to read the entire document into memory.

Types

type Collection

type Collection struct {
	// Type indicates the reflected type of the Documents stored in this
	// collection. DO NOT MODIFY.
	Type string

	// Items holds the map of Documents in this Collection. DO NOT MODIFY.
	Items map[uint64]Document

	// CurrentIndex holds the autoincrement value for this Collection. DO NOT MODIFY.
	CurrentIndex uint64
	// contains filtered or unexported fields
}

Collection is an RWMutex-guarded map holding Documents of a single type. The fields Type, Items, and CurrentIndex are exported for serialization purposes, but you should NEVER modify them directly. Use the methods instead.

func (*Collection) Delete

func (c *Collection) Delete(document Document)

Delete removes the Document from the Collection and sets the ID to zero.

func (*Collection) DeleteKey

func (c *Collection) DeleteKey(key uint64)

DeleteKey removes the indicated key from the Collection, or no-ops if the key is not present.

func (*Collection) FindAll

func (c *Collection) FindAll(finder func(Document) bool) []Document

FindAll is a lookup-style function that returns a list of Documents satisfying the finder callback. The Collection is scanned in ascending key order and every matching Document is returned. If none match, the list is empty. This function always enumerates the entire Collection (i.e. a table scan).
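For example, assuming a pets Collection holding a hypothetical *Pet type with Name and Age fields:

adults := pets.FindAll(func(doc datastore.Document) bool {
	pet, ok := doc.(*Pet)
	return ok && pet.Age >= 2
})
for _, doc := range adults {
	pet := doc.(*Pet) // safe: the callback only matched *Pet values
	fmt.Println(pet.Name)
}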

func (*Collection) FindKey

func (c *Collection) FindKey(key uint64) Document

FindKey finds a Document by key. This is useful for "foreign key" style relationships where you do not want to create a cyclical reference between objects, or for a fast lookup by ID when you already know it. Returns nil if the key is not found in the Collection.
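A sketch of the "foreign key" pattern, using hypothetical Pet and Owner types and an owners Collection:

// Pet stores the owner's key rather than a pointer, avoiding a cycle.
type Pet struct {
	DocumentID uint64
	Name       string
	OwnerID    uint64 // key of an Owner in the owners Collection
}

if doc := owners.FindKey(pet.OwnerID); doc != nil {
	owner := doc.(*Owner)
	fmt.Println(owner.Name)
}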

func (*Collection) FindOne

func (c *Collection) FindOne(finder func(Document) bool) Document

FindOne is a lookup-style function that returns the first Document that satisfies the callback. The Collection is scanned in ascending order. FindOne enumerates the entire Collection (i.e. table scan) until a match is found, or returns nil if there is no match.

func (*Collection) List

func (c *Collection) List() []uint64

List returns a sorted list of keys (in ascending order) for all Documents currently held in the Collection.
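For example, printing documents in key order (Pet is again a hypothetical type):

for _, key := range pets.List() {
	if pet, ok := pets.FindKey(key).(*Pet); ok {
		fmt.Println(key, pet.Name)
	}
}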

func (*Collection) SetType

func (c *Collection) SetType(document Document) error

func (*Collection) Upsert

func (c *Collection) Upsert(document Document) error

Upsert inserts or updates a Document in the collection.
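A sketch of insert followed by update, assuming Upsert assigns the autoincrement key on first insert as described in the package overview:

pet := &Pet{Name: "Chomper"}
if err := pets.Upsert(pet); err != nil { // insert: a new key is assigned
	log.Fatal(err)
}

pet.Name = "Chomper II"
if err := pets.Upsert(pet); err != nil { // update: same key, new data
	log.Fatal(err)
}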

type Datastore

type Datastore struct {

	// Collections is public because Gob needs to read it. You should not modify
	// this map directly. Use In(), InType(), and the Collection API instead.
	Collections map[string]*Collection
	// contains filtered or unexported fields
}

Datastore contains Collections of Documents and coordinates reading / writing them to a file.

func Create

func Create(path, signature string) (*Datastore, error)

Create creates a new datastore and flushes it to disk. For details on the signature parameter, see the docs for Signature. Create will immediately call Flush to create the datastore file and detect any I/O problems.

func New

func New() *Datastore

New creates a new in-memory Datastore. Flush will never succeed with this type of Datastore. For a persistent Datastore, start with Open or Create.

func Open

func Open(path, signature string) (ds *Datastore, err error)

Open opens a Datastore for reading and writing. The given signature must match the signature stored in the specified Datastore.

Important note: Before calling Open you must call gob.Register for each type you expect to read from the Datastore or Gob will not be able to decode them. Typically you should perform the gob.Register call in an init() func in the same source file where you define the ID() and SetID() methods for your types.
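For example, a minimal sketch with a hypothetical Pet type; the signature string is also illustrative:

// pet.go -- the same file that defines Pet's ID and SetID methods.
func init() {
	gob.Register(&Pet{}) // package "encoding/gob"; register the concrete type stored in the Datastore
}

// elsewhere, once all types are registered:
ds, err := datastore.Open("mypets"+datastore.Extension, "myapp.v1")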

If Open fails with ErrInvalidSignature you can call ds.Signature() on the result to see what Signature was found on disk.

func OpenOrCreate

func OpenOrCreate(path, signature string) (store *Datastore, err error)

OpenOrCreate is a convenience function that reads or initializes a datastore in a single call. It first calls Open, and if Open fails because the file does not exist (os.IsNotExist), it attempts to Create it.

func (*Datastore) Flush

func (d *Datastore) Flush() error

Flush writes changes to disk, or no-ops if it has already flushed all changes. This uses atomic replace and is not compatible with Windows.
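One possible pattern is to flush on the way out and surface any error; a sketch:

defer func() {
	if err := ds.Flush(); err != nil {
		log.Printf("datastore flush failed: %v", err)
	}
}()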

func (*Datastore) In

func (d *Datastore) In(name string) *Collection

In provides a pseudo-fluent interface to select a specific Collection from this Datastore by name. Each collection may only hold one type of Document.

If the Collection does not already exist it will be initialized. Newly-initialized Collections do not have a type until a Document is added.

Note: We recommend using constants for your Collection names so a typo doesn't cause your data to go into the wrong collection.
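For example:

const (
	colPets   = "pets"
	colOwners = "owners"
)

pets := ds.In(colPets)
owners := ds.In(colOwners)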

func (*Datastore) Init

func (d *Datastore) Init(name string, document Document) (*Collection, error)

Init wraps In to initialize a Collection with type information. The document is not stored, so it may be a zero value or an initialized document. Init returns an error if the collection already exists and has a different type.
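For example, with a hypothetical Pet type:

// A zero value is enough; the document is only used for its type.
pets, err := ds.Init("pets", &Pet{})
if err != nil {
	log.Fatal(err) // e.g. "pets" already holds a different type
}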

func (*Datastore) Path

func (d *Datastore) Path() string

Path returns the filesystem path where the datastore is located. This path is the same as the one passed in during Open or Create and is NOT further processed by filepath.Abs or any similar functions.

func (*Datastore) Signature

func (d *Datastore) Signature() string

Signature returns the signature for this datastore. See the Signature package-level function for details.

type Document

type Document interface {
	// ID is the "primary key" used to load and save a Document. It must be
	// unique in each collection, and the ID field must be public so it can be
	// encoded / decoded by the Gob package.
	ID() uint64

	// SetID is called by Datastore internally, and while it is required to
	// satisfy the interface you should never need to call this yourself.
	SetID(id uint64)
}

Document is the interface that allows a struct to be stored in a Collection.

type UIntSlice

type UIntSlice []uint64

UIntSlice implements sort.Interface for a slice of uint64. Rather than declaring your own variables of this type, you only need to wrap the []uint64 in the sort call.
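For example:

keys := []uint64{42, 7, 19}
sort.Sort(datastore.UIntSlice(keys)) // keys is now 7, 19, 42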

func (UIntSlice) Len

func (u UIntSlice) Len() int

func (UIntSlice) Less

func (u UIntSlice) Less(i, j int) bool

func (UIntSlice) Swap

func (u UIntSlice) Swap(i, j int)
