GoBlobStore

module
v0.0.0-...-ca3ac9b Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Sep 22, 2023 License: Apache-2.0

README

GoBlobStore

A multi-tenant proxy service for storing binary data in various storage systems with a simple HTTP interface.

features

  • multi tenant binary store with strong separation of data
  • simple docker container
  • on multi container environment automatic discovery and synchronization of tasks
  • proxy for file system and s3
  • simple http interface
  • http path, http header or jwt based tenant discovery
  • configurable jwt role based access control
  • user defined retention per blob
  • index option to search for blob properties

Installation

Docker

simply run

docker run -v <host data path>:/data -p 8443:8443 -p 8000:8000 go-blob-store

to run this service as a single node with a simplefile storage class for storage and a preconfigured fastcache for the cache. Exposes port 8443 for the data interface and 8000 for the metrics.

For other options see the configuration file.

Configuration

beside the simple default configuration there are some options you might want to change in your environment.

Configuration file

The configuration file service.yaml will be loaded from /data/config/service.yaml.

You can simply mount this to another file system and create a new service.yaml with your own configuration. (the defaults as set in the default service.yaml will be used, if the option is not set)

Every entry of the configuration can be set with a ${} macro for using environment variables for the configuration.

Storagesystem

The configuration of the storage system consists of several providers. First the primary provider called storage

The data is stored here first and forms the basis and reference for the entire storage.

Second, the backup: each blob is stored in the backup after successful storage in the primary storage (synchronous or asynchronous). If a failure occurs in the primary storage, then the blob can be restored from the backup, automatically or manually.

a cache can be configured as a third component. This is also operated automatically. An adjustable (number or size) amount of blobs can be kept in the cache for quick access.

The storage providers in turn can be used for all 3 parts. Exceptions are, of course, specialized providers such as FastCache.

Another unit is the retention system. This works independently of the Storgae providers.

The index is the fifth element. This is used to search for specific blobs via properties, be they system or custom properties.

Tenant specific backup storage

for every tenant there is the possibility to allow a specific backup storage system, beside of using the default backup storage. This feature will be activate with the allowtntbackup flag in the configuration. Now a tenant have the possibility to add a own backup storage to the configuration. After creating (or changing) this storage, the blobstore will automatically start a resynchronization task for that tenant. This task will automatically move all backup file to the new storage provider. In the time of this operation further changes on the backup store will be permitted. Allowed provider for the tenant backup storage are: S3 Storage, Blob Storage.

Limitation:
  • Only S3 Storage is permitted

  • No encryption for external S3 storage

  • no auto move on changes of the storage properties

Example of a post to create a new tenant backup storage: POST: https://localhost:8443/api/v1/config Headers: X-tenant:

{
  "storageclass": "S3Storage",
  "properties" : {
   "endpoint": "http://127.0.0.1:9001",
   "bucket": "mcs",
   "accessKey": "xiSwpTnOf6QXxu3Y",
   "secretKey": "sT7lJIgV4tYoOljdpfr9kMoLE0PgMPJ9"
  }
}

Answer:

{
  "type": "createResponse",
  "tenantid": "mcs",
  "backup": "S3Storage"
}

SimpleFile Storage

The simple file storage is a file system based storage.

engine:
 retentionManager: SingleRetention
 tenantautoadd: true
 backupsyncmode: false
 allowtntbackup: false
 storage:
  storageclass: SimpleFile
  properties:
   rootpath: /data/storage

You can use this storage for all kind of storage types, (even backup or cache). The only property needed is the rootpath which will lead to the used file system. On docker you can use any mount point / volume for that. Every tenant will get a subfolder. On this tenant directory there will be a 2 dimensional folder structure for the blob data. For the retention files there will be a dedicated folder.

SimpleFileMultiVolume Storage

The simple file multi volume storage is a file system based storage. It will use multiple volumes, accessed via a single root path, as sub folders. For the Tenantmanager you can configure an extra space. Eg.:

You have different volumes mounted beneath a root path called /data as they will be called mvn01 mvn02 .... This will lead into a file system tree

/data -+- /mvn01
       |- /mvn02
       |- /mvn03
       |- ...

The storage manager will than automatically determine capacity and free space of every volume and will report this into a file .volumeinfo.

engine:
 retentionManager: SingleRetention
 tenantautoadd: true
 backupsyncmode: false
 allowtntbackup: false
 storage:
  storageclass: SFMV
  properties:
   rootpath: /data
   tenantpath: /data/tnt

You can use this storage for main or backup storage. Every tenant will get a subfolder on every volume, if there is data for this tenant on this volume. On this tenant directory there will be a 2 dimensional folder structure for the blob data. For the retention files there will be a dedicated folder on every volume (only for the files on this volume). The tenant manager will have it's own volume. This can be shared with one of the data volumes. On POST/PUT data the storage will randomly select a volume, but volumes with higher utilization have a lower probability of being selected. On GET all volumes will be checked for the present of the file. New volumes will be automatically detected.

There is no sharding in this storage. So a backup should always be configured.

S3 Storage

The S3 storage provider can be used as main storage or backup storage with the same parameters.

Simply change in engine/storage/ the storage class to S3Storage and add the configured properties:

engine:
 retentionManager: SingleRetention
 tenantautoadd: true
 backupsyncmode: false
 allowtntbackup: false
 storage:
  storageclass: S3Storage
  properties:
   endpoint: "https://192.168.178.45:9002"
   bucket: "goblobstore"
   accessKey: D9Q2D6JQGW1MVCC98LQL
   secretKey: LDX7QHY/IsNiA9DbdycGMuOP0M4khr0+06DKrFAr
   password: 4jsfhdjHsd?
   insecure: false

you can use the same for the backup storage:

engine:
 retentionManager: SingleRetention
 tenantautoadd: true
 backupsyncmode: false
 allowtntbackup: false
 storage:
  storageclass: SimpleFile
  properties:
   rootpath: /data/storage
 backup:
  storageclass: S3Storage
  properties:
   endpoint: "http://192.168.178.45:9002"
   bucket: "goblobstore"
   accessKey: D9Q2D6JQGW1MVCC98LQL
   secretKey: LDX7QHY/IsNiA9DbdycGMuOP0M4khr0+06DKrFAr
   password: 4jsfhdjHsd?
   insecure: false

Fastcache

Fastcache is a specialised storage engine only to be used for a cache storage.

You can combine this with any other storage engine, even with a optional backup engine

engine:
 retentionManager: SingleRetention
 tenantautoadd: true
 backupsyncmode: false
 allowtntbackup: false
 storage:
  storageclass: S3Storage
  properties:
   endpoint: "https://192.168.178.45:9002"
   bucket: "goblobstore"
   accessKey: D9Q2D6JQGW1MVCC98LQL
   secretKey: LDX7QHY/IsNiA9DbdycGMuOP0M4khr0+06DKrFAr
   password: 4jsfhdjHsd?
   insecure: false
 cache:
  storageclass: FastCache
  properties:
   rootpath: /data/blobcache
   maxcount: 100000
   maxramusage: 1024000000

Headermapping

There are defined header for operation

tenant: is the name of the tenant in a multi-tenant environment

apikey: the apikey can be used to identify the right usage of this service. (in the configuration you can switch this off)

retention: (optional) if set, this blob will die only available until this time (in minutes) is over, counted from the creation time, or, if a reset retention occur, from the reset time.

filename: is the filename of the file itself, and only needed if you use direct binary upload. Please note that the value must be encoded according to RFC8187 if the file name contains umlauts.

blobid: (optional) is the predefined blob id of the file. This must be tenant unique otherwise you will get an conflict error.

The service automatically save all needed headers to the blob description object. If you set a headerprefix, the service will additionally put all header prefixed with this text to the description, too.

In the config you can map the header to other header names. Example:

headermapping:
 headerprefix: x-mcs
 retention: X-mcs-retention
 tenant: X-mcs-tenant
 filename: X-mcs-filename
 apikey: X-mcs-apikey
 blobid: X-mcs-blobid

JWT Tenant discovery and Authorization

Normally the tenant for the blob storage is discovered by a seperate header. (As you can read in the chapter Header mapping). If you are using JWT for authentication/authorization, the Tenant can be discovered by an extra claim on the jwt. Simply activate jwt authentication with

auth:
 type: jwt
 properties: 
  validate: false
  strict: true
  tenantClaim: Tenant
  roleClaim: 
  rolemapping: 
   object-reader: 
   object-creator:
   object-admin:
   tenant-admin:
   admin:

validate false means, the token is not validated against the issuer. (this is normally ok, when the token is already checked by an api gateway or other serving services) (At the moment this is the only option. JWT Token validation is not implemented.)

strict true means the call will fail, if not all needed parameters, (at the moment only the tenant) can be evaluated from the token. false will fall back to http headers

tenantClaim will name the claim name of the tenant value. Defaults to Tenant (optional)

Authorization Roles

In the blob storage system there are some small roles for the different parts of the blob storage service. Roles can only be used with JWT and role mapping activated. You can deactivate the role checking, if you left the roleClaim property empty.

Role name What the user with this role can do.
object-reader A user with this role can only read the data from his tenant.
And can do a search and list objects.
object-creator A user with this role can create new blobs. And only this.
No view or list permissions are granted
object-admin A user with this role can view, create and delete objects.
And he can set/modify object properties, like metadata and retention.
tenant-admin A user with this role can manage the tenant properties
(at the moment not implemented),
do check and restore for the whole storage
admin A user with this role can manage the service itself, as
adding/deleting new tenants to the service.
With this role only, you can't write, read objects from any tenant.

Example with full role mapping:

auth:
 type: jwt
 properties: 
  validate: true
  strict: true
  tenantClaim: Tenant
  roleClaim: Roles
  rolemapping: 
   object-reader: Reader
   object-creator: Writer
   object-admin: ObAdmin
   tenant-admin: TnAdmin
   admin: Admin

For finding desired blobs you can configure an index engine. Possible options are

(sorry, nothing more at this moment)

Query Language

A separate query language is supported for search independency of the underlying index engine. This offers a simple syntax for searching.

Some value element examples:

foo ~ search the default field for value "foo" in a match or term query

35 ~ search the default field for the number 35, as an integer in a match or term query

name:Joe ~ search the name field for the value "Joe" as a match or term query

count:2 ~ search the count field for the numerical value 2 as a match or term query

msg:"foo bar baz" ~ search the msg field using a match-phrase query

amount:>=40 ~ search the amount field using a range query for documents where the field's value is greater than or equal to 40

created_at:<2017-10-31T00:00:00Z ~ search the created_at field for dates before Halloween of 2017 (all datetimes are in RF3339 format, UTC timezone)

cash:[50~200] ~ returns all docs where cash field's value is within a range greater than or equal to 50, and less than 200.

updated_at:[2017-04-22T09:45:00Z~2017-05-03T10:20:00Z] ~ window ranges can also include RFC3339 UTC datetimes

Any field or parenthesized grouping can be negated with the NOT or ! operator:

NOT foo ~ search for documents where default field doesn't contain the token foo

!c:[2017-10-29T00:00:00Z~2017-10-30T00:00:00Z] ~ returns docs where field c's date value is not within the range of October 29-31, 2017 (UTC)

!count:>100~ search for documents wherecount` field has a value that's not greater than 100

NOT (x OR y) ~ search the default field for documents that don't contain terms "x" or "y"

Parentheses for grouping of subqueries is not supported:

NOT foo:bar AND baz:99 ~ return blobs where field foo's value is not "bar" and where field baz's value is 99.

Operators have aliases: AND -> & and OR -> |:

Internal Fulltextindex

For smaller installations there is a small fulltext implementation based on bluge. (https://blugelabs.com/) This index can only be used in a single instance installation. With multi instances the writing can fail, if the index of a tenant is written from two nodes at a time. For multi instance searching please use the mongo index.

engine:
...
 index:
  storageclass: bluge
  properties:
    rootpath: <path to a folder>

You can set the root path to the same folder as in the SimpleFile storage. The structure will be integrated. For every tenant there will be a subfolder. And in this tenant directory for this index there will be a folder _idx created, which will store all needed files for the index.

Mongo Index

For the mongo index option you have to provide the following information

engine:
...
 index:
  storageclass: MongoDB
  properties:
   hosts: 
	- 127.0.0.1:27017
   username:
   password:
   authdatabase: blobstore
   database: blobstore

username and password can be provided via secret.yaml.

Every tenant will create a collection in the database. For this collection the service will automatically create an index based on the blodID. For direct searching in mongo db simply add an # to the json coded mongo query syntax. Attention: all headers are converted to lower case.

As an example:

#{"$and": [{"x-tenant": "MCS"}, {"x-user": "Willie"} ]}

Tenant Based API Endpoints

The tenant is the main part to split up the data. Every tenant is based on the tenant name or id. This id should be case insensitive and should only consist of chars which are valid for filenames.

The tenant can be given in different ways.

First via subpath: the main routes are: /api/v1/stores/{tntid}/blobs/

/api/v1/stores/{tntid}/search/

where tntid id is the id of the tenant.

Second via jwt: you can have a configurable claim for the tenant, which is used in every tenant based call. So than you can use the following routes. The config and admin routes are intentionally only avaible without tenant subfolder.

/api/v1/blobs/ /api/v1/search/ /api/v1/config/ /api/v1/config/stores/ /api/v1/admin/check /api/v1/admin/restore

The third option is a configurable http header. This order is also the order for evaluating. With one exclusion, if you try to select the tenant via route and jwt tenant evaluation is active, than both tenants will be checked to be equal. Otherwise access is denied.

Directories

Path Synopsis
cmd
service
Package main this is the entry point into the service
Package main this is the entry point into the service
internal
api
auth
Package auth all things related to authentication and authorization
Package auth all things related to authentication and authorization
services/bluge
Package bluge this package contains all things related to the bluge fulltext index engine.
Package bluge this package contains all things related to the bluge fulltext index engine.
services/business
Package business the package contains the structs for the business rules of the storage system
Package business the package contains the structs for the business rules of the storage system
services/fastcache
Package fastcache implementing a fast cache in memory and fast file storage
Package fastcache implementing a fast cache in memory and fast file storage
services/migration
Package migration this package holds a async service for doing some tenant related migration tasks, like backup, restore, check
Package migration this package holds a async service for doing some tenant related migration tasks, like backup, restore, check
services/mongodb
Package mongodb using a mongo db as a index engine for the search
Package mongodb using a mongo db as a index engine for the search
services/s3
Package s3 this package contains all s3 related structs
Package s3 this package contains all s3 related structs
services/simplefile
Package simplefile implement a storage system on a mounted device with simple files as storage objects
Package simplefile implement a storage system on a mounted device with simple files as storage objects
utils/httputils
Package httputils some helper functions on the http protocol
Package httputils some helper functions on the http protocol
utils/readercomp
Package readercomp provides comparison functions for readers or more specific for comparing file contents
Package readercomp provides comparison functions for readers or more specific for comparing file contents
utils/slicesutils
Package slicesutils some convenient functions on slices
Package slicesutils some convenient functions on slices
pkg

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL