= (Tromos) Transparent Online Management Of Storage
Storage is an unbalanced world full of trade-offs and compromises. If you want scalability, you have to sacrifice consistency. If you want performance, you have to sacrifice real-time monitoring.
If you want a high ingestion rate, you have to sacrifice either read performance or raw capacity. There is no "one-size-fits-all" solution, since every application has its own requirements and characteristics.
Unfortunately, the faster new applications appear, the faster these applications (and their requirements) diverge.
In an ideal world, every application would run atop a storage system tailored to the storage requirements of the application. We have already seen the merits of this strategy with Docker Containers that provide a customized
execution environment tailored to the runtime requirements of the application. However, a storage system is a complex piece of software that usually takes years of development and hardening. As a result, the rate at which new storage systems emerge cannot keep up with the birth rate of applications.
link:http://www.tromos.io/download[**Tromos Community Edition**] (or *Tromos-CE*) aims to solve precisely this problem! Its goal is to be the *easiest and fastest way to design and deploy customized storage containers.*
It does so by breaking the primitives of distributed storage systems into narrow-scoped elements, which the application architects can later compose in arbitrary ways. Using the provided domain-specific language, the architects can configure a data-management environment tailored to the requirements of the application at hand, simply by choosing the appropriate combination of components (e.g., distribution logic, in-transit processing, consistency level, data layout).
If the application does not require strong semantics, do not apply such a policy. If the application requires strong semantics, add them as a plugin, and do the same for any other storage aspect.
*The general principle behind storage containers is: if specific functionality is needed, add it as a plugin. If it is not needed, skip it to avoid unnecessary overheads.*
*Storage Containers* own the managed files but do not own the underlying raw storage (e.g., filesystem, key-value databases, or cloud storage solutions), which they merely use for data persistence.
In other words, a storage container is middleware that separates changes made to application code by science users from changes made to I/O actions by developers or administrators.
[caption="",link=https://gitlab.com/tromos/tromos-ce/raw/master/docs/images/storagecontainers.png]
image::docs/images/storagecontainers.png[800,800]
== Microservices
The Tromos framework consists of Microservices with well-defined APIs whose backends are swappable through plugins. Next, we present the basic architecture of each Microservice.
For the plugins that each Microservice can consume, please consult the link:https://gitlab.com/tromos/tromos-ce/tree/master/hub[Tromos HUB].
==== Devices
Devices provide a convenient and flexible way to abstract the various backends. Simple connectors do not suffice as they suffer from the link:https://thenewstack.io/avoiding-least-common-denominator-approach-hybrid-clouds/[least-common denominator] problem.
Instead, Tromos provides a framework for mapping several processing layers into virtual Devices. (If you are familiar with link:https://en.wikipedia.org/wiki/Device_mapper[Device Mapper],
this part of Tromos can be regarded as the user-space equivalent of Device Mapper).
[caption="",link=https://gitlab.com/tromos/tromos-ce/blob/master/docs/images/DeviceService.png]
image::./docs/images/DeviceService.png[300,300]
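
The layering idea can be sketched in a few lines of Go. This is only an illustration under stated assumptions, not the Tromos API: the `Device` interface, `memDevice`, and `xorTranslator` are hypothetical names invented for this example. The point it shows is that a processing layer wraps a Device and is itself a Device, so layers can be stacked on top of any backend.

```go
package main

import (
	"errors"
	"fmt"
)

// Device is a hypothetical minimal interface; the real Tromos API may differ.
type Device interface {
	Put(key string, data []byte) error
	Get(key string) ([]byte, error)
}

// memDevice stands in for a persistent backend (filesystem, cloud storage, ...).
type memDevice struct{ blobs map[string][]byte }

func newMemDevice() *memDevice { return &memDevice{blobs: map[string][]byte{}} }

func (d *memDevice) Put(key string, data []byte) error {
	d.blobs[key] = append([]byte(nil), data...) // store a private copy
	return nil
}

func (d *memDevice) Get(key string) ([]byte, error) {
	b, ok := d.blobs[key]
	if !ok {
		return nil, errors.New("not found: " + key)
	}
	return append([]byte(nil), b...), nil
}

// xorTranslator is a toy processing layer that masks data on its way to the
// next Device and unmasks it on the way back.
type xorTranslator struct {
	next Device
	mask byte
}

func (t xorTranslator) apply(in []byte) []byte {
	out := make([]byte, len(in))
	for i, b := range in {
		out[i] = b ^ t.mask
	}
	return out
}

func (t xorTranslator) Put(key string, data []byte) error {
	return t.next.Put(key, t.apply(data))
}

func (t xorTranslator) Get(key string) ([]byte, error) {
	raw, err := t.next.Get(key)
	if err != nil {
		return nil, err
	}
	return t.apply(raw), nil
}

func main() {
	backend := newMemDevice()
	var dev Device = xorTranslator{next: backend, mask: 0x5a}
	if err := dev.Put("k", []byte("hello")); err != nil {
		panic(err)
	}
	got, err := dev.Get("k")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(got))
}
```

Because the wrapper and the backend share one interface, the caller does not need to know how many layers sit between it and the raw storage.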
<<<
==== Coordinators
The Coordinators are quite similar to Devices, but they are responsible for the control of the metadata. As we discuss in the next section, separating the data from the metadata gives us greater flexibility, since we can scale each one and provide Quality of Service for each one independently of the other.
[caption="",link=https://gitlab.com/tromos/tromos-ce/blob/master/docs/images/CoordinatorService.png]
image::./docs/images/CoordinatorService.png[300,300]
<<<
==== Processors
In its current form, a storage system is much more than a repository of data. It is also a piece of software that processes data on behalf of the application. That processing may involve a stream that ends at a single Device (e.g., encryption, compression, deduplication, filtering),
or a stream that ends at several Devices (e.g., mirroring, striping, erasure coding). The unique feature that Processors bring to the table is their ability to abstract any desired datapath into a directed acyclic graph (DAG).
Through that, advertised features (e.g., replication, striping, erasure coding) are nothing more than components of the graph of a filestore.
Yet another advantage of Processors is their ability to link to each other and form complex distributed processing networks. That comes in especially handy for HPC scenarios, where the cycles of compute nodes are too precious to waste on
simple tasks. For example, data filtering, indexing, compression, or any other similar task can be offloaded to a chained Processor running on a filtering node.
[caption="",link=https://gitlab.com/tromos/tromos-ce/blob/master/docs/images/ProcessorService.png]
image::./docs/images/ProcessorService.png[300,300]
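
To make the DAG idea concrete, here is a minimal Go sketch. It assumes a hypothetical `Node` type and is not how Tromos Processors are implemented. A single-destination step (upper-casing, standing in for compression or encryption) feeds a fan-out step (mirroring) that delivers the same payload to two sinks standing in for Devices.

```go
package main

import (
	"bytes"
	"fmt"
)

// Node is one vertex of a hypothetical datapath graph. Each node optionally
// transforms its input and forwards the result along every outgoing edge.
type Node struct {
	Name      string
	Transform func([]byte) []byte // nil means pass-through
	Next      []*Node             // outgoing edges; empty for a sink
	Received  [][]byte            // payloads collected when the node is a sink
}

// Push sends data through the graph rooted at n.
func (n *Node) Push(data []byte) {
	if n.Transform != nil {
		data = n.Transform(data)
	}
	if len(n.Next) == 0 {
		n.Received = append(n.Received, data)
		return
	}
	for _, child := range n.Next {
		child.Push(data)
	}
}

// upperCase is a stand-in for a real in-transit transformation.
// bytes.ToUpper returns a copy, so the caller's buffer is left untouched.
func upperCase(b []byte) []byte { return bytes.ToUpper(b) }

func main() {
	dev0 := &Node{Name: "dev0"}
	dev1 := &Node{Name: "dev1"}
	// Mirroring is just a node with two outgoing edges.
	mirror := &Node{Name: "mirror", Next: []*Node{dev0, dev1}}
	root := &Node{Name: "upper", Transform: upperCase, Next: []*Node{mirror}}

	root.Push([]byte("abc"))
	fmt.Println(string(dev0.Received[0]), string(dev1.Received[0]))
}
```

In this sketch, replacing mirroring with another policy means swapping one node of the graph while the rest of the datapath stays as it is.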
==== Client Middleware
The middleware is a lightweight library that enables the client to access the various Microservices and to create Meshes of them. For example, the client can partition the keyspace across
several Coordinators and create a composite Namespace, or distribute the data across Devices. Similarly, it can decide which Processor to use to perform in-transit processing of the data
before it reaches the Devices. Although the middleware provides a user-friendly API, it is also equipped with gateways so that clients can benefit from Tromos without having to modify the source
of their applications. For example, when using the FUSE gateway, the application architects can mount the storage container like a normal filesystem, while still controlling the properties of the virtual storage through the container's Manifest.
[caption="",link=https://gitlab.com/tromos/tromos-ce/blob/master/docs/images/Middleware.png]
image::docs/images/middleware.png[400,400]
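
One common way to partition a keyspace across several Coordinators is consistent hashing. The Go sketch below is a generic illustration of that technique, not the code of any Tromos plugin: `Ring`, `NewRing`, and `Lookup` are hypothetical names, and the number of virtual points per Coordinator is an arbitrary choice.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strconv"
)

// Ring is a hypothetical consistent-hash ring that maps keys to coordinators.
type Ring struct {
	points []uint32          // sorted hash points on the ring
	owner  map[uint32]string // hash point -> coordinator name
}

func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// NewRing places `replicas` virtual points on the ring for every name.
func NewRing(replicas int, names ...string) *Ring {
	r := &Ring{owner: map[uint32]string{}}
	for _, name := range names {
		for i := 0; i < replicas; i++ {
			p := hashKey(name + "#" + strconv.Itoa(i))
			if _, taken := r.owner[p]; taken {
				continue // hash collision: the first owner keeps the point
			}
			r.owner[p] = name
			r.points = append(r.points, p)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Lookup returns the coordinator owning the first point at or after the
// key's hash, wrapping around to the start of the ring when needed.
func (r *Ring) Lookup(key string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := hashKey(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}

func main() {
	ring := NewRing(16, "coord0", "coord1")
	for _, key := range []string{"/data/a.bin", "/data/b.bin", "/logs/run1"} {
		fmt.Println(key, "->", ring.Lookup(key))
	}
}
```

The property that makes this attractive for a composite Namespace is that adding a Coordinator only moves keys onto the new Coordinator; keys never shuffle between the existing ones.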
link:http://www.tromos.io/docs/overview/introduction/[Learn more]
<<<
== Using Tromos
Before starting, we recommend reading the Tromos link:https://gitlab.com/tromos/tromos-ce/blob/master/docs/tutorial/README.adoc[tutorial] to learn how to design your own manifest.
After following these steps, you will be able to create and mount your virtual storage system by using the `tromos-cli` command.

 $ go get -v gitlab.com/tromos/tromos-ce/cmd/tromos-cli
 $ tromos-cli gateway fuse --mountpoint $MOUNTPOINT --manifest $MANIFEST

`$MOUNTPOINT` is the location where the storage container will be mounted (e.g., `/tmp/test`), and `$MANIFEST` is the specification of the virtual system.
.tutorial.yml
[source, yaml]
----
Name: Tutorial
Description: This file describes a storage container
# The Middleware section defines the plugins that will be used
# by the client-side middleware
Middleware:
  DeviceManager:
    plugin: gitlab.com/tromos/hub/selector/random
  Namespace:
    plugin: gitlab.com/tromos/hub/selector/consistenthash
Devices:
  "dev0":
    Persistent:
      plugin: gitlab.com/tromos/hub/device/filesystem
      family: os
      path: gitlab.com/tromos/scratch/hdd0
    Translators:
      "0":
        plugin: gitlab.com/tromos/hub/device/blob
        blocksize: 2M
  "dev1":
    Persistent:
      plugin: gitlab.com/tromos/hub/device/googledrive
      credentials: /home/myuser/credentials.json
    Translators:
      "0":
        plugin: gitlab.com/tromos/hub/device/throttler
        rate: 500MB
        capacity: 1B
        regulate: channel
      "1":
        plugin: gitlab.com/tromos/hub/device/blob
        blocksize: 2M
Coordinators:
  "coord0":
    Persistent:
      plugin: gitlab.com/tromos/hub/coordinator/boltdb
      path: /tmp/databases/coord0
    Translators:
      "0":
        plugin: gitlab.com/tromos/hub/coordinator/sequencer
        blockw2r: true
        blockw2w: true
----
== Contributing
If you are as excited as we are about the evolution of storage systems and distributed processing, feel free to join our community!
Contributions are greatly appreciated, whether they are feedback, code, or even discussion!
==== Feedback
We are always happy to receive feedback!
* Do any of the commands have surprising effects, output, or results?
* Do you have workflows that the tool supports well, or doesn't support at all?
* Do you have suggestions centered on the user experience (UX) of the tool?
Let us know by filing an issue, describing what you did or wanted to do, what you expected to happen, and what actually happened.
==== Code
The maintainers actively manage the issues list and try to highlight issues suitable for newcomers.
If you want to contribute:

* fork the project
* do your hack
* create a pull request!

Before starting any work, please either comment on an existing issue or file a new one.
==== Contact Information
You can contact the author of Tromos at nikolaidis.fotis@gmail.com
== Licensing
Tromos is licensed under the Apache License, Version 2.0. See
link:https://gitlab.com/tromos/tromos-ce/blob/master/LICENSE[LICENSE] for the full license text.
== FAQ
Tromos is a new project, so things are fragile. Here we list the known issues that may cause inconvenience.
==== What is the error: no matching versions for query "latest" ?
* If you experience an error like link:https://github.com/golang/go/issues/27215[go get ... no matching versions for query "latest"], try upgrading your Go version.
==== I'm experiencing data corruption when I have concurrent access to more than 30 files
* Tromos tries to minimize the resources it needs and therefore makes extensive use of pools. The number 30 is the number of instances
that are waiting in the pool. If you want to serve more than 30 files, please change the `MaxConcurrentChannels` variable defined in `configuration/default`. You must
also take into account that a file may have both a writer and a reader, so provision 2 channels per file to be on the safe side.
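
The sizing rule above can be illustrated with a small Go sketch. It assumes a hypothetical fixed-size `Pool` built on a buffered channel and is not the pool Tromos uses internally. It only shows why a pool of 30 channels serves fewer than 30 files once every file holds both a reader and a writer.

```go
package main

import "fmt"

// Pool is a hypothetical fixed-size pool backed by a buffered channel.
type Pool struct{ slots chan struct{} }

// NewPool creates a pool pre-filled with `size` free slots.
func NewPool(size int) *Pool {
	p := &Pool{slots: make(chan struct{}, size)}
	for i := 0; i < size; i++ {
		p.slots <- struct{}{}
	}
	return p
}

// TryAcquire takes one slot without blocking; it reports false when the
// pool is empty.
func (p *Pool) TryAcquire() bool {
	select {
	case <-p.slots:
		return true
	default:
		return false
	}
}

// Release returns a previously acquired slot to the pool.
func (p *Pool) Release() { p.slots <- struct{}{} }

// channelsNeeded assumes each open file may hold one reader and one writer.
func channelsNeeded(files int) int { return 2 * files }

func main() {
	pool := NewPool(30)
	opened := 0
	// Each file takes two channels: one for reading and one for writing.
	for pool.TryAcquire() && pool.TryAcquire() {
		opened++
	}
	fmt.Println("files served with 30 channels:", opened)
	fmt.Println("channels needed for 40 files:", channelsNeeded(40))
}
```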
==== Runtime error: Elements in the pool have been exhausted
* Pools are periodically exhausted. That is a normal case, which Tromos handles transparently to the user. This specific error occurs when the available resources fall below the minimum that an operation requires. For example, when mirroring your data into 4 locations, you must have at least 4 devices. Otherwise, you get the above error.
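
The mirroring example can be expressed as a simple precondition check. The Go sketch below uses hypothetical names (`pickMirrors`, `ErrPoolExhausted`) and only illustrates the rule that an operation needing N distinct devices fails when fewer than N are available; it is not the actual Tromos code path.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrPoolExhausted is a hypothetical sentinel error modelled on the message
// from the FAQ entry.
var ErrPoolExhausted = errors.New("elements in the pool have been exhausted")

// pickMirrors returns n distinct devices to mirror onto, or an error when
// fewer than n devices are configured.
func pickMirrors(devices []string, n int) ([]string, error) {
	if n > len(devices) {
		return nil, fmt.Errorf("need %d devices, have %d: %w", n, len(devices), ErrPoolExhausted)
	}
	return append([]string(nil), devices[:n]...), nil
}

func main() {
	devices := []string{"dev0", "dev1"}
	if _, err := pickMirrors(devices, 4); err != nil {
		fmt.Println("error:", err)
	}
}
```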