capillaries module, v1.1.6
Published: Apr 9, 2023. License: MIT.

Capillaries

Capillaries is a supervised data processing framework. It fills the gap between distributed, scalable data processing/integration solutions and the need to produce enriched, customer-ready, production-quality, human-curated data within SLA time limits.

TL;DR

What Capillaries is and what it is not, with a use case discussion and diagrams

Getting started guide on how to run a quick Docker-based demo without compiling code

Why Capillaries?

| Aspect | BEFORE | AFTER |
|---|---|---|
| Data aggregation | SQL joins | Capillaries lookups in Cassandra + Go expressions (scalability, parallel execution) |
| Data filtering | SQL queries, custom code | Go expressions (scalability, maintainability) |
| Data transform | SQL expressions, custom code | Go expressions, Python formulas (parallel execution, maintainability) |
| Intermediate data storage | Files, relational databases | Cassandra keyspaces and tables created on the fly (scalability, maintainability) |
| Workflow execution | Shell scripts, custom code, workflow frameworks | RabbitMQ as the Single Point of Failure + workflow status stored in Cassandra (parallel execution, fault tolerance, incremental computing) |
| Workflow monitoring and interaction | Custom solutions | Capillaries UI, Toolbelt utility, API, Web API (transparency, operator validation support) |
| Workflow management | Shell scripts, custom code | Capillaries script file with DAG |

Highlights

Incremental computing

Allows splitting the whole data processing pipeline into separate runs that can be started independently and re-run if needed.

Parallel processing

Splits large data volumes into smaller batches processed in parallel. Executes multiple data processing tasks (DAG nodes) in parallel.
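As a rough illustration of the batching idea, the sketch below splits a range of items into fixed-size batches and processes each batch concurrently. It is not Capillaries code: the helper names `splitIntoBatches` and `sumInParallel` are hypothetical, and goroutines inside a single process stand in for whatever workers actually pick up the batches in a real deployment.

```go
package main

import (
	"fmt"
	"sync"
)

// splitIntoBatches divides the half-open range [0, total) into consecutive
// batches of at most batchSize items. Each batch is a [start, end) pair.
// Hypothetical helper, for illustration only.
func splitIntoBatches(total, batchSize int) [][2]int {
	if total <= 0 || batchSize <= 0 {
		return nil
	}
	batches := make([][2]int, 0, (total+batchSize-1)/batchSize)
	for start := 0; start < total; start += batchSize {
		end := start + batchSize
		if end > total {
			end = total
		}
		batches = append(batches, [2]int{start, end})
	}
	return batches
}

// sumInParallel processes every batch in its own goroutine and then
// combines the per-batch results. Each goroutine writes only to its own
// slot of partial, so no locking is needed.
func sumInParallel(values []int, batchSize int) int {
	batches := splitIntoBatches(len(values), batchSize)
	partial := make([]int, len(batches))
	var wg sync.WaitGroup
	for i, b := range batches {
		wg.Add(1)
		go func(i int, b [2]int) {
			defer wg.Done()
			for _, v := range values[b[0]:b[1]] {
				partial[i] += v
			}
		}(i, b)
	}
	wg.Wait()
	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	values := make([]int, 10)
	for i := range values {
		values[i] = i + 1
	}
	fmt.Println(splitIntoBatches(len(values), 4)) // [[0 4] [4 8] [8 10]]
	fmt.Println(sumInParallel(values, 4))         // 55
}
```

Because each batch is independent, a failed batch could in principle be re-run on its own without repeating the others.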

Operator interaction

Allows human data validation for selected data processing stages.

Fault tolerance

Survives most temporary connectivity issues with the underlying database, as well as software and hardware failures of processing nodes.
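One common way to ride out temporary connectivity problems is to retry an operation with an increasing delay. The sketch below shows that general technique only. This README does not describe how Capillaries itself handles such failures, so `retry`, `errTransient`, and `errPermanent` are hypothetical names introduced for this example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical error values used to tell retryable failures from fatal ones.
var (
	errTransient = errors.New("transient connectivity error")
	errPermanent = errors.New("permanent error")
)

// retry calls op up to attempts times. It retries only when op fails with
// errTransient, doubling the delay between attempts, and returns any other
// error immediately.
func retry(attempts int, initialDelay time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	delay := initialDelay
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if !errors.Is(err, errTransient) {
			return err // not retryable
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retry(5, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTransient // simulate two dropped connections
		}
		return nil
	})
	fmt.Println(calls, err) // 3 <nil>
}
```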

Works with structured data artifacts

Consumes and produces delimited text files, uses database tables internally. Provides ETL/ELT capabilities. Implements a subset of the relational algebra.
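To make "a subset of the relational algebra" concrete, the sketch below applies two of its basic operations, selection (row filtering) and projection (column picking), to a small delimited text input using only the Go standard library. It does not use the Capillaries API. The function `selectProject` and the sample columns are assumptions made up for this example.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// selectProject parses comma-delimited text whose first line is a header,
// keeps the rows accepted by pred (selection), and returns only the named
// columns in the given order (projection). Hypothetical helper.
func selectProject(input string, pred func(row map[string]string) bool, cols []string) ([][]string, error) {
	// csv.Reader rejects records whose field count differs from the first
	// record, so indexing rec by header position below is safe.
	records, err := csv.NewReader(strings.NewReader(input)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, nil
	}
	header := records[0]
	var out [][]string
	for _, rec := range records[1:] {
		row := make(map[string]string, len(header))
		for i, name := range header {
			row[name] = rec[i]
		}
		if !pred(row) {
			continue
		}
		projected := make([]string, len(cols))
		for i, c := range cols {
			projected[i] = row[c]
		}
		out = append(out, projected)
	}
	return out, nil
}

func main() {
	input := "id,country,amount\n1,CA,10\n2,US,20\n3,CA,30\n"
	onlyCA := func(row map[string]string) bool { return row["country"] == "CA" }
	out, err := selectProject(input, onlyCA, []string{"id", "amount"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out) // [[1 10] [3 30]]
}
```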

Use scenarios

Capable of processing large amounts of data within SLA time limits, efficiently utilizing powerful computational (hardware, VM, containers) and storage (Cassandra) resources, with or without human monitoring/validation/intervention.

Capillaries in depth

What it is and what it is not
Getting started
Testing
Toolbelt, Daemon, and Webapi configuration
Script configuration
Capillaries UI
Capillaries API
Capillaries deploy tool: OpenStack cloud deployment
Glossary
Q & A

MIT License

(C) 2023 kleines.hertz[at]protonmail.com

Directories

Path Synopsis
pkg
api
cql
ctx
env
l
sc
wf
test
