capillaries

Go module, v1.1.6 (latest). Published: Apr 9, 2023. License: MIT

Repository

github.com/capillariesio/capillaries

README

Capillaries

Capillaries is a supervised data processing framework. It fills the gap between distributed, scalable data processing/integration solutions and the need to produce enriched, customer-ready, production-quality, human-curated data within SLA time limits.

TL;DR

What Capillaries is and what it is not, with a use case discussion and diagrams

Getting started guide on how to run a quick Docker-based demo without compiling code

Why Capillaries?

Area	BEFORE	AFTER
Data aggregation	SQL joins	Capillaries lookups in Cassandra + Go expressions (scalability, parallel execution)
Data filtering	SQL queries, custom code	Go expressions (scalability, maintainability)
Data transform	SQL expressions, custom code	Go expressions, Python formulas (parallel execution, maintainability)
Intermediate data storage	Files, relational databases	on-the-fly-created Cassandra keyspaces and tables (scalability, maintainability)
Workflow execution	Shell scripts, custom code, workflow frameworks	RabbitMQ as the Single Point of Failure + workflow status stored in Cassandra (parallel execution, fault tolerance, incremental computing)
Workflow monitoring and interaction	Custom solutions	Capillaries UI, Toolbelt utility, API, Web API (transparency, operator validation support)
Workflow management	Shell scripts, custom code	Capillaries script file with DAG

Highlights

Incremental computing

Allows splitting the whole data processing pipeline into separate runs that can be started independently and re-run if needed.
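
As a rough illustration of why independent runs are possible, the sketch below treats the pipeline as a DAG and computes which nodes a run would cover when started from a given node. This is not the Capillaries API: the `downstream` function, the edge map, and the node names (`read_orders`, `join`, and so on) are all hypothetical, and the assumption that a run covers everything downstream of its start nodes is made for illustration only.

```go
package main

import (
	"fmt"
	"sort"
)

// downstream returns the start nodes plus every node reachable from them,
// sorted by name. edges maps a node to the nodes that depend on it.
// Hypothetical helper: not part of Capillaries.
func downstream(edges map[string][]string, starts ...string) []string {
	seen := map[string]bool{}
	queue := append([]string{}, starts...)
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		if seen[n] {
			continue
		}
		seen[n] = true
		queue = append(queue, edges[n]...)
	}
	out := make([]string, 0, len(seen))
	for n := range seen {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical pipeline: two readers feed a join, then aggregate, then report.
	edges := map[string][]string{
		"read_orders": {"join"},
		"read_items":  {"join"},
		"join":        {"aggregate"},
		"aggregate":   {"write_report"},
	}
	// Re-running from "aggregate" leaves the readers and the join untouched.
	fmt.Println(downstream(edges, "aggregate")) // prints "[aggregate write_report]"
}
```

In this toy model, re-running only the tail of the pipeline reuses the intermediate results produced by earlier runs, which is the general idea behind incremental computing.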

Parallel processing

Splits large data volumes into smaller batches processed in parallel. Executes multiple data processing tasks (DAG nodes) in parallel.
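
The general batching pattern can be sketched in plain Go as follows. This is only an in-process illustration using goroutines, not how Capillaries distributes work across processing nodes; `splitIntoBatches`, `processInParallel`, and `sum` are hypothetical names introduced for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// splitIntoBatches cuts rows into consecutive slices of at most batchSize items.
func splitIntoBatches(rows []int, batchSize int) [][]int {
	if batchSize <= 0 {
		return nil
	}
	var batches [][]int
	for start := 0; start < len(rows); start += batchSize {
		end := start + batchSize
		if end > len(rows) {
			end = len(rows)
		}
		batches = append(batches, rows[start:end])
	}
	return batches
}

// processInParallel applies process to every batch using at most `workers`
// goroutines and returns the per-batch results in batch order.
func processInParallel(batches [][]int, workers int, process func([]int) int) []int {
	if workers < 1 {
		workers = 1
	}
	results := make([]int, len(batches))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handed to exactly one goroutine, so no lock is needed.
				results[i] = process(batches[i])
			}
		}()
	}
	for i := range batches {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

// sum is a stand-in for a real per-batch processing task.
func sum(batch []int) int {
	total := 0
	for _, v := range batch {
		total += v
	}
	return total
}

func main() {
	rows := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	batches := splitIntoBatches(rows, 4)
	fmt.Println(processInParallel(batches, 3, sum)) // prints "[10 26 19]"
}
```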

Operator interaction

Allows human data validation for selected data processing stages.

Fault tolerance

Survives most temporary connectivity issues with the underlying database, as well as software and hardware failures on processing nodes.

Works with structured data artifacts

Consumes and produces delimited text files, uses database tables internally. Provides ETL/ELT capabilities. Implements a subset of the relational algebra.
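
To make the relational-algebra idea concrete, here is a minimal sketch of selection and projection over a delimited text input, written with the Go standard library only. It does not use Capillaries; the function `selectAndProject`, the column layout (`id,customer,amount`), and the sample data are assumptions made up for this example.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// selectAndProject reads CSV text with columns id,customer,amount,
// keeps rows where amount >= minAmount (selection) and returns only
// the customer and amount columns (projection).
func selectAndProject(input string, minAmount float64) ([][]string, error) {
	records, err := csv.NewReader(strings.NewReader(input)).ReadAll()
	if err != nil {
		return nil, err
	}
	var out [][]string
	for i, rec := range records {
		if i == 0 {
			continue // skip the header row
		}
		amount, err := strconv.ParseFloat(rec[2], 64)
		if err != nil {
			return nil, fmt.Errorf("row %d: bad amount %q: %w", i, rec[2], err)
		}
		if amount >= minAmount {
			out = append(out, []string{rec[1], rec[2]})
		}
	}
	return out, nil
}

func main() {
	input := "id,customer,amount\n1,acme,120.50\n2,globex,15.00\n3,initech,99.99\n"
	rows, err := selectAndProject(input, 50)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(rows) // prints "[[acme 120.50] [initech 99.99]]"
}
```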

Use scenarios

Capable of processing large amounts of data within SLA time limits, making efficient use of powerful computational (hardware, VMs, containers) and storage (Cassandra) resources, with or without human monitoring, validation, or intervention.

Directories

Path	Synopsis
pkg/api
pkg/cql
pkg/ctx
pkg/custom
pkg/deploy
pkg/env
pkg/eval
pkg/exe/daemon
pkg/exe/deploy
pkg/exe/toolbelt
pkg/exe/webapi
pkg/l
pkg/proc
pkg/sc
pkg/wf
pkg/wfdb
pkg/wfmodel
pkg/xfer
test/code/lookup
test/code/py_calc


Capillaries in depth

What it is and what it is not

Getting started

Testing

Toolbelt, Daemon, and Webapi configuration

Script configuration

Capillaries UI

Capillaries API

Capillaries deploy tool: Openstack cloud deployment

Glossary

Q & A

MIT License
