capillaries

module
v1.1.11
Published: Sep 23, 2023 License: MIT

Capillaries

Capillaries is a data processing framework that:

  • takes care of scalability issues and intermediate data storage, allowing users to focus on data transforms and data quality control;
  • fills the gap between distributed, scalable data processing/integration solutions and the need to produce enriched, customer-ready, production-quality, human-curated data within SLA time limits.

Why Capillaries?

| Area | BEFORE | AFTER |
|---|---|---|
| Data aggregation | SQL joins | Capillaries lookups in Cassandra + Go expressions (scalability, parallel execution) |
| Data filtering | SQL queries, custom code | Go expressions (scalability, maintainability) |
| Data transform | SQL expressions, custom code | Go expressions, Python formulas (parallel execution, maintainability) |
| Intermediate data storage | Files, relational databases | On-the-fly-created Cassandra keyspaces and tables (scalability, maintainability) |
| Workflow execution | Shell scripts, custom code, workflow frameworks | RabbitMQ as the Single Point of Failure + workflow status stored in Cassandra (parallel execution, fault tolerance, incremental computing) |
| Workflow monitoring and interaction | Custom solutions | Capillaries UI, Toolbelt utility, API, Web API (transparency, operator validation support) |
| Workflow management | Shell scripts, custom code | Capillaries script file with DAG |
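The "Data aggregation" row replaces SQL joins with lookups. The Go sketch below illustrates the general idea of a lookup-based join: index one dataset by key, then probe that index while scanning the other dataset. It is not Capillaries code. The Account and Txn types, their fields, and the helpers buildLookup, innerLookup and sumByName are hypothetical, and a plain Go map stands in for the Cassandra lookup table that Capillaries would use.

```go
package main

import "fmt"

// Account and Txn are hypothetical row types, invented for illustration.
type Account struct {
	ID   string
	Name string
}

type Txn struct {
	AccountID string
	Amount    int64
}

// EnrichedTxn is the output row: a transaction plus the looked-up account name.
type EnrichedTxn struct {
	AccountID string
	Name      string
	Amount    int64
}

// buildLookup indexes accounts by ID. In Capillaries such an index would be
// a Cassandra table; here a Go map stands in for it (assumption).
func buildLookup(accounts []Account) map[string]Account {
	idx := make(map[string]Account, len(accounts))
	for _, a := range accounts {
		idx[a.ID] = a
	}
	return idx
}

// innerLookup probes the index for each transaction and keeps only
// transactions that have a matching account, like an SQL inner join.
func innerLookup(txns []Txn, idx map[string]Account) []EnrichedTxn {
	out := make([]EnrichedTxn, 0, len(txns))
	for _, t := range txns {
		a, ok := idx[t.AccountID]
		if !ok {
			continue // no matching account: dropped, as in an inner join
		}
		out = append(out, EnrichedTxn{AccountID: t.AccountID, Name: a.Name, Amount: t.Amount})
	}
	return out
}

// sumByName totals amounts per account name, similar to SQL GROUP BY.
func sumByName(rows []EnrichedTxn) map[string]int64 {
	totals := make(map[string]int64)
	for _, r := range rows {
		totals[r.Name] += r.Amount
	}
	return totals
}

func main() {
	idx := buildLookup([]Account{{"A1", "Alice"}, {"A2", "Bob"}})
	rows := innerLookup([]Txn{{"A1", 100}, {"A2", 50}, {"A1", 25}, {"A9", 7}}, idx)
	totals := sumByName(rows)
	fmt.Println(len(rows), totals["Alice"], totals["Bob"]) // prints "3 125 50"
}
```

Each transaction is probed independently of the others, so the scan can be split across workers; that independence is presumably what makes lookups a good fit for the parallel execution the table mentions.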

Getting started

On Mac, WSL or Linux, run:

git clone https://github.com/capillariesio/capillaries.git
cd capillaries
./copy_demo_data.sh
docker-compose -p "test_capillaries_containers" up

Wait until all containers are started and Cassandra is fully initialized (it logs a message like "Created default superuser role 'cassandra'"). At that point, Capillaries is ready to process data.

Navigate to http://localhost:8080, click "New run" and start a new data processing run with the following parameters (values must not contain tabs or spaces):

| Field | Value |
|---|---|
| Keyspace | portfolio_quicktest |
| Script URI | /tmp/capi_cfg/portfolio_quicktest/script.json |
| Script parameters URI | /tmp/capi_cfg/portfolio_quicktest/script_params.json |
| Start nodes | 1_read_accounts,1_read_txns,1_read_period_holdings |
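The Start nodes value is a comma-separated list of node names. As an illustration of the "no tabs or spaces" rule, this Go sketch splits and checks such a list before it is pasted into the form. parseStartNodes is a hypothetical helper, and rejecting empty entries (for example, from a trailing comma) is an assumption, not documented Capillaries behavior.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// parseStartNodes splits a comma-separated node list into names.
// It rejects any whitespace (tabs, spaces, newlines) and, as an assumption,
// rejects empty entries such as those produced by a trailing comma.
func parseStartNodes(s string) ([]string, error) {
	if strings.IndexFunc(s, unicode.IsSpace) >= 0 {
		return nil, fmt.Errorf("start nodes %q contain whitespace", s)
	}
	names := strings.Split(s, ",")
	for i, n := range names {
		if n == "" {
			return nil, fmt.Errorf("empty node name at position %d in %q", i, s)
		}
	}
	return names, nil
}

func main() {
	nodes, err := parseStartNodes("1_read_accounts,1_read_txns,1_read_period_holdings")
	fmt.Println(nodes, err) // [1_read_accounts 1_read_txns 1_read_period_holdings] <nil>

	_, err = parseStartNodes("1_read_accounts, 1_read_txns")
	fmt.Println(err) // the space after the comma is rejected
}
```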

A new keyspace, portfolio_quicktest, appears in the keyspace list. Click it and watch the run complete; when it finishes, nodes 7_file_account_period_sector_perf and 7_file_account_year_perf should have produced these result files:

cat /tmp/capi_out/portfolio_quicktest/account_period_sector_perf.csv
cat /tmp/capi_out/portfolio_quicktest/account_year_perf.csv

For more details about getting started, see Getting started. For details about this particular demo, see the Capillaries blog post "Use Capillaries to calculate ARK portfolio performance".

Capillaries in depth

What it is and what it is not (with a use case discussion and diagrams)
Getting started (how to run a quick Docker-based demo without compiling a single line of code)
Testing
Toolbelt, Daemon, and Webapi configuration
Script configuration
Capillaries UI
Capillaries API
Capillaries deploy tool: OpenStack/AWS cloud deployment
Glossary
Q & A
Capillaries blog
MIT License

(C) 2023 kleines.hertz[at]protonmail.com
