substation

module
v0.6.0
Published: Nov 30, 2022 License: MIT

README

Substation

substation logo

Substation is a toolkit for creating highly configurable, no-maintenance, and cost-efficient serverless data pipelines.

What is Substation?

Originally designed to collect, normalize, and enrich security event data, Substation provides methods for achieving high quality data through interconnected, serverless data pipelines.

Substation also provides Go packages for filtering and modifying JSON data.
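
As a rough, hedged sketch of what "filtering and modifying JSON data" looks like in practice, the example below uses only the Go standard library rather than Substation's condition and process packages (whose APIs are not shown on this page); the event shape, field names, and condition are made up for illustration.

package main

import (
    "encoding/json"
    "fmt"
)

// event is a hypothetical security event; its fields are illustrative
// only and are not part of Substation's API.
type event struct {
    Action   string `json:"action"`
    SourceIP string `json:"source_ip,omitempty"`
    IP       string `json:"ip,omitempty"`
}

func main() {
    raw := [][]byte{
        []byte(`{"action":"login","ip":"198.51.100.1"}`),
        []byte(`{"action":"heartbeat"}`),
    }

    for _, b := range raw {
        var e event
        if err := json.Unmarshal(b, &e); err != nil {
            continue
        }

        // Filter: drop events that do not match the condition.
        if e.Action != "login" {
            continue
        }

        // Modify: rename the "ip" field to "source_ip".
        if e.IP != "" {
            e.SourceIP, e.IP = e.IP, ""
        }

        out, _ := json.Marshal(e)
        fmt.Println(string(out))
    }
}

Running the sketch prints only the normalized login event; the heartbeat is filtered out.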

Features

As an event-driven ingest, transform, and load application, Substation has these features:

  • real-time event filtering and processing
  • cross-dataset event correlation and enrichment
  • concurrent event routing to downstream systems
  • runs on containers, built for extensibility
    • support for new event filters and processors
    • support for new ingest sources and load destinations
    • supports creation of custom applications (e.g., multi-cloud)

As a package, Substation provides Go libraries for filtering and modifying JSON data; see the condition and process directories below.

Use Cases

Substation was originally designed to support the mission of achieving high quality data for threat hunting, threat detection, and incident response, but it can be used to move data between many distributed systems and services. Here are some example use cases:

  • data availability: sink data to an intermediary streaming service such as AWS Kinesis, then concurrently sink it to a data lake, data warehouse, and SIEM
  • data consistency: normalize data across every dataset using a permissive schema such as the Elastic Common Schema
  • data completeness: enrich data by integrating AWS Lambda functions and building self-populating AWS DynamoDB tables for low latency, real-time event context (see the sketch after this list)
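
To make the "data completeness" use case concrete, here is a minimal sketch of that enrichment pattern, not Substation's implementation: look up pre-computed context for an event in a DynamoDB table and merge it into the event before sending it downstream. The table name (event_context), key, and event fields are hypothetical, and the sketch calls the AWS SDK for Go directly rather than anything in Substation itself.

package main

import (
    "encoding/json"
    "fmt"
    "log"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/dynamodb"
    "github.com/aws/aws-sdk-go/service/dynamodb/dynamodbattribute"
)

func main() {
    // Event to enrich; the shape is illustrative only.
    event := map[string]interface{}{
        "action": "login",
        "ip":     "198.51.100.1",
    }

    // Hypothetical metadata table keyed by IP address.
    svc := dynamodb.New(session.Must(session.NewSession()))
    out, err := svc.GetItem(&dynamodb.GetItemInput{
        TableName: aws.String("event_context"),
        Key: map[string]*dynamodb.AttributeValue{
            "ip": {S: aws.String(event["ip"].(string))},
        },
    })
    if err != nil {
        log.Fatal(err)
    }

    // Merge any stored context into the event under a "context" key.
    if len(out.Item) > 0 {
        var meta map[string]interface{}
        if err := dynamodbattribute.UnmarshalMap(out.Item, &meta); err != nil {
            log.Fatal(err)
        }
        event["context"] = meta
    }

    enriched, _ := json.Marshal(event)
    fmt.Println(string(enriched))
}

This covers only the lookup half of the pattern; a "self-populating" table also has writers (e.g., Lambda functions) inserting context as new entities are observed.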

Example Data Pipelines

Simple

The simplest data pipeline is one with a single source (ingest), a single transform, and a single sink (load). The diagram below shows pipelines that ingest data from different sources and sink it unmodified to a data warehouse where it can be used for analysis.


graph TD
    sink(Data Warehouse)

    %% pipeline one
    source_a(HTTPS Source)
    processing_a[Transfer]

    %% flow
    subgraph pipeline X
    source_a ---|Push| processing_a
    end

    processing_a ---|Push| sink

    %% pipeline two
    source_b(Data Lake)
    processing_b[Transfer]

    %% flow
    subgraph pipeline Y
    source_b ---|Pull| processing_b
    end

    processing_b ---|Push| sink
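
The same source → transform → sink flow can be sketched directly in Go. The example below is a generic illustration of the pattern in the diagram, built from goroutines and channels; it is not Substation's implementation, and the events are made up.

package main

import (
    "fmt"
)

func main() {
    // Source (ingest): emit raw events.
    source := make(chan string)
    go func() {
        defer close(source)
        for _, e := range []string{"login alice", "logout bob"} {
            source <- e
        }
    }()

    // Transfer: in this diagram data passes through unmodified;
    // a real pipeline would transform the event here.
    transfer := make(chan string)
    go func() {
        defer close(transfer)
        for e := range source {
            transfer <- e
        }
    }()

    // Sink (load): push events to a downstream system (stdout here).
    for e := range transfer {
        fmt.Println(e)
    }
}
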
Complex

The complexity of a data pipeline, including its features and how it connects with other pipelines, is up to the user. The diagram below shows two complex data pipelines that have these features:

  • both pipelines write unmodified data to intermediary streaming data storage (e.g., AWS Kinesis) to support concurrent consumers and downstream systems
  • both pipelines transform data by enriching it from their own inter-pipeline metadata lookup (e.g., AWS DynamoDB)
  • pipeline Y additionally transforms data by enriching it from pipeline X's metadata lookup

graph TD

    %% pipeline a
    source_a_http(HTTPS Source)
    sink_a_streaming(Streaming Data Storage)
    sink_a_metadata(Metadata Lookup)
    sink_a_persistent[Data Warehouse]
    processing_a_http[Transfer]
    processing_a_persistent[Transform]
    processing_a_metadata[Transform]

    %% flow
    subgraph pipeline Y
    source_a_http ---|Push| processing_a_http
    processing_a_http ---|Push| sink_a_streaming
    sink_a_streaming ---|Pull| processing_a_persistent
    sink_a_streaming ---|Pull| processing_a_metadata
    processing_a_persistent ---|Push| sink_a_persistent
    processing_a_persistent ---|Pull| sink_a_metadata
    processing_a_metadata ---|Push| sink_a_metadata
    end

    processing_a_persistent ---|Pull| sink_b_metadata

    %% pipeline b
    source_b_http(HTTPS Source)
    sink_b_streaming(Streaming Data Storage)
    sink_b_metadata(Metadata Lookup)
    sink_b_persistent(Data Warehouse)
    processing_b_http[Transfer]
    processing_b_persistent[Transform]
    processing_b_metadata[Transform]

    %% flow
    subgraph pipeline X
    source_b_http ---|Push| processing_b_http
    processing_b_http ---|Push| sink_b_streaming
    sink_b_streaming ---|Pull| processing_b_persistent
    sink_b_streaming ---|Pull| processing_b_metadata
    processing_b_persistent ---|Push| sink_b_persistent
    processing_b_persistent ---|Pull| sink_b_metadata
    processing_b_metadata ---|Push| sink_b_metadata
    end

As a toolkit, Substation makes no assumptions about how data pipelines are configured and connected. We encourage experimentation and outside-the-box thinking when it comes to pipeline design!

Quickstart

Use the steps below to test Substation's functionality. We recommend running them in a Docker container (Visual Studio Code configurations for developing and testing Substation are included in .devcontainer/ and .vscode/).

Step 0: Set Environment Variable
export SUBSTATION_ROOT=/path/to/repository
Step 1: Compile the File Binary

Run the commands below to compile the Substation file app.

cd $SUBSTATION_ROOT/cmd/file/substation/ && \
go build . && \
./substation -h
Step 2: Compile the quickstart Configuration File

Run the command below to compile the quickstart Jsonnet configuration files into a Substation JSON config.

cd $SUBSTATION_ROOT && \
sh build/scripts/config/compile.sh
Step 3: Test Substation

Run the command below to test Substation.

cd $SUBSTATION_ROOT && \
./cmd/file/substation/substation -input examples/quickstart/data.json -config examples/quickstart/config.json

After this, we recommend reviewing the config documentation and running more tests with other event processors to learn how the app works.

Users can continue exploring the system by iterating on the quickstart config, building and running custom example applications, and deploying a data pipeline in AWS.

Additional Documentation

More documentation about Substation can be found across the project.

Licensing

Substation and its associated code are released under the terms of the MIT License.

Directories

Path Synopsis

cmd
    package cmd provides definitions and methods for building Substation applications.
config
    package config provides capabilities for managing configurations and handling data in Substation applications.
examples
    condition
        example from condition/README.md
    condition/data
        example of reading data from a file and applying an inspector
    condition/encapsulation
        example of reading data from a file and applying an inspector
    process
        example from process/README.md
    process/data
        example of reading data from a file and applying a single processor to data
    process/encapsulation
        example of reading data from a file and applying a single processor to encapsulated data
internal
    aws/appconfig
        package appconfig provides functions for interacting with AWS AppConfig.
    aws/s3manager
        package s3manager provides methods and functions for downloading and uploading objects in AWS S3.
    bufio
        package bufio wraps the standard library's bufio package.
    file
        package file provides functions that can be used to retrieve files from local and remote locations.
    log
        Package log wraps logrus and provides global logging. Only debug logging should be used in condition/, process/, and internal/ to reduce the likelihood of corrupting output for apps; debug and info logging can be used in cmd/.
    media
        package media provides capabilities for inspecting the content of data and identifying its media (Multipurpose Internet Mail Extensions, MIME) type.
    regexp
        Package regexp provides a global regexp cache via go-regexpcache.
proto
