Apache Tour of Beam Contributors Guide
Welcome to the Apache Tour of Beam Contributors Guide! You can contribute to this project in a variety of ways, such as creating new learning materials or runnable examples and updating existing ones, fixing bugs, or improving documentation. This guide is designed to help you contribute by providing examples and learning materials in a variety of supported programming languages. Contributing to this project will allow you to hone your skills in Apache Beam and big data processing while also making valuable contributions to the open-source community.
Learning Materials
All of our Tour of Beam learning materials are organized into a folder structure inside the learning-content folder in the Apache Beam repository. Learning materials are built as a tree-like folder structure following a "course" - "module" - "unit" composition of learning materials, where the course is the tree root and units are the tree leaves. Additional grouping of connected units into separate unit groups is also possible. Each folder contains a *.yaml
metadata files describing the purpose and hierarchy level of the folder and subfolders containing subordinate modules/groups/units. Folders containing units also contain theory and runnable examples as well as hints and solutions for challenges.
Here is a truncated example of the file/folder structure of Tour of Beam learning content:
.
├── README.md
├── content-info.yaml <-- Course metadata
├── introduction <-- Introduction module folder
│ ├── introduction-concepts <-- Concepts group folder
│ │ ├── group-info.yaml <-- Concepts group metadata
│ │ ├── creating-collections <-- Creating collections group folder
│ │ │ ├── group-info.yaml <-- Creating collections group metadata
│ │ │ ├── from-memory <-- Unit folder
│ │ │ │ ├── unit-info.yaml <-- Unit metadata
│ │ │ │ ├── description.md <-- Unit learning materials
│ │ │ │ ├── go-example <-- Golang runnable example folder
│ │ │ │ │ └── from-memory.go <-- Golang runnable example source file
│ │ │ │ ├── java-example <-- Java runnable example folder
│ │ │ │ │ └── from-memory.java <-- Java runnable example source file
│ │ │ │ ├── python-example <-- Python runnable example folder
│ │ │ │ │ └── from-memory.py <-- Python runnable example source file
│ │ │ ├── reading-from-csv
│ │ │ │ ├── description.md
│ │ │ │ ├── go-example
│ │ │ │ │ └── csvExample.go
│ │ │ │ └── unit-info.yaml
│ │ │ └── reading-from-text
│ │ │ ├── description.md
│ │ │ ├── go-example
│ │ │ │ └── textIo.go
│ │ │ └── unit-info.yaml
│ │ ├── pipeline-concepts <-- Pipeline concepts group folder
│ │ │ └── ...
│ │ │
│ │ └── runner-concepts <-- Runner concepts unit folder
│ │ ├── description.md
│ │ └── unit-info.yaml
│ ├── introduction-guide <-- Introduction guide unit folder
│ │ ├── description.md
│ │ └── unit-info.yaml
│ ├── introduction-terms <-- Introduction terms unit folder
│ │ ├── description.md
│ │ └── unit-info.yaml
│ └── module-info.yaml
├── common-transforms <-- Common transforms module folder
├── core-transforms <-- Core transforms module folder
├── windowing <-- Windowing module folder
└── ...
Here, we have a course folder hierarchy with several modules - "Introduction", "Common Transforms", "Core Transforms", "Windowing"... Introduction module is divided into three major parts - "Concepts", "Pipelines" and "Runners", where "Concepts" and "Pipelines" have additional grouping and thus represented as groups, and "Runners" is a simple unit.
This folder structure along with metadata files is used to render a course tree in Tour of Beam.
The YAML metadata files provide information about each module, group or unit. These metadata files are also used to create SDK-specific units.
There are four types of metadata files in Tour of Beam:
- content-info.yaml
- module-info.yaml
- group-info.yaml
- unit-info.yaml
Course metadata is stored in content-info.yaml
file in the root folder ./learning/tour-of-beam/learning-content"
and is represented as a yaml object with following content:
sdk:
- Java
- Python
- Go
content:
- introduction
- windowing
- triggers
sdk: list of programming languages this learning content is intended for (available options are "Go", "Java", "Python").
content: list of folders with modules which will be included in this course
Module metadata is stored in module-info.yaml file in each module folder and is represented as a yaml object with following content:
sdk:
- Java
- Python
- Go
id: introduction
name: Introduction
complexity: BASIC
content:
- introduction-guide
- introduction-concepts
- introduction-terms
sdk: list of programming languages this learning module is intended for (available options are "Go", "Java", "Python").
content: list of folders with groups/units which will be included in this module
id: unique id of the module within the course
name: Short name of the module that will be used in course navigation
complexity: Complexity of the module (available options are BASIC, MEDIUM, ADVANCED)
Group metadata is stored in group-info.yaml file in each group folder and is represented as a yaml object with following content:
sdk:
- Java
- Python
- Go
id: concepts
name: Beam Concepts
content:
- runner-concepts
- pipeline-concepts
- creating-collections
Group metadata is used to group several units into a logical unit. Group metadata has the same structure as module metadata.
Unit metadata serves the purpose of grouping theoretical learning materials with accompanying runnable examples and motivating challenges.
Group metadata is stored in unit-info.yaml file in each unit folder and is represented as a yaml object with following content:
sdk:
- Java
- Python
- Go
id: windowing-motivating-challenge
name: Windowing motivating challenge
taskName: WindowingChallenge
solutionName: WindowingSolution
sdk: list of programming languages this learning unit is intended for (available options are "Go", "Java", "Python").
id: unique id of the unit within the course
name: Short name of the unit that wil be used in course navigation
taskName: Name of runnable example or motivating challenge that is accompanies this unit
solutionName: Name of solution for motivating challenge
Unit material structure
Unit learning materials should be stored in the following documents inside the unit folder:
description.md: Markdown file containing learning unit theoretical materials or description of motivating challenge. All features of markdown are supported in unit learning materials.
hint*.md: Markdown file containing hint for a motivating challenge. There could be up to nine hints for a motivating challenge in files hint1.md
to hint9.md
. Hints are shown sequentially in Tour of Beam.
Multilanguaage template support for learning materials
Learning unit theoretical materials markdown files support go templates for unifying learning materials for different SDK supported languages
{{if (eq .Sdk "go")}}
Learning content that should be displayed only for in Go SDK course tree
{{end}}
Templates also support sections for several programming languages:
{{if (eq .Sdk "go" "java")}}
Learning content that should be displayed only for both Go and Java SDK course tree
{{end}}
Runnable examples and motivating challenges
Each example in this repository is a self-contained code snippet that demonstrates a specific concept or technique. Examples are organized into subfolders by language for convenience, and each example includes embedded tags with metadata that correspond to the information in the YAML metadata files. This makes it easy for contributors to find the examples they need and understand how to use them.
Metadata tag should be embedded in pipeline source file as a commented yaml structure.
There is a degree of flexibility as for where to keep example source files. We propose to keep examples for different SDK's in separate folders. It's also beneficial to separate motivating examples and solutions to separate folders.
Here is a sample of example metadata tag (java comment):
// beam-playground:
// name: KafkaWordCountJson
// description: Test example with Apache Kafka
// multifile: false
// context_line: 55
// categories:
// - Filtering
// - Options
// - Quickstart
// complexity: MEDIUM
// tags:
// - filter
// - strings
// - emulator
// emulators:
// - type: kafka
// topic:
// id: dataset
// source_dataset: CountWords
// datasets:
// CountWords:
// location: local
// format: json
name: Name/Id of the example that should match the taskName or solutionName in unit metadata.
The rest of the fields in yaml metadata are processed with Playground. See Playground README for additional details.
Contributing
We welcome contributions to this repository! If you would like to contribute, please follow these steps:
- Fork the repository
- Create a new branch for your changes
- Make your changes and test them
- Commit your changes and push them to your branch
- Create a pull request
- Check that Example CI workflow status allows merging
- Please make sure that your contributions follow our code of conduct and contributing guidelines.
- New examples/learning materials will be uploaded after PR is merged with a automated CD workflow