wordcount

command

Version: v3.0.0-...-7ba4d6b | Published: Jul 17, 2024 | License: Apache-2.0, BSD-3-Clause, MIT | Imports: 11 | Imported by: 0

Repository

github.com/Beamdust/beam-fork

Documentation

Overview

wordcount is an example that counts words in Shakespeare and demonstrates Beam best practices.

This example is the second in a series of four successively more detailed 'word count' examples. You may want to look at minimal_wordcount first. After working through this example, see the debugging_wordcount pipeline for an introduction to additional concepts.

For a detailed walkthrough of this example, see

https://beam.apache.org/get-started/wordcount-example/

Basic concepts, also in the minimal_wordcount example: reading text files; counting a PCollection; writing to text files.

New concepts:

1. Executing a pipeline both locally and using the selected runner
2. Defining your own pipeline options
3. Using ParDo with static DoFns defined out-of-line
4. Building a composite transform
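
The ideas of out-of-line functions and a composite step can be sketched without the Beam SDK. The following is a plain-Go analogy only, not Beam API: extractWords and formatCount stand in for DoFns defined as named functions, and countWords stands in for a composite transform that bundles extraction and counting. All names, the sample lines, and the word-matching pattern are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// wordRE matches a run of ASCII letters, optionally followed by an
// apostrophe and one lowercase letter. The pattern is an assumption
// chosen for this sketch.
var wordRE = regexp.MustCompile(`[a-zA-Z]+('[a-z])?`)

// extractWords is a named, package-level function. It plays the role of
// a DoFn defined out-of-line: it receives one line and emits each word.
func extractWords(line string, emit func(string)) {
	for _, w := range wordRE.FindAllString(line, -1) {
		emit(w)
	}
}

// formatCount renders one result as "word: count".
func formatCount(word string, n int) string {
	return fmt.Sprintf("%s: %d", word, n)
}

// countWords bundles extraction and counting into one reusable step,
// which is the idea behind a composite transform.
func countWords(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		extractWords(line, func(w string) { counts[w]++ })
	}
	return counts
}

func main() {
	// Hypothetical input lines standing in for a text file.
	lines := []string{"Blow, winds, and crack your cheeks!", "Rage! Blow!"}
	counts := countWords(lines)

	// Sort the words so the printed order is deterministic.
	words := make([]string, 0, len(counts))
	for w := range counts {
		words = append(words, w)
	}
	sort.Strings(words)
	for _, w := range words {
		fmt.Println(formatCount(w, counts[w]))
	}
}
```

Keeping each step as a named function makes it testable on its own, and wrapping the steps in countWords lets callers reuse the whole sequence as a single unit.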

Concept #1: You can execute this pipeline either locally or by selecting another runner. The runner is now chosen through command-line options added by the 'beamx' package rather than being hard-coded as in the minimal_wordcount example. The 'beamx' package also registers all included runners and filesystems as a convenience.

To change the runner, specify:

--runner=YOUR_SELECTED_RUNNER
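
As a rough illustration of how a --runner flag can pick among registered runners, here is a minimal sketch using only the standard library. It does not reproduce the 'beamx' implementation: runFunc, registerRunner, selectRunner, the runner name "remote", and the default value "direct" are all hypothetical and chosen for this example.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// runFunc is a hypothetical stand-in for a runner's entry point.
type runFunc func(job string) string

// runners maps a runner name to its entry point.
var runners = map[string]runFunc{}

// registerRunner makes a runner selectable by name.
func registerRunner(name string, fn runFunc) {
	runners[name] = fn
}

// selectRunner parses --runner from args and looks it up in the registry.
// The default of "direct" is an assumption made for this sketch.
func selectRunner(args []string) (runFunc, error) {
	fs := flag.NewFlagSet("wordcount", flag.ContinueOnError)
	name := fs.String("runner", "direct", "name of the runner to execute the pipeline with")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	fn, ok := runners[*name]
	if !ok {
		return nil, fmt.Errorf("unknown runner %q", *name)
	}
	return fn, nil
}

func main() {
	// Register two illustrative runners before parsing the flag.
	registerRunner("direct", func(job string) string { return "ran " + job + " locally" })
	registerRunner("remote", func(job string) string { return "submitted " + job + " to a remote service" })

	run, err := selectRunner(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(run("wordcount"))
}
```

Because the pipeline code only sees the selected function, switching execution environments needs a different flag value and no code change.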

To execute this pipeline, specify a local output file (if using the 'direct' runner) or a remote file on a supported distributed file system.

--output=[YOUR_LOCAL_FILE | YOUR_REMOTE_FILE]

The input file defaults to a public data set containing the text of King Lear by William Shakespeare. You can override it and choose your own input with --input.
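
The --input and --output options described above can be modeled with the standard flag package. This is a sketch under assumptions, not a copy of wordcount.go: the options struct, parseOptions, the placeholder default path, and the rule that an empty --output is an error are all hypothetical.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// defaultInput is a hypothetical placeholder. The real example defaults
// to a public data set containing the text of King Lear.
const defaultInput = "path/to/kinglear.txt"

// options holds the pipeline options defined by this program.
type options struct {
	Input  string
	Output string
}

// parseOptions reads --input and --output from args. Treating a missing
// --output as an error is an assumption made for this sketch.
func parseOptions(args []string) (options, error) {
	fs := flag.NewFlagSet("wordcount", flag.ContinueOnError)
	var o options
	fs.StringVar(&o.Input, "input", defaultInput, "file(s) to read")
	fs.StringVar(&o.Output, "output", "", "output file (required)")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	if o.Output == "" {
		return options{}, errors.New("no output provided: set --output")
	}
	return o, nil
}

func main() {
	opts, err := parseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("reading %s, writing %s\n", opts.Input, opts.Output)
}
```

Giving --input a default while requiring --output means the example runs with a single argument, yet never writes results to an unexpected location.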

Source Files

wordcount.go