textio

package
v0.0.0-...-574e91b Latest
Warning

This package is not in the latest version of its module.

Published: Dec 16, 2024 License: Apache-2.0 Imports: 10 Imported by: 0

Documentation

Overview

Package textio contains transforms for reading and writing text blobs.

Constants

This section is empty.

Variables

This section is empty.

Functions

func Immediate

func Immediate(s *beam.Scope, filename string) (beam.PCol[string], error)

Immediate reads a local file at pipeline construction time and embeds the data into an I/O-free pipeline source. It should be used for small files only.

func Read

func Read(s *beam.Scope, bucket, glob string, opts ...ReadOptionFn) beam.PCol[string]

Read reads a set of files indicated by the glob pattern and returns the lines as a PCollection<string>. The newlines are not part of the lines. Read accepts a variadic number of ReadOptionFn that can be used to configure the compression type of the file. By default, the compression type is determined by the file extension.
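The line and compression behavior described above can be illustrated outside of a pipeline with the standard library alone. This is a minimal sketch, not the package's implementation: readLines and gzipBytes are hypothetical helpers, and treating a ".gz" suffix as the signal for gzip is an assumption about how extension-based detection might work.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

// readLines is a hypothetical helper, not part of textio. It chooses a
// decoder from the file name (assumption: ".gz" means gzip) and returns
// the lines with their trailing newlines removed.
func readLines(name string, data []byte) ([]string, error) {
	var r io.Reader = bytes.NewReader(data)
	if strings.HasSuffix(name, ".gz") {
		zr, err := gzip.NewReader(r)
		if err != nil {
			return nil, err
		}
		defer zr.Close()
		r = zr
	}
	var lines []string
	sc := bufio.NewScanner(r) // the default split function strips the newline
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	return lines, sc.Err()
}

// gzipBytes compresses s in memory so the example needs no files on disk.
// Writes to a bytes.Buffer do not fail, so the errors are ignored here.
func gzipBytes(s string) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write([]byte(s))
	zw.Close()
	return buf.Bytes()
}

func main() {
	plain, _ := readLines("a.txt", []byte("x\ny\n"))
	zipped, _ := readLines("a.txt.gz", gzipBytes("x\ny\n"))
	fmt.Println(plain, zipped)
}
```

Both calls yield the same two lines, "x" and "y". Note that bufio.Scanner limits line length by default, so a real reader for large inputs would need a bigger buffer or a different approach.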

func ReadAll

func ReadAll(s *beam.Scope, col beam.PCol[beam.KV[string, string]], opts ...ReadOptionFn) beam.PCol[string]

ReadAll expands and reads the filenames given as globs by the incoming PCollection<KV<string, string>>. It returns the lines of all files as a single PCollection<string>. The newlines are not part of the lines. ReadAll accepts a variadic number of ReadOptionFn that can be used to configure the compression type of the files. By default, the compression type is determined by the file extension.

func ReadWithFilename

func ReadWithFilename(s *beam.Scope, bucket, glob string, opts ...ReadOptionFn) beam.PCol[beam.KV[string, string]]

ReadWithFilename reads a set of files indicated by the glob pattern and returns a PCollection<KV<string, string>> of each filename and line. The newlines are not part of the lines. ReadWithFilename accepts a variadic number of ReadOptionFn that can be used to configure the compression type of the files. By default, the compression type is determined by the file extension.
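The shape of the (filename, line) output can be sketched with in-memory data. This is an illustration under stated assumptions rather than the package's behavior: fileLine and linesWithFilename are hypothetical names, the blobs map stands in for a bucket, and path.Match is assumed as the glob matcher (the package may use different glob semantics).

```go
package main

import (
	"bufio"
	"fmt"
	"path"
	"sort"
	"strings"
)

// fileLine is a hypothetical stand-in for a KV<string, string> element.
type fileLine struct{ Filename, Line string }

// linesWithFilename matches keys against glob and pairs every line of each
// matching blob with its key. Keys are sorted so the output is deterministic.
func linesWithFilename(blobs map[string]string, glob string) ([]fileLine, error) {
	var keys []string
	for k := range blobs {
		ok, err := path.Match(glob, k)
		if err != nil {
			return nil, err // malformed pattern
		}
		if ok {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)

	var out []fileLine
	for _, k := range keys {
		sc := bufio.NewScanner(strings.NewReader(blobs[k]))
		for sc.Scan() {
			out = append(out, fileLine{k, sc.Text()}) // newline already stripped
		}
		if err := sc.Err(); err != nil {
			return nil, err
		}
	}
	return out, nil
}

func main() {
	blobs := map[string]string{
		"logs/a.txt": "1\n2\n",
		"logs/b.txt": "3\n",
		"data/c.txt": "4\n",
	}
	pairs, err := linesWithFilename(blobs, "logs/*.txt")
	if err != nil {
		panic(err)
	}
	for _, p := range pairs {
		fmt.Printf("%s: %s\n", p.Filename, p.Line)
	}
}
```

With path.Match, "*" does not cross a "/" separator, so "logs/*.txt" selects the two files under logs/ and skips data/c.txt.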

func WriteSingle

func WriteSingle(s *beam.Scope, bucket, filename string, col beam.PCol[string]) beam.PCol[string]

WriteSingle writes a PCollection<string> to a single blob key as separate lines. The writer adds a newline after each element.

Intended for very small scale writes, as it doesn't shard large files.

Emits the written files as a PCollection<string>.
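The "newline after each element" rule means the last element is also newline-terminated. A minimal standard-library sketch of that output format follows; writeLines and render are hypothetical helpers, not part of textio, and the blob write itself is replaced by an in-memory buffer.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// writeLines is a hypothetical helper. It writes every element followed by
// a single newline, so n elements produce exactly n newline characters.
func writeLines(w io.Writer, elems []string) error {
	bw := bufio.NewWriter(w)
	for _, e := range elems {
		if _, err := bw.WriteString(e); err != nil {
			return err
		}
		if err := bw.WriteByte('\n'); err != nil {
			return err
		}
	}
	return bw.Flush() // push buffered bytes to the underlying writer
}

// render returns the bytes writeLines would produce, as a string.
func render(elems []string) (string, error) {
	var buf bytes.Buffer
	err := writeLines(&buf, elems)
	return buf.String(), err
}

func main() {
	out, err := render([]string{"a", "b"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", out) // prints "a\nb\n"
}
```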

Types

type ReadOptionFn

type ReadOptionFn func(*readOption)

ReadOptionFn is a function that can be passed to Read, ReadAll, or ReadWithFilename to configure options for reading files.

func ReadAutoCompression

func ReadAutoCompression() ReadOptionFn

ReadAutoCompression specifies that the compression type of files should be auto-detected.

func ReadGzip

func ReadGzip() ReadOptionFn

ReadGzip specifies that files have been compressed using gzip.

func ReadUncompressed

func ReadUncompressed() ReadOptionFn

ReadUncompressed specifies that files have not been compressed.
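ReadOptionFn and its constructors follow the functional options pattern. The sketch below shows how such options could be resolved. The readOption type is unexported and its fields are not documented, so the compression type, its constants, the comp field, and the resolve helper are all assumptions made for illustration; only the ReadOptionFn declaration and the constructor names come from the documentation above.

```go
package main

import "fmt"

// compression is a hypothetical enum; the real representation is not documented.
type compression int

const (
	compAuto compression = iota // detect from the file extension
	compNone
	compGzip
)

func (c compression) String() string {
	switch c {
	case compNone:
		return "uncompressed"
	case compGzip:
		return "gzip"
	default:
		return "auto"
	}
}

// readOption is a stand-in with one assumed field.
type readOption struct{ comp compression }

// ReadOptionFn matches the documented declaration.
type ReadOptionFn func(*readOption)

func ReadAutoCompression() ReadOptionFn { return func(o *readOption) { o.comp = compAuto } }
func ReadGzip() ReadOptionFn            { return func(o *readOption) { o.comp = compGzip } }
func ReadUncompressed() ReadOptionFn    { return func(o *readOption) { o.comp = compNone } }

// resolve starts from the default (auto-detect) and applies each option in
// order, so a later option overrides an earlier one.
func resolve(opts ...ReadOptionFn) compression {
	o := readOption{comp: compAuto}
	for _, fn := range opts {
		fn(&o)
	}
	return o.comp
}

func main() {
	fmt.Println(resolve())           // no options: default applies
	fmt.Println(resolve(ReadGzip())) // explicit gzip
}
```

The appeal of this design is that new options can be added later without changing the signatures of Read, ReadAll, or ReadWithFilename.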
