gomrjob

package module
v1.1.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 5, 2024 License: MIT Imports: 18 Imported by: 1

README

GoMRJob

Build Status Docs GitHub release

A Go framework for running Map Reduce Jobs on Hadoop.

http://godoc.org/github.com/jehiah/gomrjob

Supported Configurations
About

This framework has been in production use at Bitly since 2013, but it's light on examples.

See the example for more context.

Heavily inspired by Yelp/mrjob

Documentation

Overview

gomrjob - a Go library for hadoop map reduce jobs

It provides a lightweight framework for writing map and reduce steps as well as a Runner that will submit jobs and put the steps together.

Index

Constants

View Source
const VERSION = "1.1.2"

Variables

This section is empty.

Functions

func Counter

func Counter(group string, counter string, amount int64)

reporter:counter:<group>,<counter>,<amount>

func LoadAndValidateFlags

func LoadAndValidateFlags()

LoadAndValidateFlags loads flags from env and checks for missing arguments

func Status

func Status(message string)

reporter:status:<message>

Types

type Combiner

type Combiner interface {
	Combiner(io.Reader, io.Writer) error
}

type JobType

type JobType int8
const (
	HDFS JobType = iota
	Dataproc
)

type Mapper

type Mapper interface {
	Mapper(io.Reader, io.Writer) error
}

type Reducer

type Reducer interface {
	Reducer(io.Reader, io.Writer) error
}

type Runner

type Runner struct {
	Name  string
	Steps []Step
	// Inputfiles can be of the format `/pattern/to/files*.gz` or `hdfs:///pattern/to/files*.gz` or `s3://bucket/pattern`
	InputFiles         []string
	Output             string // fully qualified
	ReducerTasks       int
	PassThroughOptions []string // CLI arguments to $exe when run as map / reduce tasks
	CompressOutput     bool
	CacheFiles         []string          // -files
	Files              []string          // -file
	Properties         map[string]string // -D key=value argumets to mapreduce-streaming.jar
	JobType            JobType
	// contains filtered or unexported fields
}

func NewRunner

func NewRunner() *Runner

func (*Runner) Cleanup

func (r *Runner) Cleanup() error

func (*Runner) Run

func (r *Runner) Run() error

Run is the program entry point from main()

When executed directly (--stage=”) uploads loads the executibile and submits mapreduce jobs for each stage of the program

func (*Runner) Stage

func (r *Runner) Stage() string

return which stage the runner is executing as

type Step

type Step interface {
	Reducer
}

type StepReducerTasksCount

type StepReducerTasksCount interface {
	NumberReducerTasks() int
}

Directories

Path Synopsis
example module
internal
storage
simple functions for interacting with Google Storage over the JSON API
simple functions for interacting with Google Storage over the JSON API

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL