csv

package
v0.0.0-...-404dc1e
Published: Aug 10, 2017 License: Apache-2.0 Imports: 15 Imported by: 0

Documentation

Overview

Package csv reads and writes comma-separated values (CSV) files.

A csv file contains zero or more records of one or more fields per record. Each record is separated by the newline character. The final record may optionally be followed by a newline character.

field1,field2,field3

White space is considered part of a field.

Carriage returns before newline characters are silently removed.

Blank lines are ignored. A line with only whitespace characters (excluding the ending newline character) is not considered a blank line.

Fields which start and stop with the quote character " are called quoted-fields. The beginning and ending quote are not part of the field.

The source:

normal string,"quoted-field"

results in the fields

{`normal string`, `quoted-field`}

Within a quoted-field a quote character followed by a second quote character is considered a single quote.

"the ""word"" is true","a ""quoted-field"""

results in

{`the "word" is true`, `a "quoted-field"`}

Newlines and commas may be included in a quoted-field

"Multi-line
field","comma is ,"

results in

{`Multi-line
field`, `comma is ,`}

This serves as an example of how to implement a plugin that reads external data.

Usually a data set consists of many data shards.

An input plugin therefore has three steps:

  1. Generate a list of shard info. This runs on the driver.
  2. Send each piece of shard info to a remote executor.
  3. Each executor fetches external data according to the shard info. Each shard info is processed by a mapper.

The shard info should be serializable and deserializable. Usually gob is all that is needed to serialize and deserialize it.

Since the mapper that processes shard info is written in Go, a call to "gio.Init()" is required.

Index

Constants

const (
	SINGLE_QUOTE = '\''
	DOUBLE_QUOTE = '"'
)

Variables

var (
	ErrTrailingComma = errors.New("extra delimiter at end of line") // no longer used
	ErrBareQuote     = errors.New("bare \" in non-quoted-field")
	ErrQuote         = errors.New("extraneous \" in field")
	ErrFieldCount    = errors.New("wrong number of fields in line")
)

These are the errors that can be returned in ParseError.Err.

var (
	MapperReadShard = gio.RegisterMapper(readShard)
)

Functions

This section is empty.

Types

type CsvShardInfo

type CsvShardInfo struct {
	Config    map[string]string
	FileName  string
	HasHeader bool
}

func (*CsvShardInfo) ReadSplit

func (ds *CsvShardInfo) ReadSplit() error

type CsvSource

type CsvSource struct {
	Path           string
	HasHeader      bool
	PartitionCount int
	// contains filtered or unexported fields
}

func New

func New(fileOrPattern string, partitionCount int) *CsvSource

New creates a CsvSource based on a file name. The file name can contain "*" or "?" patterns denoting a list of matching files.

func (*CsvSource) Generate

func (s *CsvSource) Generate(f *flow.Flow) *flow.Dataset

Generate generates data shard info, partitions the shards via round robin, and reads each shard on its assigned executor.

func (*CsvSource) SetHasHeader

func (q *CsvSource) SetHasHeader(hasHeader bool) *CsvSource

SetHasHeader sets whether the data contains a header row.

type ParseError

type ParseError struct {
	Line   int   // Line where the error occurred
	Column int   // Column (rune index) where the error occurred
	Err    error // The actual error
}

A ParseError is returned for parsing errors. The first line is 1. The first column is 0.

func (*ParseError) Error

func (e *ParseError) Error() string

type Reader

type Reader struct {
	Comma            rune // field delimiter (set to ',' by NewReader)
	Comment          rune // comment character for start of line
	FieldsPerRecord  int  // number of expected fields per record
	LazyQuotes       bool // allow lazy quotes
	TrailingComma    bool // ignored; here for backwards compatibility
	TrimLeadingSpace bool // trim leading space
	// contains filtered or unexported fields
}

A Reader reads records from a CSV-encoded file.

As returned by NewReader, a Reader expects input conforming to RFC 4180. The exported fields can be changed to customize the details before the first call to Read or ReadAll.

Comma is the field delimiter. It defaults to ','.

Comment, if not 0, is the comment character. Lines beginning with the Comment character are ignored.

If FieldsPerRecord is positive, Read requires each record to have the given number of fields. If FieldsPerRecord is 0, Read sets it to the number of fields in the first record, so that future records must have the same field count. If FieldsPerRecord is negative, no check is made and records may have a variable number of fields.

If LazyQuotes is true, a quote may appear in an unquoted field and a non-doubled quote may appear in a quoted field.

If TrimLeadingSpace is true, leading white space in a field is ignored.

func NewReader

func NewReader(r io.Reader) *Reader

NewReader returns a new Reader that reads from r.

func (*Reader) Read

func (r *Reader) Read() (record []string, err error)

Read reads one record from r. The record is a slice of strings with each string representing one field.

func (*Reader) ReadAll

func (r *Reader) ReadAll() (records [][]string, err error)

ReadAll reads all the remaining records from r. Each record is a slice of fields. A successful call returns err == nil, not err == EOF. Because ReadAll is defined to read until EOF, it does not treat end of file as an error to be reported.
