tsv

package
v0.0.11 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 7, 2024 License: Apache-2.0 Imports: 10 Imported by: 9

Documentation

Overview

Package tsv provides a simple TSV writer which takes care of number->string conversions and tabs, and is far more performant than fmt.Fprintf (thanks to use of strconv.Append{Uint,Float}).

Usage is similar to bufio.Writer, except that in place of the usual Write() method, there are typed WriteString(), WriteUint32(), etc. methods which append one field at a time to the current line, and an EndLine() method to finish the line.
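The field-at-a-time pattern described above can be sketched with the standard library alone. The `lineWriter` type and its methods below are hypothetical stand-ins written for illustration, not the package's Writer. They assume each typed write appends the value plus a tab, and that ending the line overwrites the trailing tab with a newline.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strconv"
)

// lineWriter is a hypothetical stand-in for the pattern described above;
// it is NOT the tsv package's Writer.
type lineWriter struct {
	out  io.Writer
	line []byte
}

// writeString appends s and a trailing tab to the current line.
func (w *lineWriter) writeString(s string) {
	w.line = append(w.line, s...)
	w.line = append(w.line, '\t')
}

// writeUint32 appends the decimal form of v and a trailing tab. It uses
// strconv.AppendUint, which writes digits straight into the buffer.
func (w *lineWriter) writeUint32(v uint32) {
	w.line = strconv.AppendUint(w.line, uint64(v), 10)
	w.line = append(w.line, '\t')
}

// endLine overwrites the trailing tab with a newline and emits the line.
// The line must be nonempty.
func (w *lineWriter) endLine() error {
	w.line[len(w.line)-1] = '\n'
	_, err := w.out.Write(w.line)
	w.line = w.line[:0]
	return err
}

// buildLine writes one two-field line and returns it as a string.
func buildLine(name string, count uint32) string {
	var buf bytes.Buffer
	w := &lineWriter{out: &buf}
	w.writeString(name)
	w.writeUint32(count)
	if err := w.endLine(); err != nil {
		panic(err)
	}
	return buf.String()
}

func main() {
	fmt.Printf("%q\n", buildLine("chr1", 1000)) // prints "chr1\t1000\n"
}
```

Appending into a reused byte slice avoids the format-string parsing and interface boxing that fmt.Fprintf performs on every call, which is the likely source of the speedup mentioned in the overview.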

Constants

const EmptyReadErrStr = "empty file: could not read the header row"

EmptyReadErrStr is the error-string returned by Read() when the file is empty, and at least a header line was expected.

Variables

This section is empty.

Functions

This section is empty.

Types

type Reader

type Reader struct {
	*csv.Reader

	// HasHeaderRow should be set to true to indicate that the input contains a
	// single header row that lists column names of the rows that follow.  It must
	// be set before reading any data.
	HasHeaderRow bool

	// UseHeaderNames causes the reader to set struct fields by matching column
	// names to struct field names (or `tsv` tag). It must be set before reading
	// any data.
	//
	// If not set, struct fields are filled in order, EVEN IF HasHeaderRow=true.
	// If set, all struct fields must have a corresponding column in the file or
	// IgnoreMissingColumns must also be set. An error will be reported through
	// Read().
	//
	// REQUIRES: HasHeaderRow=true
	UseHeaderNames bool

	// RequireParseAllColumns causes Read() to report an error if there are columns
	// not listed in the passed-in struct. It must be set before reading any data.
	//
	// REQUIRES: HasHeaderRow=true
	RequireParseAllColumns bool

	// IgnoreMissingColumns causes the reader to ignore any struct fields that are
	// not present as columns in the file. It must be set before reading any
	// data.
	//
	// REQUIRES: HasHeaderRow=true AND UseHeaderNames=true
	IgnoreMissingColumns bool
	// contains filtered or unexported fields
}

Reader reads a TSV file. It wraps around the standard csv.Reader and allows parsing row contents into a Go struct directly. Thread compatible.
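Because Reader embeds *csv.Reader, the exported fields and methods of csv.Reader are promoted onto it by Go's embedding rules. For orientation, the sketch below reads tab-separated input with encoding/csv directly. That tsv.NewReader configures its embedded reader in exactly this way is an assumption, and `readTSV` is a hypothetical helper.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readTSV parses tab-separated input with the standard csv.Reader.
// Hypothetical helper; it returns raw string records, not structs.
func readTSV(input string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.Comma = '\t' // split fields on tabs instead of commas
	return r.ReadAll()
}

func main() {
	records, err := readTSV("Key\tCol0\tCol1\nkey0\t0\t0.5\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(records), records[1][2]) // prints: 2 0.5
}
```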

TODO(saito) Support passing a custom bool parser.

TODO(saito) Support a custom "NA" detector.

Example
package main

import (
	"bytes"
	"fmt"
	"io"

	"github.com/grailbio/base/tsv"
)

func main() {
	type row struct {
		Key  string
		Col0 uint
		Col1 float64
	}

	readRow := func(r *tsv.Reader) row {
		var v row
		if err := r.Read(&v); err != nil {
			panic(err)
		}
		return v
	}

	r := tsv.NewReader(bytes.NewReader([]byte(`Key	Col0	Col1
key0	0	0.5
key1	1	1.5
`)))
	r.HasHeaderRow = true
	r.UseHeaderNames = true
	fmt.Printf("%+v\n", readRow(r))
	fmt.Printf("%+v\n", readRow(r))

	var v row
	if err := r.Read(&v); err != io.EOF {
		panic(err)
	}
}
Output:

{Key:key0 Col0:0 Col1:0.5}
{Key:key1 Col0:1 Col1:1.5}
Example (WithTag)
package main

import (
	"bytes"
	"fmt"
	"io"

	"github.com/grailbio/base/tsv"
)

func main() {
	type row struct {
		ColA    string  `tsv:"key"`
		ColB    float64 `tsv:"col1"`
		Skipped int     `tsv:"-"`
		ColC    int     `tsv:"col0,fmt=d"`
		Hex     int     `tsv:",fmt=x"`
		Hyphen  int     `tsv:"-,"`
	}
	readRow := func(r *tsv.Reader) row {
		var v row
		if err := r.Read(&v); err != nil {
			panic(err)
		}
		return v
	}

	r := tsv.NewReader(bytes.NewReader([]byte(`key	col0	col1	Hex	-
key0	0	0.5	a	1
key1	1	1.5	f	2
`)))
	r.HasHeaderRow = true
	r.UseHeaderNames = true
	fmt.Printf("%+v\n", readRow(r))
	fmt.Printf("%+v\n", readRow(r))

	var v row
	if err := r.Read(&v); err != io.EOF {
		panic(err)
	}
}
Output:

{ColA:key0 ColB:0.5 Skipped:0 ColC:0 Hex:10 Hyphen:1}
{ColA:key1 ColB:1.5 Skipped:0 ColC:1 Hex:15 Hyphen:2}

func NewReader

func NewReader(in io.Reader) *Reader

NewReader creates a new TSV reader that reads from the given input.

func (*Reader) Read

func (r *Reader) Read(v interface{}) error

Read reads the next TSV row into a Go struct. The argument must be a pointer to a struct. Read parses each column in the row into the matching struct field.

Example:

r := tsv.NewReader(...)
...
type row struct {
	Col0  string
	Col1  int
	Float int
}
var v row
err := r.Read(&v)

If !Reader.HasHeaderRow or !Reader.UseHeaderNames, the N-th column (base zero) will be parsed into the N-th field in the struct.

If Reader.HasHeaderRow and Reader.UseHeaderNames, then the struct's field name must match one of the column names listed in the first row in the TSV input. The contents of the column with the matching name will be parsed into the struct field.

By default, the column name is the struct's field name, but you can override it by setting a `tsv:"columnname"` tag on the field. The struct tag may also take an fmt option that specifies how to parse the value using the fmt package. This is useful for parsing numbers written in a different base. Note that the scanning functions in the fmt package do not support all verbs, and that using the fmt option may be slower than the default parsing. Consider the following row type:

type row struct {
   Chr    string `tsv:"chromo"`
   Start  int    `tsv:"pos"`
   Length int
   Score  int    `tsv:"score,fmt=x"`
}

and the following TSV file:

| chromo | Length | pos | score
| chr1   | 1000   | 10  | 0a
| chr2   | 950    | 20  | ff

The first Read() will return row{"chr1", 10, 1000, 10}.

The second Read() will return row{"chr2", 20, 950, 255}.
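The hexadecimal values in the score column can be checked with the fmt package's scanning functions. The sketch below assumes that the `fmt=x` option corresponds to scanning with the `%x` verb. `parseHex` is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// parseHex scans s as a base-16 integer using fmt's %x verb.
func parseHex(s string) (int, error) {
	var n int
	_, err := fmt.Sscanf(s, "%x", &n)
	return n, err
}

func main() {
	for _, s := range []string{"0a", "ff"} {
		n, err := parseHex(s)
		if err != nil {
			panic(err)
		}
		fmt.Println(s, n) // prints "0a 10", then "ff 255"
	}
}
```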

Embedded structs are supported, and the default column name for nested fields will be the unqualified name of the field.
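Header-name matching as described above can be sketched with reflection. This is an illustration under stated assumptions, not the package's implementation: `columnName`, `fieldColumns`, and `rowColumns` are hypothetical helpers, and the sketch ignores the `-` skip tag, embedded structs, and IgnoreMissingColumns.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// row mirrors the row type from the documentation above.
type row struct {
	Chr    string `tsv:"chromo"`
	Start  int    `tsv:"pos"`
	Length int
	Score  int    `tsv:"score,fmt=x"`
}

// columnName returns the name part of the field's tsv tag, falling back to
// the Go field name when the tag is absent or has an empty name.
func columnName(f reflect.StructField) string {
	name, _, _ := strings.Cut(f.Tag.Get("tsv"), ",")
	if name == "" {
		return f.Name
	}
	return name
}

// fieldColumns returns, for each field of struct type t, the index of the
// header column it should be read from.
func fieldColumns(t reflect.Type, header []string) ([]int, error) {
	pos := make(map[string]int, len(header))
	for i, h := range header {
		pos[h] = i
	}
	cols := make([]int, t.NumField())
	for i := range cols {
		f := t.Field(i)
		col, ok := pos[columnName(f)]
		if !ok {
			return nil, fmt.Errorf("no column named %q for field %s", columnName(f), f.Name)
		}
		cols[i] = col
	}
	return cols, nil
}

// rowColumns applies fieldColumns to the row type.
func rowColumns(header []string) ([]int, error) {
	return fieldColumns(reflect.TypeOf(row{}), header)
}

func main() {
	cols, err := rowColumns([]string{"chromo", "Length", "pos", "score"})
	if err != nil {
		panic(err)
	}
	fmt.Println(cols) // prints [0 2 1 3]
}
```

The result says that Start (field 1) is read from column 2 and Length (field 2) from column 1, which is why the file's column order does not need to match the struct's field order.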

type RowWriter added in v0.0.2

type RowWriter struct {
	// contains filtered or unexported fields
}

RowWriter writes structs to TSV files using field names or "tsv" tags as TSV column headers.

TODO: Consider letting the caller filter or reorder columns.

Example
package main

import (
	"bytes"
	"fmt"

	"github.com/grailbio/base/tsv"
)

func main() {
	type rowTyp struct {
		Foo float64 `tsv:"foo,fmt=.2f"`
		Bar float64 `tsv:"bar,fmt=.3f"`
		Baz float64
	}
	rows := []rowTyp{
		{Foo: 0.1234, Bar: 0.4567, Baz: 0.9876},
		{Foo: 1.1234, Bar: 1.4567, Baz: 1.9876},
	}
	var buf bytes.Buffer
	w := tsv.NewRowWriter(&buf)
	for i := range rows {
		if err := w.Write(&rows[i]); err != nil {
			panic(err)
		}
	}
	if err := w.Flush(); err != nil {
		panic(err)
	}
	fmt.Print(buf.String())
}
Output:

foo	bar	Baz
0.12	0.457	0.9876
1.12	1.457	1.9876

func NewRowWriter added in v0.0.2

func NewRowWriter(w io.Writer) *RowWriter

NewRowWriter constructs a RowWriter that writes to w.

The caller must call Flush() after the last Write().

func (*RowWriter) Flush added in v0.0.2

func (w *RowWriter) Flush() error

Flush flushes all previously-written rows.

func (*RowWriter) Write added in v0.0.2

func (w *RowWriter) Write(v interface{}) error

Write writes a TSV row containing the values of v's exported fields. v must be a pointer to a struct.

On the first Write, a TSV header row is written using v's type. Subsequent Write()s may pass a v of a different type, but no guarantees are made about consistent column ordering across types.

By default, the column name is the struct's field name, but you can override it by setting a `tsv:"columnname"` tag on the field.

You can optionally specify an fmt option in the tag which will control how to format the value using the fmt package. Note that the reader may not support all the verbs. Without the fmt option, formatting options are preset for each type. Using the fmt option may lead to slower performance.

Embedded structs are supported, and the default column name for nested fields will be the unqualified name of the field.
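The tag handling described above can be sketched with reflection and fmt.Sprintf. The sketch assumes that an option such as `fmt=.2f` is turned into the format string `%.2f`, which is consistent with the RowWriter example output but is not confirmed as the package's implementation. `parseTag` and `formatRow` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// rowTyp mirrors the row type from the RowWriter example.
type rowTyp struct {
	Foo float64 `tsv:"foo,fmt=.2f"`
	Bar float64 `tsv:"bar,fmt=.3f"`
	Baz float64
}

// parseTag splits a tsv tag into its column name and fmt option.
// Either part may be empty.
func parseTag(tag string) (name, format string) {
	parts := strings.Split(tag, ",")
	name = parts[0]
	for _, p := range parts[1:] {
		if strings.HasPrefix(p, "fmt=") {
			format = strings.TrimPrefix(p, "fmt=")
		}
	}
	return name, format
}

// formatRow returns the tab-separated header line and data line for v.
func formatRow(v rowTyp) (header, line string) {
	t, val := reflect.TypeOf(v), reflect.ValueOf(v)
	var names, cells []string
	for i := 0; i < t.NumField(); i++ {
		name, format := parseTag(t.Field(i).Tag.Get("tsv"))
		if name == "" {
			name = t.Field(i).Name // default column name
		}
		verb := "%v" // default formatting when no fmt option is given
		if format != "" {
			verb = "%" + format
		}
		names = append(names, name)
		cells = append(cells, fmt.Sprintf(verb, val.Field(i).Interface()))
	}
	return strings.Join(names, "\t"), strings.Join(cells, "\t")
}

func main() {
	header, line := formatRow(rowTyp{Foo: 0.1234, Bar: 0.4567, Baz: 0.9876})
	fmt.Println(header) // foo, bar, Baz separated by tabs
	fmt.Println(line)   // 0.12, 0.457, 0.9876 separated by tabs
}
```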

type Writer

type Writer struct {
	// contains filtered or unexported fields
}

Writer provides an efficient and concise way to append a field at a time to a TSV. However, note that it does NOT have a Write() method; the interface is deliberately restricted.

The struct is padded to fill at least one cache line, which prevents false sharing when make([]Writer, parallelism) is used.
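The padding idea can be sketched as follows. The 64-byte line size is an assumption (it is common on x86-64 but depends on the hardware), and `counter` and `paddedCounter` are hypothetical types, not the Writer's actual layout.

```go
package main

import (
	"fmt"
	"unsafe"
)

// cacheLineSize is an assumed value; real line sizes are hardware-dependent.
const cacheLineSize = 64

// counter is a hypothetical per-worker record.
type counter struct {
	n uint64
}

// paddedCounter pads counter out to cacheLineSize bytes. In a
// []paddedCounter, the n fields of neighboring elements are then 64 bytes
// apart, so they fall in different 64-byte cache lines.
type paddedCounter struct {
	counter
	_ [cacheLineSize - unsafe.Sizeof(counter{})]byte
}

// paddedSize reports the size of one padded element in bytes.
func paddedSize() uintptr {
	return unsafe.Sizeof(paddedCounter{})
}

func main() {
	shards := make([]paddedCounter, 4) // one element per worker
	for i := range shards {
		shards[i].n = uint64(i)
	}
	fmt.Println(paddedSize()) // prints 64
}
```

Without the padding, several elements would share a line, and writes from different goroutines to distinct elements would repeatedly invalidate each other's cached copy of that line.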

func NewWriter

func NewWriter(w io.Writer) (tw *Writer)

NewWriter creates a new tsv.Writer from an io.Writer.

func (*Writer) Copy

func (w *Writer) Copy(r io.Reader) error

Copy appends the entire contents of the given io.Reader (assumed to be another TSV file).

func (*Writer) EndCsv

func (w *Writer) EndCsv()

EndCsv finishes the current comma-separated field, converting the last comma to a tab. The field must be nonempty.

func (*Writer) EndLine

func (w *Writer) EndLine() (err error)

EndLine finishes the current line. The line must be nonempty.

func (*Writer) Flush

func (w *Writer) Flush() error

Flush flushes all finished lines.

func (*Writer) WriteByte

func (w *Writer) WriteByte(b byte)

WriteByte appends the given literal byte (no number->string conversion) and a tab to the current line.

func (*Writer) WriteBytes

func (w *Writer) WriteBytes(s []byte)

WriteBytes appends the given []byte and a tab to the current line.

func (*Writer) WriteCsvByte

func (w *Writer) WriteCsvByte(b byte)

WriteCsvByte appends the given literal byte (no number->string conversion) and a comma to the current line.

func (*Writer) WriteCsvUint32

func (w *Writer) WriteCsvUint32(ui uint32)

WriteCsvUint32 converts the given uint32 to a string, and appends that and a comma to the current line.
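The WriteCsv methods and EndCsv together build one TSV field that holds a comma-separated list. A minimal sketch of that byte-level behavior, using hypothetical helpers rather than the package's code:

```go
package main

import (
	"fmt"
	"strconv"
)

// appendCsvUint32 appends the decimal form of v followed by a comma.
func appendCsvUint32(line []byte, v uint32) []byte {
	line = strconv.AppendUint(line, uint64(v), 10)
	return append(line, ',')
}

// endCsv turns the trailing comma into a tab, closing the TSV field.
// The line must be nonempty.
func endCsv(line []byte) []byte {
	line[len(line)-1] = '\t'
	return line
}

// csvField renders vals as one comma-separated TSV field, tab included.
// vals must be nonempty.
func csvField(vals []uint32) string {
	var line []byte
	for _, v := range vals {
		line = appendCsvUint32(line, v)
	}
	return string(endCsv(line))
}

func main() {
	fmt.Printf("%q\n", csvField([]uint32{1, 2, 3})) // prints "1,2,3\t"
}
```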

func (*Writer) WriteFloat64

func (w *Writer) WriteFloat64(f float64, fmt byte, prec int)

WriteFloat64 converts the given float64 to a string with the given strconv.AppendFloat parameters, and appends that and a tab to the current line.
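The fmt and prec arguments have the same meaning as in strconv.AppendFloat. The sketch below shows a few combinations. `floatField` is a hypothetical helper that mimics the documented behavior of appending the formatted value and a tab.

```go
package main

import (
	"fmt"
	"strconv"
)

// floatField formats f as a float64 with strconv.AppendFloat and adds the
// tab that terminates a TSV field.
func floatField(f float64, format byte, prec int) string {
	b := strconv.AppendFloat(nil, f, format, prec, 64)
	return string(append(b, '\t'))
}

func main() {
	fmt.Printf("%q\n", floatField(3.14159, 'f', 2)) // "3.14\t"
	fmt.Printf("%q\n", floatField(3.14159, 'e', 3)) // "3.142e+00\t"
	fmt.Printf("%q\n", floatField(0.5, 'g', -1))    // "0.5\t"
}
```

A precision of -1 asks strconv for the fewest digits that still round-trip to the same float64.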

func (*Writer) WriteInt64

func (w *Writer) WriteInt64(i int64)

WriteInt64 converts the given int64 to a string, and appends that and a tab to the current line.

func (*Writer) WritePartialByte added in v0.0.11

func (w *Writer) WritePartialByte(b byte)

WritePartialByte appends the given literal byte (no number->string conversion) WITHOUT the usual subsequent tab. It must be followed by a non-Partial Write at some point to end the field; otherwise EndLine will clobber the last character.

func (*Writer) WritePartialBytes

func (w *Writer) WritePartialBytes(s []byte)

WritePartialBytes appends a []byte WITHOUT the usual subsequent tab. It must be followed by a non-Partial Write at some point to end the field; otherwise EndLine will clobber the last character.

func (*Writer) WritePartialString

func (w *Writer) WritePartialString(s string)

WritePartialString appends a string WITHOUT the usual subsequent tab. It must be followed by a non-Partial Write at some point to end the field; otherwise EndLine will clobber the last character.

func (*Writer) WritePartialUint32

func (w *Writer) WritePartialUint32(ui uint32)

WritePartialUint32 converts the given uint32 to a string, and appends that WITHOUT the usual subsequent tab. It must be followed by a non-Partial Write at some point to end the field; otherwise EndLine will clobber the last character.
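The warning on the WritePartial methods can be illustrated at the byte level. The sketch assumes that EndLine overwrites the final byte of the line with a newline regardless of what that byte is. `endLine` and `rangeField` are hypothetical helpers, not the package's code.

```go
package main

import (
	"fmt"
	"strconv"
)

// endLine replaces the last byte of line with a newline, whatever it is.
func endLine(line []byte) string {
	line[len(line)-1] = '\n'
	return string(line)
}

// rangeField builds "start-end" as a single field from several appends.
// Only the final piece is followed by a tab.
func rangeField(start, end uint32) []byte {
	var line []byte
	line = strconv.AppendUint(line, uint64(start), 10) // partial: no tab
	line = append(line, '-')                            // partial: no tab
	line = strconv.AppendUint(line, uint64(end), 10)   // final piece
	line = append(line, '\t')                           // tab ends the field
	return line
}

func main() {
	// Ended correctly: the trailing tab becomes the newline.
	fmt.Printf("%q\n", endLine(rangeField(10, 20))) // "10-20\n"

	// Only partial writes: the last character of the data is lost.
	fmt.Printf("%q\n", endLine([]byte("10-20"))) // "10-2\n"
}
```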

func (*Writer) WriteString

func (w *Writer) WriteString(s string)

WriteString appends the given string and a tab to the current line. (It is safe to use this to write multiple fields at a time.)

func (*Writer) WriteUint32

func (w *Writer) WriteUint32(ui uint32)

WriteUint32 converts the given uint32 to a string, and appends that and a tab to the current line.

func (*Writer) WriteUint64 added in v0.0.2

func (w *Writer) WriteUint64(ui uint64)

WriteUint64 converts the given uint64 to a string, and appends that and a tab to the current line.
