Documentation ¶
Overview ¶
Package tsv provides a simple TSV writer which takes care of number->string conversions and tabs, and is far more performant than fmt.Fprintf (thanks to use of strconv.Append{Uint,Float}).
Usage is similar to bufio.Writer, except that in place of the usual Write() method, there are typed WriteString(), WriteUint32(), etc. methods which append one field at a time to the current line, and an EndLine() method to finish the line.
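The append-one-field-at-a-time pattern described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the package's implementation: the lineBuf type and its method names are hypothetical, and only strconv.AppendUint is taken from the description.

```go
package main

import (
	"fmt"
	"strconv"
)

// lineBuf is a hypothetical stand-in for a typed TSV writer. It only
// illustrates the technique; it is not the tsv.Writer implementation.
type lineBuf struct {
	buf []byte
}

// writeString appends a string field followed by a tab.
func (l *lineBuf) writeString(s string) {
	l.buf = append(l.buf, s...)
	l.buf = append(l.buf, '\t')
}

// writeUint32 appends the decimal form of ui followed by a tab, using
// strconv.AppendUint so that no intermediate string is allocated.
func (l *lineBuf) writeUint32(ui uint32) {
	l.buf = strconv.AppendUint(l.buf, uint64(ui), 10)
	l.buf = append(l.buf, '\t')
}

// endLine turns the trailing tab of the last field into a newline.
// It assumes at least one field has been written.
func (l *lineBuf) endLine() {
	l.buf[len(l.buf)-1] = '\n'
}

func main() {
	var l lineBuf
	l.writeString("chr1")
	l.writeUint32(1000)
	l.endLine()
	fmt.Printf("%q\n", l.buf) // prints "chr1\t1000\n"
}
```

Appending into a reusable byte slice avoids the format-string parsing and interface boxing that fmt.Fprintf performs on every call, which is the likely source of the speedup mentioned above.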
Index ¶
- Constants
- type Reader
- type RowWriter
- type Writer
- func (w *Writer) Copy(r io.Reader) error
- func (w *Writer) EndCsv()
- func (w *Writer) EndLine() (err error)
- func (w *Writer) Flush() error
- func (w *Writer) WriteByte(b byte)
- func (w *Writer) WriteBytes(s []byte)
- func (w *Writer) WriteCsvByte(b byte)
- func (w *Writer) WriteCsvUint32(ui uint32)
- func (w *Writer) WriteFloat64(f float64, fmt byte, prec int)
- func (w *Writer) WriteInt64(i int64)
- func (w *Writer) WritePartialByte(b byte)
- func (w *Writer) WritePartialBytes(s []byte)
- func (w *Writer) WritePartialString(s string)
- func (w *Writer) WritePartialUint32(ui uint32)
- func (w *Writer) WriteString(s string)
- func (w *Writer) WriteUint32(ui uint32)
- func (w *Writer) WriteUint64(ui uint64)
Examples ¶
Constants ¶
const EmptyReadErrStr = "empty file: could not read the header row"
EmptyReadErrStr is the error string returned by Read() when the file is empty but at least a header line was expected.
Types ¶
type Reader ¶
type Reader struct {
	*csv.Reader

	// HasHeaderRow should be set to true to indicate that the input contains a
	// single header row that lists column names of the rows that follow. It must
	// be set before reading any data.
	HasHeaderRow bool

	// UseHeaderNames causes the reader to set struct fields by matching column
	// names to struct field names (or `tsv` tag). It must be set before reading
	// any data.
	//
	// If not set, struct fields are filled in order, EVEN IF HasHeaderRow=true.
	// If set, all struct fields must have a corresponding column in the file or
	// IgnoreMissingColumns must also be set. Otherwise, an error will be
	// reported through Read().
	//
	// REQUIRES: HasHeaderRow=true
	UseHeaderNames bool

	// RequireParseAllColumns causes Read() to report an error if there are
	// columns not listed in the passed-in struct. It must be set before reading
	// any data.
	//
	// REQUIRES: HasHeaderRow=true
	RequireParseAllColumns bool

	// IgnoreMissingColumns causes the reader to ignore any struct fields that are
	// not present as columns in the file. It must be set before reading any
	// data.
	//
	// REQUIRES: HasHeaderRow=true AND UseHeaderNames=true
	IgnoreMissingColumns bool
	// contains filtered or unexported fields
}
Reader reads a TSV file. It wraps around the standard csv.Reader and allows parsing row contents into a Go struct directly. Thread compatible.
TODO(saito) Support passing a custom bool parser.
TODO(saito) Support a custom "NA" detector.
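Since Reader wraps the standard csv.Reader, the underlying parsing can be pictured with encoding/csv alone. The sketch below is illustrative only: the readTSV helper is a hypothetical name, and setting Comma to a tab is an assumption about how such a wrapper would configure the embedded reader.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readTSV parses tab-separated text with the standard csv.Reader.
// Setting Comma to '\t' is an assumption about how a TSV wrapper would
// configure the underlying reader.
func readTSV(input string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.Comma = '\t'
	return r.ReadAll()
}

func main() {
	rows, err := readTSV("Key\tCol0\tCol1\nkey0\t0\t0.5\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(rows[0]) // header row: [Key Col0 Col1]
	fmt.Println(rows[1]) // data row: [key0 0 0.5]
}
```

Because *csv.Reader is embedded in the struct shown above, its exported fields and methods are promoted to Reader by the usual Go embedding rules.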
Example ¶
package main

import (
	"bytes"
	"fmt"
	"io"

	"github.com/grailbio/base/tsv"
)

func main() {
	type row struct {
		Key  string
		Col0 uint
		Col1 float64
	}

	readRow := func(r *tsv.Reader) row {
		var v row
		if err := r.Read(&v); err != nil {
			panic(err)
		}
		return v
	}

	// Fields are tab-separated; one row per line.
	r := tsv.NewReader(bytes.NewReader([]byte("Key\tCol0\tCol1\n" +
		"key0\t0\t0.5\n" +
		"key1\t1\t1.5\n")))
	r.HasHeaderRow = true
	r.UseHeaderNames = true
	fmt.Printf("%+v\n", readRow(r))
	fmt.Printf("%+v\n", readRow(r))

	// A third Read must report the end of the input.
	var v row
	if err := r.Read(&v); err != io.EOF {
		panic(err)
	}
}

Output:
{Key:key0 Col0:0 Col1:0.5}
{Key:key1 Col0:1 Col1:1.5}
Example (WithTag) ¶
package main

import (
	"bytes"
	"fmt"
	"io"

	"github.com/grailbio/base/tsv"
)

func main() {
	type row struct {
		ColA    string  `tsv:"key"`
		ColB    float64 `tsv:"col1"`
		Skipped int     `tsv:"-"`
		ColC    int     `tsv:"col0,fmt=d"`
		Hex     int     `tsv:",fmt=x"`
		Hyphen  int     `tsv:"-,"`
	}

	readRow := func(r *tsv.Reader) row {
		var v row
		if err := r.Read(&v); err != nil {
			panic(err)
		}
		return v
	}

	// Fields are tab-separated; one row per line.
	r := tsv.NewReader(bytes.NewReader([]byte("key\tcol0\tcol1\tHex\t-\n" +
		"key0\t0\t0.5\ta\t1\n" +
		"key1\t1\t1.5\tf\t2\n")))
	r.HasHeaderRow = true
	r.UseHeaderNames = true
	fmt.Printf("%+v\n", readRow(r))
	fmt.Printf("%+v\n", readRow(r))

	// A third Read must report the end of the input.
	var v row
	if err := r.Read(&v); err != io.EOF {
		panic(err)
	}
}

Output:
{ColA:key0 ColB:0.5 Skipped:0 ColC:0 Hex:10 Hyphen:1}
{ColA:key1 ColB:1.5 Skipped:0 ColC:1 Hex:15 Hyphen:2}
func (*Reader) Read ¶
Read reads the next TSV row into a Go struct. The argument must be a pointer to a struct. Each column in the row is parsed into the matching struct field.
Example:
r := tsv.NewReader(...)
...
type row struct {
	Col0  string
	Col1  int
	Float int
}
var v row
err := r.Read(&v)
If !Reader.HasHeaderRow or !Reader.UseHeaderNames, the N-th column (base zero) will be parsed into the N-th field in the struct.
If Reader.HasHeaderRow and Reader.UseHeaderNames, then the struct's field name must match one of the column names listed in the first row in the TSV input. The contents of the column with the matching name will be parsed into the struct field.
By default, the column name is the struct's field name, but you can override it by setting a `tsv:"columnname"` tag on the field. The struct tag may also take an fmt option to specify how to parse the value using the fmt package. This is useful for parsing numbers written in a different base. Note that the scanning functions in the fmt package do not support all verbs, and that using the fmt option may lead to slower performance. Imagine the following row type:
type row struct {
	Chr    string `tsv:"chromo"`
	Start  int    `tsv:"pos"`
	Length int
	Score  int    `tsv:"score,fmt=x"`
}
and the following TSV file:
| chromo | Length | pos | score |
| chr1 | 1000 | 10 | 0a |
| chr2 | 950 | 20 | ff |
The first Read() will return row{"chr1", 10, 1000, 10}.
The second Read() will return row{"chr2", 20, 950, 255}. The score column is parsed as hexadecimal, so 0a becomes 10 and ff becomes 255.
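The hexadecimal conversion in this example can be checked with the fmt package directly. The sketch below is a minimal illustration: the parseWithVerb helper is a hypothetical name, and building the scan format by prefixing the tag's verb with "%" is an assumption about how a fmt option could be applied, not a statement about the package's internals.

```go
package main

import "fmt"

// parseWithVerb scans s into an int using the given fmt verb, e.g. "x"
// for hexadecimal. Prefixing the verb with "%" is an assumption about
// how a fmt=... tag option could be applied.
func parseWithVerb(s, verb string) (int, error) {
	var v int
	_, err := fmt.Sscanf(s, "%"+verb, &v)
	return v, err
}

func main() {
	for _, s := range []string{"0a", "ff"} {
		v, err := parseWithVerb(s, "x")
		if err != nil {
			panic(err)
		}
		fmt.Println(s, "->", v)
	}
	// Prints:
	// 0a -> 10
	// ff -> 255
}
```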
Embedded structs are supported, and the default column name for nested fields will be the unqualified name of the field.
type RowWriter ¶ added in v0.0.2
type RowWriter struct {
// contains filtered or unexported fields
}
RowWriter writes structs to TSV files using field names or "tsv" tags as TSV column headers.
TODO: Consider letting the caller filter or reorder columns.
Example ¶
package main

import (
	"bytes"
	"fmt"

	"github.com/grailbio/base/tsv"
)

func main() {
	type rowTyp struct {
		Foo float64 `tsv:"foo,fmt=.2f"`
		Bar float64 `tsv:"bar,fmt=.3f"`
		Baz float64
	}
	rows := []rowTyp{
		{Foo: 0.1234, Bar: 0.4567, Baz: 0.9876},
		{Foo: 1.1234, Bar: 1.4567, Baz: 1.9876},
	}
	var buf bytes.Buffer
	w := tsv.NewRowWriter(&buf)
	for i := range rows {
		if err := w.Write(&rows[i]); err != nil {
			panic(err)
		}
	}
	// Flush must be called after the last Write.
	if err := w.Flush(); err != nil {
		panic(err)
	}
	fmt.Print(string(buf.Bytes()))
}

Output:
foo	bar	Baz
0.12	0.457	0.9876
1.12	1.457	1.9876
func NewRowWriter ¶ added in v0.0.2
NewRowWriter constructs a RowWriter.
The caller must call Flush() after the last Write().
func (*RowWriter) Write ¶ added in v0.0.2
Write writes a TSV row containing the values of v's exported fields. v must be a pointer to a struct.
On the first Write, a TSV header row is written using v's type. Subsequent Write()s may pass a v of a different type, but no guarantees are made about consistent column ordering with different types.
By default, the column name is the struct's field name, but you can override it by setting `tsv:"columnname"` tag in the field.
You can optionally specify an fmt option in the tag which will control how to format the value using the fmt package. Note that the reader may not support all the verbs. Without the fmt option, formatting options are preset for each type. Using the fmt option may lead to slower performance.
Embedded structs are supported, and the default column name for nested fields will be the unqualified name of the field.
type Writer ¶
type Writer struct {
// contains filtered or unexported fields
}
Writer provides an efficient and concise way to append a field at a time to a TSV. However, note that it does NOT have a Write() method; the interface is deliberately restricted.
The struct is sized to fill at least one cache line, which prevents false sharing when make([]Writer, parallelism) is used.
func (*Writer) Copy ¶
Copy appends the entire contents of the given io.Reader (assumed to be another TSV file).
func (*Writer) EndCsv ¶
func (w *Writer) EndCsv()
EndCsv finishes the current comma-separated field, converting the last comma to a tab. The field must be nonempty.
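The comma-to-tab conversion can be sketched on a plain byte slice. The helper names appendCsvUint32 and endCsv below are hypothetical and only mirror the described behavior; they are not the package's code.

```go
package main

import (
	"fmt"
	"strconv"
)

// appendCsvUint32 appends ui and a comma, as one comma-separated sub-value.
func appendCsvUint32(buf []byte, ui uint32) []byte {
	buf = strconv.AppendUint(buf, uint64(ui), 10)
	return append(buf, ',')
}

// endCsv closes the comma-separated field by overwriting the trailing
// comma with a tab. buf must end with a comma, so it must be nonempty.
func endCsv(buf []byte) []byte {
	buf[len(buf)-1] = '\t'
	return buf
}

func main() {
	var buf []byte
	for _, v := range []uint32{3, 5, 8} {
		buf = appendCsvUint32(buf, v)
	}
	buf = endCsv(buf)
	fmt.Printf("%q\n", buf) // prints "3,5,8\t"
}
```

The nonempty requirement follows from the overwrite: with no preceding sub-value there is no trailing comma to replace.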
func (*Writer) WriteByte ¶
WriteByte appends the given literal byte (no number->string conversion) and a tab to the current line.
func (*Writer) WriteBytes ¶
WriteBytes appends the given []byte and a tab to the current line.
func (*Writer) WriteCsvByte ¶
WriteCsvByte appends the given literal byte (no number->string conversion) and a comma to the current line.
func (*Writer) WriteCsvUint32 ¶
WriteCsvUint32 converts the given uint32 to a string, and appends that and a comma to the current line.
func (*Writer) WriteFloat64 ¶
WriteFloat64 converts the given float64 to a string with the given strconv.AppendFloat parameters, and appends that and a tab to the current line.
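The effect of the fmt and prec parameters can be seen by calling strconv.AppendFloat directly. In this sketch the appendFloatField helper is a hypothetical name, and the bit size of 64 is an assumption based on the float64 argument type.

```go
package main

import (
	"fmt"
	"strconv"
)

// appendFloatField appends f formatted by strconv.AppendFloat, then a tab.
// fmtByte and prec have the same meaning as in the strconv package.
func appendFloatField(buf []byte, f float64, fmtByte byte, prec int) []byte {
	buf = strconv.AppendFloat(buf, f, fmtByte, prec, 64)
	return append(buf, '\t')
}

func main() {
	var buf []byte
	buf = appendFloatField(buf, 0.4567, 'f', 3) // fixed-point, 3 decimals
	buf = appendFloatField(buf, 1234.5, 'e', 2) // exponent form, 2 decimals
	buf = appendFloatField(buf, 0.5, 'g', -1)   // shortest exact representation
	fmt.Printf("%q\n", buf) // prints "0.457\t1.23e+03\t0.5\t"
}
```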
func (*Writer) WriteInt64 ¶
WriteInt64 converts the given int64 to a string, and appends that and a tab to the current line.
func (*Writer) WritePartialByte ¶ added in v0.0.11
WritePartialByte appends the given literal byte (no number->string conversion) WITHOUT the usual subsequent tab. It must be followed by a non-Partial Write at some point to end the field; otherwise EndLine will clobber the last character.
func (*Writer) WritePartialBytes ¶
WritePartialBytes appends a []byte WITHOUT the usual subsequent tab. It must be followed by a non-Partial Write at some point to end the field; otherwise EndLine will clobber the last character.
func (*Writer) WritePartialString ¶
WritePartialString appends a string WITHOUT the usual subsequent tab. It must be followed by a non-Partial Write at some point to end the field; otherwise EndLine will clobber the last character.
func (*Writer) WritePartialUint32 ¶
WritePartialUint32 converts the given uint32 to a string, and appends that WITHOUT the usual subsequent tab. It must be followed by a non-Partial Write at some point to end the field; otherwise EndLine will clobber the last character.
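The warning shared by the Partial methods can be reproduced on a plain byte slice. The helpers below are hypothetical and assume that ending a line overwrites the final byte with a newline, which is what the description of the clobbering behavior suggests.

```go
package main

import (
	"fmt"
	"strconv"
)

// appendPartial appends s with no trailing tab.
func appendPartial(buf []byte, s string) []byte {
	return append(buf, s...)
}

// appendUint32 appends ui followed by a tab, which ends the field.
func appendUint32(buf []byte, ui uint32) []byte {
	buf = strconv.AppendUint(buf, uint64(ui), 10)
	return append(buf, '\t')
}

// endLine overwrites the last byte with a newline, on the assumption
// that the last byte is the tab left by a non-Partial write.
func endLine(buf []byte) []byte {
	buf[len(buf)-1] = '\n'
	return buf
}

func main() {
	// Correct: the partial "chr" is completed by a non-Partial write.
	ok := endLine(appendUint32(appendPartial(nil, "chr"), 7))
	fmt.Printf("%q\n", ok) // prints "chr7\n"

	// Incorrect: ending the line right after a partial write loses the 'r'.
	bad := endLine(appendPartial(nil, "chr"))
	fmt.Printf("%q\n", bad) // prints "ch\n"
}
```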
func (*Writer) WriteString ¶
WriteString appends the given string and a tab to the current line. (It is safe to use this to write multiple fields at a time.)
func (*Writer) WriteUint32 ¶
WriteUint32 converts the given uint32 to a string, and appends that and a tab to the current line.
func (*Writer) WriteUint64 ¶ added in v0.0.2
WriteUint64 converts the given uint64 to a string, and appends that and a tab to the current line.