Documentation ¶
Overview ¶
Package tsv provides a simple TSV writer which takes care of number->string conversions and tabs, and is far more performant than fmt.Fprintf (thanks to use of strconv.Append{Uint,Float}).
Usage is similar to bufio.Writer, except that in place of the usual Write() method, there are typed WriteString(), WriteUint32(), etc. methods which append one field at a time to the current line, and an EndLine() method to finish the line.
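The append-a-field, then end-the-line pattern described above can be sketched with the standard library alone. This is not the package's implementation: lineBuf and its methods are hypothetical names, used only to illustrate how strconv.AppendUint and strconv.AppendFloat write directly into a byte buffer instead of going through fmt.Fprintf.

```go
package main

import (
	"fmt"
	"strconv"
)

// lineBuf is a hypothetical stand-in for a TSV writer's internal buffer.
type lineBuf struct {
	buf []byte
}

// writeString appends s followed by a tab.
func (l *lineBuf) writeString(s string) {
	l.buf = append(l.buf, s...)
	l.buf = append(l.buf, '\t')
}

// writeUint32 appends the decimal form of ui followed by a tab.
func (l *lineBuf) writeUint32(ui uint32) {
	l.buf = strconv.AppendUint(l.buf, uint64(ui), 10)
	l.buf = append(l.buf, '\t')
}

// writeFloat64 appends f formatted by strconv.AppendFloat, followed by a tab.
func (l *lineBuf) writeFloat64(f float64, format byte, prec int) {
	l.buf = strconv.AppendFloat(l.buf, f, format, prec, 64)
	l.buf = append(l.buf, '\t')
}

// endLine turns the trailing tab of the last field into a newline.
// It assumes at least one field has been written on the current line.
func (l *lineBuf) endLine() {
	l.buf[len(l.buf)-1] = '\n'
}

func main() {
	var l lineBuf
	l.writeString("key0")
	l.writeUint32(0)
	l.writeFloat64(0.5, 'g', -1)
	l.endLine()
	fmt.Printf("%q\n", l.buf) // prints "key0\t0\t0.5\n"
}
```

Because every typed method leaves a trailing tab, ending the line only needs to overwrite one byte. This is presumably why the Partial methods documented below warn about EndLine clobbering the last character.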
Index ¶
- type Reader
- type Writer
- func (w *Writer) Copy(r io.Reader) error
- func (w *Writer) EndCsv()
- func (w *Writer) EndLine() (err error)
- func (w *Writer) Flush() error
- func (w *Writer) WriteByte(b byte)
- func (w *Writer) WriteBytes(s []byte)
- func (w *Writer) WriteCsvByte(b byte)
- func (w *Writer) WriteCsvUint32(ui uint32)
- func (w *Writer) WriteFloat64(f float64, fmt byte, prec int)
- func (w *Writer) WriteInt64(i int64)
- func (w *Writer) WritePartialBytes(s []byte)
- func (w *Writer) WritePartialString(s string)
- func (w *Writer) WritePartialUint32(ui uint32)
- func (w *Writer) WriteString(s string)
- func (w *Writer) WriteUint32(ui uint32)
Types ¶
type Reader ¶
type Reader struct {
	*csv.Reader

	// HasHeaderRow should be set to true to indicate that the input contains a
	// single header row that lists column names of the rows that follow. It must
	// be set before reading any data.
	HasHeaderRow bool

	// UseHeaderNames causes the reader to set struct fields by matching column
	// names to struct field names (or `tsv` tag). It must be set before reading
	// any data.
	//
	// If not set, struct fields are filled in order, EVEN IF HasHeaderRow=true.
	// If set, all struct fields must have a corresponding column in the file;
	// otherwise an error will be reported through Read().
	//
	// REQUIRES: HasHeaderRow=true
	UseHeaderNames bool

	// RequireParseAllColumns causes Read() to report an error if there are
	// columns not listed in the passed-in struct. It must be set before reading
	// any data.
	//
	// REQUIRES: HasHeaderRow=true
	RequireParseAllColumns bool
	// contains filtered or unexported fields
}
Reader reads a TSV file. It wraps around the standard csv.Reader and allows parsing row contents into a Go struct directly. Thread compatible.
TODO(saito) Support passing a custom bool parser.
TODO(saito) Support a custom "NA" detector.
Example ¶
package main

import (
	"bytes"
	"fmt"
	"io"

	"github.com/grailbio/base/tsv"
)

func main() {
	// Fields are filled from the columns with the same names.
	type row struct {
		Key  string
		Col0 uint
		Col1 float64
	}
	// readRow reads one row and panics on any error, including io.EOF.
	readRow := func(r *tsv.Reader) row {
		var v row
		if err := r.Read(&v); err != nil {
			panic(err)
		}
		return v
	}
	// Columns are separated by tabs, written here as explicit \t escapes.
	r := tsv.NewReader(bytes.NewReader([]byte("Key\tCol0\tCol1\n" +
		"key0\t0\t0.5\n" +
		"key1\t1\t1.5\n")))
	r.HasHeaderRow = true
	r.UseHeaderNames = true
	fmt.Printf("%+v\n", readRow(r))
	fmt.Printf("%+v\n", readRow(r))

	// A third Read must report io.EOF.
	var v row
	if err := r.Read(&v); err != io.EOF {
		panic(err)
	}
}

Output:
{Key:key0 Col0:0 Col1:0.5}
{Key:key1 Col0:1 Col1:1.5}
Example (WithTag) ¶
package main

import (
	"bytes"
	"fmt"
	"io"

	"github.com/grailbio/base/tsv"
)

func main() {
	// Tags map fields to column names; a "-" tag excludes the field.
	type row struct {
		ColA    string  `tsv:"key"`
		ColB    float64 `tsv:"col1"`
		Skipped int     `tsv:"-"`
		ColC    int     `tsv:"col0"`
	}
	// readRow reads one row and panics on any error, including io.EOF.
	readRow := func(r *tsv.Reader) row {
		var v row
		if err := r.Read(&v); err != nil {
			panic(err)
		}
		return v
	}
	// Columns are separated by tabs, written here as explicit \t escapes.
	r := tsv.NewReader(bytes.NewReader([]byte("key\tcol0\tcol1\n" +
		"key0\t0\t0.5\n" +
		"key1\t1\t1.5\n")))
	r.HasHeaderRow = true
	r.UseHeaderNames = true
	fmt.Printf("%+v\n", readRow(r))
	fmt.Printf("%+v\n", readRow(r))

	// A third Read must report io.EOF.
	var v row
	if err := r.Read(&v); err != io.EOF {
		panic(err)
	}
}

Output:
{ColA:key0 ColB:0.5 Skipped:0 ColC:0}
{ColA:key1 ColB:1.5 Skipped:0 ColC:1}
func (*Reader) Read ¶
Read reads the next TSV row into a Go struct. The argument must be a pointer to a struct. Read parses each column in the row into the matching struct field.
Example:
r := tsv.NewReader(...)
...
type row struct {
	col0  string
	col1  int
	float int
}
var v row
err := r.Read(&v)
- If !Reader.HasHeaderRow or !Reader.UseHeaderNames, the N-th column (base zero) will be parsed into the N-th field in the struct.
- If Reader.HasHeaderRow and Reader.UseHeaderNames, then each struct field's name must match one of the column names listed in the first row of the TSV input. The contents of the column with the matching name will be parsed into the struct field. By default, the column name is the struct field's name, but you can override it by setting a `tsv:"columnname"` tag on the field.

Imagine the following row type:

type row struct {
	chr    string `tsv:"chromo"`
	start  int    `tsv:"pos"`
	length int
}
and the following TSV file:
| chromo | length | pos
| chr1   | 1000   | 10
| chr2   | 950    | 20
The first Read() will return row{"chr1", 10, 1000}. The second Read() will return row{"chr2", 20, 950}.
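The name-based matching can be illustrated with the standard encoding/csv package. The sketch below is not how Reader is implemented (it assigns fields by hand rather than through struct tags and reflection), and readRows is a hypothetical helper. It only shows that columns are located by header name, so their order in the file does not matter.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// row mirrors the example above; fields are assigned by hand here.
type row struct {
	chr    string
	start  int
	length int
}

// readRows (hypothetical) reads the header, records the index of each column
// name, and then looks fields up by name for every data row.
func readRows(input string) ([]row, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.Comma = '\t' // parse tab-separated input
	header, err := r.Read()
	if err != nil {
		return nil, err
	}
	col := make(map[string]int, len(header))
	for i, name := range header {
		col[name] = i
	}
	// Every field must have a corresponding column.
	for _, name := range []string{"chromo", "pos", "length"} {
		if _, ok := col[name]; !ok {
			return nil, fmt.Errorf("missing column %q", name)
		}
	}
	var rows []row
	for {
		rec, err := r.Read()
		if err == io.EOF {
			return rows, nil
		}
		if err != nil {
			return nil, err
		}
		start, err := strconv.Atoi(rec[col["pos"]])
		if err != nil {
			return nil, err
		}
		length, err := strconv.Atoi(rec[col["length"]])
		if err != nil {
			return nil, err
		}
		rows = append(rows, row{rec[col["chromo"]], start, length})
	}
}

func main() {
	rows, err := readRows("chromo\tlength\tpos\nchr1\t1000\t10\nchr2\t950\t20\n")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", rows)
	// prints [{chr:chr1 start:10 length:1000} {chr:chr2 start:20 length:950}]
}
```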
type Writer ¶
type Writer struct {
// contains filtered or unexported fields
}
Writer provides an efficient and concise way to append a field at a time to a TSV. However, note that it does NOT have a Write() method; the interface is deliberately restricted.
The struct is padded to fill at least one cache line, which prevents false sharing when make([]Writer, parallelism) is used.
func (*Writer) Copy ¶
Copy appends the entire contents of the given io.Reader (assumed to be another TSV file).
func (*Writer) EndCsv ¶
func (w *Writer) EndCsv()
EndCsv finishes the current comma-separated field, converting the last comma to a tab. The field must be nonempty.
func (*Writer) WriteByte ¶
WriteByte appends the given literal byte (no number->string conversion) and a tab to the current line.
func (*Writer) WriteBytes ¶
WriteBytes appends the given []byte and a tab to the current line.
func (*Writer) WriteCsvByte ¶
WriteCsvByte appends the given literal byte (no number->string conversion) and a comma to the current line.
func (*Writer) WriteCsvUint32 ¶
WriteCsvUint32 converts the given uint32 to a string, and appends that and a comma to the current line.
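The interplay between the Csv writes and EndCsv can be sketched with plain byte-slice operations. appendCsvUint32 and endCsv below are hypothetical helpers that mimic the documented behavior; they are not the package's code.

```go
package main

import (
	"fmt"
	"strconv"
)

// appendCsvUint32 (hypothetical) appends ui and a trailing comma.
func appendCsvUint32(buf []byte, ui uint32) []byte {
	buf = strconv.AppendUint(buf, uint64(ui), 10)
	return append(buf, ',')
}

// endCsv (hypothetical) converts the trailing comma into a tab, closing the
// TSV field. The field must be nonempty, otherwise there is no comma to
// convert and the previous byte would be overwritten instead.
func endCsv(buf []byte) []byte {
	buf[len(buf)-1] = '\t'
	return buf
}

func main() {
	buf := []byte("id7\t") // an earlier, ordinary field
	for _, v := range []uint32{1, 2, 3} {
		buf = appendCsvUint32(buf, v)
	}
	buf = endCsv(buf)
	fmt.Printf("%q\n", buf) // prints "id7\t1,2,3\t"
}
```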
func (*Writer) WriteFloat64 ¶
WriteFloat64 converts the given float64 to a string with the given strconv.AppendFloat parameters, and appends that and a tab to the current line.
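The fmt and prec arguments have the same meaning as in strconv.AppendFloat. The short example below uses only the standard library to show a few common combinations; formatField is a hypothetical helper, and the bit size of 64 is assumed because the value is a float64.

```go
package main

import (
	"fmt"
	"strconv"
)

// formatField (hypothetical) formats f the way strconv.AppendFloat would.
func formatField(f float64, format byte, prec int) string {
	return string(strconv.AppendFloat(nil, f, format, prec, 64))
}

func main() {
	fmt.Println(formatField(0.5, 'g', -1))      // shortest exact form: 0.5
	fmt.Println(formatField(3.14159, 'f', 2))   // two decimals: 3.14
	fmt.Println(formatField(1234.5678, 'e', 3)) // exponent form: 1.235e+03
}
```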
func (*Writer) WriteInt64 ¶
WriteInt64 converts the given int64 to a string, and appends that and a tab to the current line.
func (*Writer) WritePartialBytes ¶
WritePartialBytes appends a []byte WITHOUT the usual subsequent tab. It must be followed by a non-Partial Write at some point to end the field; otherwise EndLine will clobber the last character.
func (*Writer) WritePartialString ¶
WritePartialString appends a string WITHOUT the usual subsequent tab. It must be followed by a non-Partial Write at some point to end the field; otherwise EndLine will clobber the last character.
func (*Writer) WritePartialUint32 ¶
WritePartialUint32 converts the given uint32 to a string, and appends that WITHOUT the usual subsequent tab. It must be followed by a non-Partial Write at some point to end the field; otherwise EndLine will clobber the last character.
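The clobbering warning on the three Partial methods can be illustrated with a small sketch. It assumes that ending a line overwrites the final byte of the buffer, as the descriptions above imply; buildLine is a hypothetical function and does not use the package itself.

```go
package main

import (
	"fmt"
	"strconv"
)

// buildLine (hypothetical) assembles the single field "chr1:100" from several
// pieces. If finish is false, the last piece is also written without a tab,
// which reproduces the clobbering described above.
func buildLine(finish bool) string {
	var buf []byte
	buf = append(buf, "chr"...)          // partial: no tab
	buf = strconv.AppendUint(buf, 1, 10) // partial: no tab
	buf = append(buf, ':')               // partial: no tab
	buf = strconv.AppendUint(buf, 100, 10)
	if finish {
		buf = append(buf, '\t') // a non-partial write ends the field with a tab
	}
	buf[len(buf)-1] = '\n' // ending the line overwrites the last byte
	return string(buf)
}

func main() {
	fmt.Printf("%q\n", buildLine(true))  // prints "chr1:100\n"
	fmt.Printf("%q\n", buildLine(false)) // prints "chr1:10\n" (last digit lost)
}
```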
func (*Writer) WriteString ¶
WriteString appends the given string and a tab to the current line. (It is safe to use this to write multiple fields at a time.)
func (*Writer) WriteUint32 ¶
WriteUint32 converts the given uint32 to a string, and appends that and a tab to the current line.