Documentation
Overview
Package xsv implements parsing/converting CSV (RFC 4180) and TSV (tab-separated values) files to the binary ION format.
Constants
const (
	TypeIgnore   = "ignore"
	TypeString   = "string"   // default
	TypeNumber   = "number"   // also floating point
	TypeInt      = "int"      // integer only
	TypeBool     = "bool"
	TypeDateTime = "datetime"
)
const (
	FormatDateTime             = "datetime" // default
	FormatDateTimeUnixSec      = "unix_seconds"
	FormatDateTimeUnixMilliSec = "unix_milli_seconds"
	FormatDateTimeUnixMicroSec = "unix_micro_seconds"
	FormatDateTimeUnixNanoSec  = "unix_nano_seconds"
)
Variables
var (
	ErrIngestEmptyOnlyValidForStrings = errors.New("only strings can be empty")
	ErrFormatOnlyValidForDateTime     = errors.New("format only valid for datetime type")
	ErrBoolValuesOnlyValidForBool     = errors.New("custom true/false values only valid for bool type")
	ErrRequireBothTrueAndFalseValues  = errors.New("require both true and false values")
	ErrTrueAndFalseValuesOverlap      = errors.New("true and false values overlap")
)
var (
ErrNoHints = errors.New("hints are mandatory")
)
Types
type CsvChopper
type CsvChopper struct {
	// SkipRecords allows skipping the first
	// N records (useful when headers are used)
	SkipRecords int
	// Separator allows specifying a custom
	// separator (defaults to comma)
	Separator Delim
	// contains filtered or unexported fields
}
CsvChopper reads a CSV formatted file (RFC 4180) and splits each line into its individual fields.
type Delim
type Delim rune
Delim is a rune that unmarshals from a string.
func (*Delim) UnmarshalJSON
UnmarshalJSON implements json.Unmarshaler.
type FieldHint
type FieldHint struct {
	// Field-name (use dots to make it a subfield)
	Name string `json:"name,omitempty"`
	// Type of field (or ignore)
	Type string `json:"type,omitempty"`
	// Default value if the column is an empty string
	Default string `json:"default,omitempty"`
	// Ingestion format (i.e. different data formats)
	Format string `json:"format,omitempty"`
	// Allow empty values (only valid for strings) to
	// be ingested. If this flag is false, the field
	// is omitted from the record instead.
	AllowEmpty bool `json:"allow_empty,omitempty"`
	// Don't use sparse-indexing for this value
	// (only valid for datetime type)
	NoIndex bool `json:"no_index,omitempty"`
	// Optional list of values that represent TRUE
	// (only valid for bool type)
	TrueValues []string `json:"true_values,omitempty"`
	// Optional list of values that represent FALSE
	// (only valid for bool type)
	FalseValues []string `json:"false_values,omitempty"`
	// Optional list of values that represent a
	// missing value
	MissingValues []string `json:"missing_values,omitempty"`
	// contains filtered or unexported fields
}
FieldHint defines whether and how a field should be imported.
func (*FieldHint) UnmarshalJSON
type Hint
type Hint struct {
	// SkipRecords allows skipping the first
	// N records (useful when headers are used)
	SkipRecords int `json:"skip_records,omitempty"`
	// Separator allows specifying a custom
	// separator (only applicable for CSV)
	Separator Delim `json:"separator,omitempty"`
	// MissingValues is an optional list of
	// strings which represent missing values.
	// Entries in Fields may override this on a
	// per-field basis.
	MissingValues []string `json:"missing_values,omitempty"`
	// Fields specifies the hint for each field
	Fields []FieldHint `json:"fields"`
}
Hint specifies the options and mandatory fields for parsing CSV/TSV files.
func ParseHint
ParseHint parses a JSON byte slice into a Hint structure, which can later be used to pass type hints and/or other flags to the CSV/TSV parser.
The input must contain a valid JSON object, like:
{
    "fields": [
        {"name": "field", "type": "<type>"},
        {"name": "field.a", "type": "<type>", "default": "empty"},
        {"name": "field.b", "type": "datetime", "format": "unix_seconds", "no_index": true},
        {"name": "anotherField", "type": "bool", "true_values": ["Y"], "false_values": ["N"]},
        ...
    ]
}
With TSV each line represents a single record. The tab character is used to split the line into multiple fields. The 'fields' part of the hints is an ordered list that specifies the name and type of each field.
Each field will be given the specified 'name'. If no 'type' is specified, then 'string' is assumed. When the data contains more fields than are listed in 'fields', the extra fields are skipped.
If a field doesn't need to be ingested, you can insert an empty entry in the list (or set the 'type' to "ignore" explicitly).
When there is no text between two tabs, the structure won't contain the field unless a 'default' is specified (which can be an empty string). Note that the default value should match the field's type.
Note that the 'name' can contain multiple levels, so nested objects can be created. This can be useful to group information in the ingested data.
Some values may be included in the sparse index. Set the 'no_index' field to `true` to prevent this behavior for the field.
Supported types:
- string -> set 'allow_empty' if you want empty strings to be ingested
- number -> either float or int
- int
- bool -> can support custom true/false values
- datetime -> formats: datetime (default), unix_seconds, unix_milli_seconds, unix_micro_seconds, unix_nano_seconds
type RowChopper
type RowChopper interface {
	// GetNext returns the next record and
	// splits the fields into individual columns
	GetNext(r io.Reader) ([]string, error)
}
RowChopper implements fetching records row by row and chopping each record into individual fields until the reader is exhausted.
type TsvChopper
type TsvChopper struct {
	// SkipRecords allows skipping the first
	// N records (useful when headers are used)
	SkipRecords int
	// contains filtered or unexported fields
}
TsvChopper reads a TSV formatted file and splits each line into its individual fields. The TSV format differs from CSV in that it doesn't support quoting to allow non-standard characters, but instead uses escape sequences (e.g. \t, \r or \n).
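The escape handling described above can be sketched with strings.NewReplacer. This illustrates the format's escaping convention; the package's actual decoder may differ:

```go
package main

import (
	"fmt"
	"strings"
)

// unescapeTSV expands the escape sequences a TSV field may contain.
// (Illustrative helper; not part of package xsv.) The `\\` pair is
// listed so an escaped backslash is not mistaken for a new escape.
var unescapeTSV = strings.NewReplacer(
	`\t`, "\t",
	`\r`, "\r",
	`\n`, "\n",
	`\\`, `\`,
)

func main() {
	field := `first line\nsecond\tcolumn`
	fmt.Printf("%q\n", unescapeTSV.Replace(field)) // "first line\nsecond\tcolumn"
}
```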