record

package
v0.2.1 Latest
Warning

This package is not in the latest version of its module.

Published: Feb 21, 2025 License: MIT Imports: 6 Imported by: 0

Documentation

Overview

Package record accepts a bufio.SplitFunc and generalizes batching to input that is not line-oriented, e.g. XML.
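To illustrate what a bufio.SplitFunc for non-line input can look like, here is a minimal sketch using only the standard library. The names splitOnBlankLine and records are hypothetical and not part of this package; the function treats a blank line as the record separator.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// splitOnBlankLine is a hypothetical bufio.SplitFunc (not part of this
// package) that treats a blank line ("\n\n") as the record separator.
func splitOnBlankLine(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.Index(data, []byte("\n\n")); i >= 0 {
		// Separator found: emit everything before it and skip the separator.
		return i + 2, data[:i], nil
	}
	if atEOF {
		// Final record without a trailing separator; drop trailing newlines.
		return len(data), bytes.TrimRight(data, "\n"), nil
	}
	// No complete record yet: ask the scanner for more data.
	return 0, nil, nil
}

// records collects all records found in s.
func records(s string) []string {
	sc := bufio.NewScanner(strings.NewReader(s))
	sc.Split(splitOnBlankLine)
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out
}

func main() {
	for i, r := range records("a\nb\n\nc\n\nd\ne\n") {
		fmt.Printf("%d: %q\n", i, r)
	}
}
```

Any function with this signature can be handed to a bufio.Scanner, which is what makes the SplitFunc type a convenient extension point for record formats other than lines.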


Constants

This section is empty.

Variables

var (
	ErrTagRequired              = errors.New("tag required")
	ErrGarbledInput             = errors.New("likely gabled input")
	ErrNestedTagsNotImplemented = errors.New("nested tags with the same name not implemented yet")
	ErrOpenTagNotFound          = errors.New("open tag not found")
)
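These variables are sentinel errors created with errors.New, so callers would typically compare against them with errors.Is rather than matching on message text. The sketch below assumes nothing about where the package returns them: errTagRequired is a local stand-in for ErrTagRequired, and checkTag and isTagRequired are hypothetical helpers.

```go
package main

import (
	"errors"
	"fmt"
)

// errTagRequired is a local stand-in for the package's ErrTagRequired, so
// that this sketch compiles without importing the package.
var errTagRequired = errors.New("tag required")

// checkTag is a hypothetical helper that wraps the sentinel with context.
func checkTag(tag string) error {
	if tag == "" {
		return fmt.Errorf("configuring splitter: %w", errTagRequired)
	}
	return nil
}

// isTagRequired reports whether err is, or wraps, errTagRequired.
func isTagRequired(err error) bool {
	return errors.Is(err, errTagRequired)
}

func main() {
	if err := checkTag(""); isTagRequired(err) {
		fmt.Println("missing tag:", err)
	}
}
```

errors.Is also matches when the sentinel has been wrapped with %w, which a plain == comparison would miss.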

Functions

This section is empty.

Types

type Processor

type Processor struct {
	BatchSize  int
	SplitFunc  bufio.SplitFunc
	NumWorkers int
	Verbose    bool
	R          io.Reader
	W          io.Writer
	F          func([]byte) ([]byte, error)
}

Processor processes records in parallel. Records are identified by SplitFunc, so they need not be lines.

func NewProcessor

func NewProcessor(r io.Reader, w io.Writer, f func([]byte) ([]byte, error)) *Processor

NewProcessor creates a new Processor that reads from r, writes to w, and applies f.

func (*Processor) Run

func (p *Processor) Run() error

Run starts the workers, crunching through the input.
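The general worker pattern behind a processor of this shape can be sketched with the standard library alone. This is not the package's implementation: miniProcessor and upperAll are hypothetical, the sketch handles one token at a time instead of batches, and it does not preserve output order.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
	"sync"
)

// miniProcessor is a hypothetical, simplified stand-in for a record
// processor. It is not the package's implementation.
type miniProcessor struct {
	SplitFunc  bufio.SplitFunc
	NumWorkers int
	R          io.Reader
	W          io.Writer
	F          func([]byte) ([]byte, error)
}

// Run scans tokens from R, applies F in NumWorkers goroutines and writes
// the results to W. Output order is not preserved in this sketch.
func (p *miniProcessor) Run() error {
	var (
		tokens   = make(chan []byte)
		wg       sync.WaitGroup
		mu       sync.Mutex // guards W and firstErr
		firstErr error
	)
	for i := 0; i < p.NumWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for tok := range tokens {
				out, err := p.F(tok)
				mu.Lock()
				if err == nil {
					_, err = p.W.Write(out)
				}
				if err != nil && firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}()
	}
	sc := bufio.NewScanner(p.R)
	sc.Split(p.SplitFunc)
	for sc.Scan() {
		// Copy the token: the scanner reuses its buffer on the next Scan.
		tokens <- append([]byte(nil), sc.Bytes()...)
	}
	close(tokens)
	wg.Wait()
	if err := sc.Err(); err != nil {
		return err
	}
	return firstErr
}

// upperAll runs the sketch over s, upper-casing every line.
func upperAll(s string) (string, error) {
	var buf bytes.Buffer
	p := &miniProcessor{
		SplitFunc:  bufio.ScanLines,
		NumWorkers: 4,
		R:          strings.NewReader(s),
		W:          &buf,
		F: func(b []byte) ([]byte, error) {
			return append(bytes.ToUpper(b), '\n'), nil
		},
	}
	err := p.Run()
	return buf.String(), err
}

func main() {
	out, err := upperAll("a\nb\nc\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Writes are serialized with a mutex so that each result reaches the writer intact; a design that needs ordered output would have to tag tokens with a sequence number and reorder before writing.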

func (*Processor) Split

func (p *Processor) Split(f bufio.SplitFunc)

Split sets the SplitFunc used to identify records.

type TagSplitter

type TagSplitter struct {
	// Tag to split on. Nested tags with the same name are not supported
	// currently (they will cause an error).
	Tag string
	// MaxBytesApprox is the approximate number of bytes in a batch. A batch
	// will always contain at least one element, which may exceed this number.
	// By default, we use 16MB per batch.
	MaxBytesApprox uint
	// contains filtered or unexported fields
}

TagSplitter splits input on XML elements. It batches content up to approximately MaxBytesApprox bytes. Each batch is guaranteed to contain at least one complete element.

func (*TagSplitter) Split

func (s *TagSplitter) Split(data []byte, atEOF bool) (advance int, token []byte, err error)

Split accumulates one or more XML elements and returns them as a single batch token. This is useful for downstream XML parsing, where the consumer expects valid XML, that is, elements that include both the start and the end tag.
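The idea of splitting on complete elements can be illustrated with a much simpler splitter than the one documented here. The sketch below is an assumption-laden stand-in, not TagSplitter itself: elementSplitter and elements are hypothetical, one element is emitted per token (no batching), start tags must carry no attributes, and nesting is not handled.

```go
package main

import (
	"bufio"
	"bytes"
	"errors"
	"fmt"
	"strings"
)

// elementSplitter is a hypothetical, simplified splitter: it emits one
// <tag>...</tag> element per token and does no batching. It only matches
// start tags without attributes and does not handle nesting.
type elementSplitter struct {
	Tag string
}

// Split implements the bufio.SplitFunc signature.
func (s *elementSplitter) Split(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if s.Tag == "" {
		return 0, nil, errors.New("tag required")
	}
	open, end := []byte("<"+s.Tag+">"), []byte("</"+s.Tag+">")
	i := bytes.Index(data, open)
	if i < 0 {
		if atEOF {
			// No further element: consume the rest and stop.
			return len(data), nil, nil
		}
		return 0, nil, nil // request more data
	}
	j := bytes.Index(data[i:], end)
	if j < 0 {
		if atEOF {
			return 0, nil, errors.New("unterminated element")
		}
		return 0, nil, nil // end tag not buffered yet
	}
	stop := i + j + len(end)
	// The token includes both the start and the end tag.
	return stop, data[i:stop], nil
}

// elements returns every <tag>...</tag> element found in doc.
func elements(tag, doc string) ([]string, error) {
	sc := bufio.NewScanner(strings.NewReader(doc))
	sc.Split((&elementSplitter{Tag: tag}).Split)
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out, sc.Err()
}

func main() {
	els, err := elements("rec", "<root><rec>1</rec>\n<rec>2</rec></root>")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range els {
		fmt.Println(e)
	}
}
```

Because each token keeps its start and end tag, it can be passed directly to an XML decoder without further repair.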
