siser

package module
v0.0.0-...-1b1e84e Latest
Published: Apr 10, 2022 License: BSD-3-Clause Imports: 7 Imported by: 11

README

This has moved to https://github.com/kjk/common (package siser)

Package siser is a Simple Serialization library for Go

Imagine you want to write many records of somewhat structured data to a file. Think of it as structured logging.

You could use csv format, but csv values are identified by a position, not name. They are also hard to read.

You could serialize as json and write one line per json record but json isn't great for human readability (imagine you tail -f a log file with json records).

This library is meant to be a middle ground:

  • you can serialize arbitrary records with multiple key/value pairs
  • the output is human-readable
  • it's designed to be efficient and simple to use

API usage

Imagine you want to log basic info about HTTP requests.

func createWriter() (*siser.Writer, error) {
	f, err := os.Create("http_access.log")
	if err != nil {
		return nil, err
	}
	w := siser.NewWriter(f)
	return w, nil
}

func logHTTPRequest(w *siser.Writer, url string, ipAddr string, statusCode int) error {
	var rec siser.Record
	// the optional name identifies the type of record
	rec.Name = "httplog"
	// you can append multiple key/value pairs at once
	rec.Write("url", url, "ipaddr", ipAddr)
	// or assemble with multiple calls
	rec.Write("code", strconv.Itoa(statusCode))
	_, err := w.WriteRecord(&rec)
	return err
}

The data is written to the io.Writer underlying siser.Writer as:

61 1553488435903 httplog
url: https://blog.kowalczyk.info
ipaddr: 10.0.0.1
code: 200

Here's what and why:

  • 61 is the size of the data. This allows us to read the exact number of bytes in the record
  • 1553488435903 is a timestamp which is Unix epoch time in milliseconds (more precision than standard Unix time which is in seconds)
  • httplog is the optional name of the record. This allows you to easily write multiple types of records to a file
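To make the layout concrete, here is a minimal sketch of an encoder that produces a header line followed by "key: value" lines. It is not siser's implementation: encodeRecord is a hypothetical helper, and it assumes that the size counts only the bytes of the key/value lines. The real format may treat multi-line or large values differently.

```go
package main

import (
	"fmt"
	"strings"
)

// encodeRecord is a hypothetical helper, not part of siser. It writes
// key/value pairs as "key: value" lines and prefixes them with a header
// line holding the payload size, a millisecond timestamp and a name.
func encodeRecord(name string, timestampMs int64, keyValues ...string) (string, error) {
	if len(keyValues)%2 != 0 {
		return "", fmt.Errorf("need an even number of arguments, got %d", len(keyValues))
	}
	var body strings.Builder
	for i := 0; i < len(keyValues); i += 2 {
		body.WriteString(keyValues[i])
		body.WriteString(": ")
		body.WriteString(keyValues[i+1])
		body.WriteByte('\n')
	}
	// Assumption: the size counts only the bytes of the key/value lines.
	header := fmt.Sprintf("%d %d %s\n", body.Len(), timestampMs, name)
	return header + body.String(), nil
}

func main() {
	s, err := encodeRecord("httplog", 1553488435903, "code", "200")
	if err != nil {
		panic(err)
	}
	fmt.Print(s)
}
```

Because the size comes first, a reader never has to scan the payload for a terminator.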

To read all records from the file:

f, err := os.Open("http_access.log")
fatalIfErr(err)
defer f.Close()
// NewReader takes a *bufio.Reader
reader := siser.NewReader(bufio.NewReader(f))
for reader.ReadNextRecord() {
	rec := reader.Record
	name := rec.Name // "httplog"
	timestamp := rec.Timestamp
	code, ok := rec.Get("code")
	// get the rest of the values and do something with them
}
fatalIfErr(reader.Err())

Usage scenarios

I use siser in my web services for two use cases:

  • logging to help in debugging issues after they happen
  • implementing poor-man's analytics

Logging for debugging adds a bit more structure than ad hoc logging. I can add metadata to log entries and, in addition to reading the logs, I can quickly write programs that filter them. For example, if I add the serving time to the HTTP request log, I can easily write a program that shows requests that took over 1 second to serve.

Another one is poor-man's analytics. For example, if you're building a web service that converts .png files to .ico files, it would be good to know daily statistics such as how many files were converted and how long an average conversion takes.

Performance and implementation notes

Some implementation decisions were made with performance in mind.

Given the key/value nature of a record, an easy choice would be to use map[string]string as the source for the encode/decode functions.

However, []string is more efficient than a map. Additionally, a slice can be reused across multiple records. We can clear it by setting its length to zero and reuse the underlying array. A map would require allocating a new instance for each record, which would create a lot of work for the garbage collector.

When serializing, you need to call the Reset method to get the benefit of efficient re-use of the Record.

When reading and deserializing records, siser.Reader uses this optimization internally.
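The slice trick can be sketched in a few lines. This illustrates the general technique and is not siser's actual code: reusableRecord and its methods are hypothetical names.

```go
package main

import "fmt"

type entry struct{ Key, Value string }

// reusableRecord is a hypothetical type showing slice reuse.
type reusableRecord struct {
	entries []entry
}

// write appends key/value pairs; a trailing key without a value is ignored.
func (r *reusableRecord) write(keyValues ...string) {
	for i := 0; i+1 < len(keyValues); i += 2 {
		r.entries = append(r.entries, entry{keyValues[i], keyValues[i+1]})
	}
}

// reset truncates the slice to length zero but keeps its backing array,
// so later writes can fill the same memory without allocating.
func (r *reusableRecord) reset() {
	r.entries = r.entries[:0]
}

func main() {
	var rec reusableRecord
	for i := 0; i < 3; i++ {
		rec.reset()
		rec.write("url", "/", "code", "200")
		// After the first iteration the capacity stays the same,
		// which shows the backing array is being reused.
		fmt.Println(len(rec.entries), cap(rec.entries))
	}
}
```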

The format avoids the need for escaping keys and values, which helps in making encoding/decoding fast.

How does that play out in real life? I wrote a benchmark comparing siser vs. json.Marshal. In the run below, siser takes about a third less time per operation:

$ go test -bench=.
BenchmarkSiserMarshal-8   	 1000000	      1903 ns/op
BenchmarkJSONMarshal-8    	  500000	      2905 ns/op

The format is binary-safe and works for serializing large values, e.g. you can use a PNG image as a value.

It’s also very easy to implement in any language.

Documentation

Functions

func TimeFromUnixMillisecond

func TimeFromUnixMillisecond(unixMs int64) time.Time

TimeFromUnixMillisecond returns time from Unix epoch time in milliseconds.

func TimeToUnixMillisecond

func TimeToUnixMillisecond(t time.Time) int64

TimeToUnixMillisecond converts t into Unix epoch time in milliseconds. Milliseconds are used because seconds are not precise enough and nanoseconds are more precise than needed.
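For reference, millisecond conversions like these two can be written with the standard library alone. The sketch below is an assumption about how such helpers might look, not a copy of the package source. timeToUnixMs and timeFromUnixMs are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// timeToUnixMs converts t to milliseconds since the Unix epoch.
func timeToUnixMs(t time.Time) int64 {
	return t.UnixNano() / int64(time.Millisecond)
}

// timeFromUnixMs converts milliseconds since the Unix epoch to a time.Time.
func timeFromUnixMs(ms int64) time.Time {
	return time.Unix(0, ms*int64(time.Millisecond))
}

func main() {
	t := timeFromUnixMs(1553488435903)
	fmt.Println(t.UTC())
	fmt.Println(timeToUnixMs(t))
}
```

On Go 1.17 and later, time.UnixMilli and Time.UnixMilli in the standard library cover the same ground.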

Types

type Entry

type Entry struct {
	Key   string
	Value string
}

type Reader

type Reader struct {

	// hints that the data was written without a timestamp
	// (see Writer.NoTimestamp). We're permissive i.e. we'll
	// read timestamp if it's written even if NoTimestamp is true
	NoTimestamp bool

	// Record is available after ReadNextRecord().
	// It's over-written in next ReadNextRecord().
	Record *Record

	// Data / Name / Timestamp are available after ReadNextData.
	// They are over-written in next ReadNextData.
	Data      []byte
	Name      string
	Timestamp time.Time

	// position of the current record within the reader.
	// We keep track of it so that callers can index records
	// by offset and seek to it
	CurrRecordPos int64

	// position of the next record within the reader.
	NextRecordPos int64
	// contains filtered or unexported fields
}

Reader is for reading (deserializing) records from a bufio.Reader

func NewReader

func NewReader(r *bufio.Reader) *Reader

NewReader creates a new reader

func (*Reader) Done

func (r *Reader) Done() bool

Done returns true if we're finished reading from the reader

func (*Reader) Err

func (r *Reader) Err() error

Err returns error from last Read. We swallow io.EOF to make it easier to use

func (*Reader) ReadNextData

func (r *Reader) ReadNextData() bool

ReadNextData reads the next block from the reader and returns false when there are no more records. If it returns false, check Err() to see if there were errors. After reading, Data contains the data, and Timestamp and (optional) Name contain metadata.

func (*Reader) ReadNextRecord

func (r *Reader) ReadNextRecord() bool

ReadNextRecord reads a key/value record. Returns false if there are no more records. Check Err() for errors. After reading, the information is in Record (valid until the next read).

type Record

type Record struct {
	// Entries are available after Unmarshal/UnmarshalRecord
	Entries []Entry

	Name string
	// when writing, if not provided we use current time
	Timestamp time.Time
	// contains filtered or unexported fields
}

Record represents list of key/value pairs that can be serialized/deserialized

func UnmarshalRecord

func UnmarshalRecord(d []byte, r *Record) (*Record, error)

UnmarshalRecord unmarshals a record as marshalled with Record.Marshal. For efficiency, it re-uses record r. If r is nil, it allocates a new record.

func (*Record) Get

func (r *Record) Get(key string) (string, bool)

Get returns a value for a given key

func (*Record) Marshal

func (r *Record) Marshal() []byte

Marshal converts record to bytes

func (*Record) Reset

func (r *Record) Reset()

Reset makes it easy to re-use Record (as opposed to allocating a new one each time)

func (*Record) Unmarshal

func (r *Record) Unmarshal(d []byte) error

Unmarshal resets record and decodes data as created by Marshal into it.

func (*Record) Write

func (r *Record) Write(args ...string)

Write writes key/value pairs to a record. After you write all key/value pairs, call Marshal() to get the serialized value (valid until the next call to Reset()).

type Writer

type Writer struct {

	// NoTimestamp disables writing timestamp, which
	// makes serialized data not depend on when they were written
	NoTimestamp bool
	// contains filtered or unexported fields
}

Writer writes records in a structured format

func NewWriter

func NewWriter(w io.Writer) *Writer

NewWriter creates a writer

func (*Writer) Write

func (w *Writer) Write(d []byte, t time.Time, name string) (int, error)

Write writes a block of data with an optional timestamp and name. Returns the number of bytes written (length of d + length of metadata) and an error.

func (*Writer) WriteRecord

func (w *Writer) WriteRecord(r *Record) (int, error)

WriteRecord writes a record in a specified format

Directories

Path Synopsis
examples module
