protoscan

package module
v0.2.1
Warning

This package is not in the latest version of its module.

Published: Feb 7, 2022 License: MIT Imports: 4 Imported by: 9

README

Package protoscan is a low-level reader for protocol buffer encoded data in Golang. Its main feature is support for lazy/conditional decoding of fields.

This library can improve decoding performance in two ways:

  1. fields can be conditionally decoded, skipping over fields that are not needed for a specific use case,

  2. when the data must end up in specific types or go through other transformations, decoding manually and directly into those types avoids building the intermediate state of a fully decoded message.

Please be aware that to decode an entire message it is still faster to use gogoprotobuf. After much testing, I believe this is because the generated code inlines almost everything, eliminating the function call overhead.

Warning: Writing code with this library is like writing the auto-generated protobuf decoder and is very time-consuming. It should only be used for specific use cases and for stable protobuf definitions.

Usage

First, the encoded protobuf data is used to initialize a new Message. Then you iterate over the fields, reading or skipping them.

msg := protoscan.New(encodedData)
for msg.Next() {
    switch msg.FieldNumber() {
    case 1: // an int64 type
        v, err := msg.Int64()
        if err != nil {
            // handle
        }
        _ = v // use the value

    case 3: // repeated number types can be returned as a slice
        ids, err := msg.RepeatedInt64(nil)
        if err != nil {
            // handle
        }
        _ = ids // use the values

    case 2: // for more control repeated+packed fields can be read using an iterator
        iter, err := msg.Iterator(nil)
        if err != nil {
            // handle
        }

        // UserID is an application-defined type, e.g. type UserID int64
        userIDs := make([]UserID, 0, iter.Count(protoscan.WireTypeVarint))
        for iter.HasNext() {
            v, err := iter.Int64()
            if err != nil {
                // handle
            }

            userIDs = append(userIDs, UserID(v))
        }
    default:
        msg.Skip() // required if value not needed.
    }
}

if msg.Err() != nil {
    // handle
}

After calling Next() you MUST call an accessor function (Int64(), RepeatedInt64(), Iterator(), etc.) or Skip() to ignore the field. All these functions, including Next() and Skip(), must not be called twice in a row.
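
To see why every field must be consumed, here is a minimal stdlib-only sketch of what any protobuf scanner has to do at the wire level. It is not protoscan's implementation, and `readTag` and `skipValue` are hypothetical helpers. Each field starts with a varint tag, and the cursor only moves past the value once it has been read or skipped according to its wire type. Groups are left out.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errTruncated = errors.New("truncated data")

// readTag decodes the varint tag at data[i:] and returns the field number,
// the wire type, and the index just past the tag.
func readTag(data []byte, i int) (field, wireType, next int, err error) {
	tag, n := binary.Uvarint(data[i:])
	if n <= 0 {
		return 0, 0, 0, errTruncated
	}
	return int(tag >> 3), int(tag & 7), i + n, nil
}

// skipValue returns the index just past the value of the given wire type
// that starts at data[i:]. Groups (wire types 3 and 4) are not handled.
func skipValue(data []byte, i, wireType int) (int, error) {
	switch wireType {
	case 0: // varint: ends at the first byte with its high bit clear
		_, n := binary.Uvarint(data[i:])
		if n <= 0 {
			return 0, errTruncated
		}
		return i + n, nil
	case 1: // 64-bit: always 8 bytes
		i += 8
	case 2: // length-delimited: a varint length, then that many bytes
		l, n := binary.Uvarint(data[i:])
		if n <= 0 || l > uint64(len(data)-i-n) {
			return 0, errTruncated
		}
		return i + n + int(l), nil
	case 5: // 32-bit: always 4 bytes
		i += 4
	default:
		return 0, fmt.Errorf("unsupported wire type %d", wireType)
	}
	if i > len(data) {
		return 0, errTruncated
	}
	return i, nil
}

func main() {
	// field 1 = varint 150, field 2 = "hi", field 3 = fixed32 7
	data := []byte{0x08, 0x96, 0x01, 0x12, 0x02, 'h', 'i', 0x1d, 0x07, 0x00, 0x00, 0x00}
	for i := 0; i < len(data); {
		field, wt, next, err := readTag(data, i)
		if err != nil {
			panic(err)
		}
		// every value must be consumed, or i would never advance past it
		if i, err = skipValue(data, next, wt); err != nil {
			panic(err)
		}
		fmt.Println("skipped field", field, "wire type", wt)
	}
}
```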

Value Accessor Functions

There is an accessor for each of the protobuf scalar value types.

For repeated fields there is a corresponding set of functions like RepeatedInt64(buf []int64) ([]int64, error). Repeated fields may or may not be packed, so the function may be called more than once for the same field; pass in a buffer declared outside the loop so that the values from every call accumulate in it. For example:

var ids []int64

msg := protoscan.New(encodedData)
for msg.Next() {
    switch msg.FieldNumber() {
    case 1: // repeated int64 field
        var err error
        ids, err = msg.RepeatedInt64(ids)
        if err != nil {
            // handle
        }
    default:
        msg.Skip()
    }
}

if msg.Err() != nil {
    // handle
}

If the ids are packed, RepeatedInt64() will be called once. If they are unpacked (simply repeated), RepeatedInt64() will be called N times, once per value, but the resulting slice of ids will be the same.
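
To make the packed/unpacked distinction concrete, here is a minimal stdlib-only sketch that decodes both encodings of a repeated int64 field into the same slice. It is not protoscan's implementation; `appendRepeatedInt64` is a hypothetical helper that, like RepeatedInt64(), appends to the buffer it is given. To stay short it assumes every other field is a varint.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendRepeatedInt64 appends every value of the given field number to buf,
// accepting both encodings of a repeated int64 field:
//   - unpacked: one tag per value, wire type 0 (varint)
//   - packed: one tag, wire type 2, then a length and a run of varints
//
// It assumes every other field in data is a varint and panics on malformed input.
func appendRepeatedInt64(buf []int64, data []byte, field uint64) []int64 {
	i := 0
	// next decodes the varint at data[i:] and advances i past it.
	next := func() uint64 {
		v, n := binary.Uvarint(data[i:])
		if n <= 0 {
			panic("malformed varint")
		}
		i += n
		return v
	}
	for i < len(data) {
		tag := next()
		switch {
		case tag>>3 == field && tag&7 == 0: // unpacked: a single value
			buf = append(buf, int64(next()))
		case tag>>3 == field && tag&7 == 2: // packed: length, then values
			length := int(next())
			end := i + length
			for i < end {
				buf = append(buf, int64(next()))
			}
		default: // some other (varint) field: skip its value
			next()
		}
	}
	return buf
}

func main() {
	// field 1 holding [1, 2, 300], unpacked: three tag/value pairs
	unpacked := []byte{0x08, 0x01, 0x08, 0x02, 0x08, 0xac, 0x02}
	// the same values packed: tag 0x0a (1<<3 | 2), length 4, then the varints
	packed := []byte{0x0a, 0x04, 0x01, 0x02, 0xac, 0x02}

	fmt.Println(appendRepeatedInt64(nil, unpacked, 1)) // [1 2 300]
	fmt.Println(appendRepeatedInt64(nil, packed, 1))   // [1 2 300]
}
```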

For more control over the values in a packed, repeated field use an Iterator. See above for an example.

Decoding Embedded Messages

Embedded messages can be handled recursively, or the raw data can be returned and decoded using a standard/auto-generated proto.Unmarshal function.

msg := protoscan.New(encodedData)
for msg.Next() {
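    fn := msg.FieldNumber()

    switch {
    case fn == 1 && needFieldNumber1:
        // use protoscan recursively; pass a *Message instead of nil to reuse one
        embeddedMsg, err := msg.Message(nil)
        if err != nil {
            // handle
        }
        for embeddedMsg.Next() {
            switch embeddedMsg.FieldNumber() {
            case 1:
                // read the value, e.g. embeddedMsg.Int64()
            default:
                embeddedMsg.Skip()
            }
        }
        if embeddedMsg.Err() != nil {
            // handle
        }

    case fn == 2 && needFieldNumber2:
        // if you need the whole message, decode it in the standard way
        data, err := msg.MessageData()
        if err != nil {
            // handle
        }

        v := &ProtoBufThing{}
        if err := proto.Unmarshal(data, v); err != nil {
            // handle
        }

    default:
        // fields that are not needed must still be skipped
        msg.Skip()
    }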
}
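
The raw data of an embedded message is simply the payload of a length-delimited field. A hand-rolled, stdlib-only sketch of that extraction follows; `fieldPayload` is a hypothetical helper, not part of protoscan, and to stay short it assumes every other field is either a varint or length-delimited. Note that the result is a sub-slice of the input rather than a copy.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// fieldPayload returns the payload of the first length-delimited field
// (wire type 2) with the given number. The result is a sub-slice of data,
// not a copy. It assumes every other field is length-delimited or a varint.
func fieldPayload(data []byte, field uint64) ([]byte, error) {
	for i := 0; i < len(data); {
		tag, n := binary.Uvarint(data[i:])
		if n <= 0 {
			return nil, errors.New("bad tag")
		}
		i += n
		// For wire type 0 this is the value; for wire type 2 it is the length.
		v, n := binary.Uvarint(data[i:])
		if n <= 0 {
			return nil, errors.New("bad varint")
		}
		i += n
		if tag&7 == 2 {
			if v > uint64(len(data)-i) {
				return nil, errors.New("invalid length")
			}
			if tag>>3 == field {
				return data[i : i+int(v)], nil
			}
			i += int(v)
		}
	}
	return nil, errors.New("field not found")
}

func main() {
	// Customer{id: 7, orders: [Order{id: 1, open: true}]} encoded by hand:
	// field 1 varint 7, then field 3 (tag 0x1a) holding 4 bytes of Order data.
	customer := []byte{0x08, 0x07, 0x1a, 0x04, 0x08, 0x01, 0x10, 0x01}

	order, err := fieldPayload(customer, 3)
	if err != nil {
		panic(err)
	}
	// order can now be scanned as a message of its own.
	fmt.Printf("% x\n", order) // 08 01 10 01
}
```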
Handling errors

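Errors can occur for two reasons:

  1. The field is being read as the incorrect type.
  2. The data is corrupted or somehow invalid.

Accessor functions return these errors directly. Message.Err() reports any error encountered while scanning, so check it once the Next() loop finishes.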

Larger Example

Suppose you start with a customer message containing embedded orders and items, and you only want to count the open orders, the items in them, and the favorite ids.

message Customer {
  required int64 id = 1;
  optional string username = 2;

  repeated Order orders = 3;
  repeated int64 favorite_ids = 4 [packed=true];
}

message Order {
  required int64 id = 1;
  required bool open = 2;
  repeated Item items = 3;
}

message Item {
  // a big object
}

Sample Code:

openCount := 0
itemCount := 0
favoritesCount := 0

customer := protoscan.New(data)
for customer.Next() {
    switch customer.FieldNumber() {
    case 1: // id
        id, err := customer.Int64()
        if err != nil {
            panic(err)
        }
        _ = id // do something or skip this case if not needed

    case 2: // username
        username, err := customer.String()
        if err != nil {
            panic(err)
        }
        _ = username // do something or skip this case if not needed

    case 3: // orders
        open := false
        count := 0

        orderData, _ := customer.MessageData()
        order := protoscan.New(orderData)
        for order.Next() {
            switch order.FieldNumber() {
            case 2: // open
                v, _ := order.Bool()
                open = v
            case 3: // item
                count++

                // we're not reading the data but we still need to skip it.
                order.Skip()
            default:
                // required to move past unneeded fields
                order.Skip()
            }
        }

        if open {
            openCount++
            itemCount += count
        }
    case 4: // favorite ids
        iter, err := customer.Iterator(nil)
        if err != nil {
            panic(err)
        }

        // Typically this section would only be run once but it is valid
        // protobuf to contain multiple sections of repeated fields that should
        // be concatenated together.
        favoritesCount += iter.Count(protoscan.WireTypeVarint)
    default:
        // unread fields must be skipped
        customer.Skip()
    }
}

if customer.Err() != nil {
    panic(customer.Err())
}

fmt.Printf("Open Orders: %d\n", openCount)
fmt.Printf("Items:       %d\n", itemCount)
fmt.Printf("Favorites:   %d\n", favoritesCount)

// Output:
// Open Orders: 2
// Items:       4
// Favorites:   8

Wire Type Start Group and End Group

Groups are an old protobuf wire type that has been deprecated for a long time. They act like parentheses but carry no "data length" information, so their content cannot be skipped in a single step. The start-group and end-group markers can only be read and skipped like any other field, which means the fields inside a group are scanned as if they belonged to the enclosing message and the grouping is lost. To get the raw protobuf data inside a group, try something like:

var (
	groupFieldNum = 123
	groupData []byte
)

msg := New(data)
for msg.Next() {
	if msg.FieldNumber() == groupFieldNum && msg.WireType() == WireTypeStartGroup {
		start, end := msg.Index, msg.Index
		for msg.Next() {
			msg.Skip()
			if msg.FieldNumber() == groupFieldNum && msg.WireType() == WireTypeEndGroup {
				break
			}
			end = msg.Index
		}
		// groupData would be the raw protobuf encoded bytes of the fields in the group.
		groupData = msg.Data[start:end]
	}
}

Similar libraries in other languages

  • protozero - C++, the inspiration for this library
  • pbf - JavaScript

Documentation

Overview

Example (Count)

ExampleCount demonstrates some basics of using the library by counting elements in a larger customer message without fully decoding it.

package main

import (
	"fmt"

	"github.com/paulmach/protoscan"
	"github.com/paulmach/protoscan/internal/testmsg"
	"google.golang.org/protobuf/proto"
)

func main() {
	c := &testmsg.Customer{
		Id:       proto.Int64(123),
		Username: proto.String("name"),
		Orders: []*testmsg.Order{
			{
				Id:   proto.Int64(1),
				Open: proto.Bool(true),
				Items: []*testmsg.Item{
					{Id: proto.Int64(1)},
					{Id: proto.Int64(2)},
					{Id: proto.Int64(3)},
				},
			},
			{
				Id:   proto.Int64(2),
				Open: proto.Bool(false),
				Items: []*testmsg.Item{
					{Id: proto.Int64(1)},
					{Id: proto.Int64(2)},
				},
			},
			{
				Id:   proto.Int64(3),
				Open: proto.Bool(true),
				Items: []*testmsg.Item{
					{Id: proto.Int64(1)},
				},
			},
		},
		FavoriteIds: []int64{1, 2, 3, 4, 5, 6, 7, 8},
	}
	data, _ := proto.Marshal(c)

	// start the decoding
	openCount := 0
	itemCount := 0
	favoritesCount := 0

	customer := protoscan.New(data)
	for customer.Next() {
		switch customer.FieldNumber() {
		case 1: // id
			id, err := customer.Int64()
			if err != nil {
				panic(err)
			}
			_ = id // do something or skip this case if not needed

		case 2: // username
			username, err := customer.String()
			if err != nil {
				panic(err)
			}
			_ = username // do something or skip this case if not needed

		case 3: // orders
			open := false
			count := 0

			orderData, _ := customer.MessageData()
			order := protoscan.New(orderData)
			for order.Next() {
				switch order.FieldNumber() {
				case 2: // open
					v, _ := order.Bool()
					open = v
				case 3: // item
					count++

					// we're not reading the data but we still need to skip it.
					order.Skip()
				default:
					// required to move past unneeded fields
					order.Skip()
				}
			}

			if open {
				openCount++
				itemCount += count
			}
		case 4: // favorite ids
			iter, err := customer.Iterator(nil)
			if err != nil {
				panic(err)
			}

			// Typically this section would only be run once but it is valid
			// protobuf to contain multiple sections of repeated fields that should
			// be concatenated together.
			favoritesCount += iter.Count(protoscan.WireTypeVarint)
		default:
			// unread fields must be skipped
			customer.Skip()
		}
	}

	if customer.Err() != nil {
		panic(customer.Err())
	}

	fmt.Printf("Open Orders: %d\n", openCount)
	fmt.Printf("Items:       %d\n", itemCount)
	fmt.Printf("Favorites:   %d\n", favoritesCount)

}
Output:

Open Orders: 2
Items:       4
Favorites:   8
Example (EmptyGroup)
data := []byte{}

data = protowire.AppendTag(data, 100, WireType64bit)
data = protowire.AppendFixed64(data, 100_100_100)
data = protowire.AppendTag(data, 200, WireTypeStartGroup)
data = protowire.AppendTag(data, 200, WireTypeEndGroup)
data = protowire.AppendTag(data, 400, WireTypeVarint)
data = protowire.AppendVarint(data, 100_100)

var (
	groupFieldNum = 200
	groupData     []byte
)

msg := New(data)
for msg.Next() {
	if msg.FieldNumber() == groupFieldNum && msg.WireType() == WireTypeStartGroup {
		start := msg.Index
		end := msg.Index
		for msg.Next() {
			msg.Skip()
			if msg.FieldNumber() == groupFieldNum && msg.WireType() == WireTypeEndGroup {
				break
			}
			end = msg.Index
		}
		// groupData would be the raw protobuf encoded bytes of the fields in the group.
		groupData = msg.Data[start:end]
	}
}

fmt.Printf("data length: %d\n", len(data))
fmt.Printf("group data length: %v\n", len(groupData))
Output:

data length: 19
group data length: 0
Example (Groups)
data := []byte{}

data = protowire.AppendTag(data, 200, WireTypeStartGroup)
data = protowire.AppendTag(data, 300, WireType64bit)
data = protowire.AppendFixed64(data, 100_100_100)
data = protowire.AppendTag(data, 400, WireTypeVarint)
data = protowire.AppendVarint(data, 100_100)
data = protowire.AppendTag(data, 200, WireTypeEndGroup)

var (
	groupFieldNum = 200
	groupData     []byte
)

msg := New(data)
for msg.Next() {
	if msg.FieldNumber() == groupFieldNum && msg.WireType() == WireTypeStartGroup {
		start := msg.Index
		end := msg.Index
		for msg.Next() {
			msg.Skip()
			if msg.FieldNumber() == groupFieldNum && msg.WireType() == WireTypeEndGroup {
				break
			}
			end = msg.Index
		}
		// groupData would be the raw protobuf encoded bytes of the fields in the group.
		groupData = msg.Data[start:end]
	}
}

fmt.Printf("data length: %d\n", len(data))
fmt.Printf("group data length: %v\n", len(groupData))
Output:

data length: 19
group data length: 15

Constants

const (
	WireTypeVarint          = 0
	WireType64bit           = 1
	WireTypeLengthDelimited = 2
	WireTypeStartGroup      = 3 // deprecated by protobuf, not supported
	WireTypeEndGroup        = 4 // deprecated by protobuf, not supported
	WireType32bit           = 5
)

The WireType describes the encoding method for the next value in the stream.
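
On the wire, each field starts with a varint tag that packs the field number and the wire type together as field_number << 3 | wire_type. A small stdlib-only sketch of that arithmetic; `makeTag` and `splitTag` are hypothetical helpers, and the lowercase constants simply mirror the values listed above.

```go
package main

import "fmt"

// Wire type values, mirroring the constants above.
const (
	wireTypeVarint          = 0
	wireType64bit           = 1
	wireTypeLengthDelimited = 2
	wireType32bit           = 5
)

// makeTag combines a field number and wire type into the key that
// precedes every field: field_number << 3 | wire_type.
func makeTag(fieldNumber, wireType int) uint64 {
	return uint64(fieldNumber)<<3 | uint64(wireType)
}

// splitTag reverses makeTag.
func splitTag(tag uint64) (fieldNumber, wireType int) {
	return int(tag >> 3), int(tag & 0x7)
}

func main() {
	tag := makeTag(3, wireTypeLengthDelimited)
	fmt.Printf("tag = %d (0x%x)\n", tag, tag) // tag = 26 (0x1a)
	fn, wt := splitTag(tag)
	fmt.Println(fn, wt) // 3 2
}
```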

Variables

var ErrIntOverflow = errors.New("protoscan: integer overflow")

ErrIntOverflow is returned when scanning an integer with varint encoding and the value is too long for the integer type.
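
A sketch of base-128 varint decoding, showing where an overflow can occur. This is not protoscan's code: `decodeVarint` and `errIntOverflow` are hypothetical stand-ins, and the overflow case here is only analogous to the situation ErrIntOverflow describes.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errIntOverflow = errors.New("integer overflow")
	errTruncated   = errors.New("unexpected end of data")
)

// decodeVarint reads a base-128 varint from the start of data and returns the
// value and the number of bytes used. Each byte carries 7 bits, least
// significant group first; a set high bit means another byte follows.
func decodeVarint(data []byte) (uint64, int, error) {
	var v uint64
	for i, b := range data {
		// The 10th byte may only contribute the single remaining bit (bit 63).
		if i == 9 && b > 1 {
			return 0, 0, errIntOverflow
		}
		v |= uint64(b&0x7f) << (7 * i)
		if b < 0x80 {
			return v, i + 1, nil
		}
	}
	return 0, 0, errTruncated
}

func main() {
	v, n, err := decodeVarint([]byte{0x96, 0x01})
	fmt.Println(v, n, err) // 150 2 <nil>

	// Ten bytes with the high bit set describe more than 64 bits.
	tooLong := []byte{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x01}
	_, _, err = decodeVarint(tooLong)
	fmt.Println(err) // integer overflow
}
```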

var ErrInvalidLength = errors.New("protoscan: invalid length")

ErrInvalidLength is returned when a length is not valid, usually resulting from scanning the incorrect type.
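
As an illustration of how scanning the wrong type can produce an invalid length, this stdlib-only sketch reads the bytes of a varint value as if they were a length prefix. `lengthDelimited` and `errInvalidLength` are hypothetical stand-ins, not protoscan's code.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errInvalidLength = errors.New("invalid length")

// lengthDelimited treats data as a varint length prefix followed by a
// payload and returns the payload, rejecting prefixes that point past the
// end of data.
func lengthDelimited(data []byte) ([]byte, error) {
	l, n := binary.Uvarint(data)
	if n <= 0 || l > uint64(len(data)-n) {
		return nil, errInvalidLength
	}
	return data[n : n+int(l)], nil
}

func main() {
	// A string field value: length 2, then "hi".
	s, err := lengthDelimited([]byte{0x02, 'h', 'i'})
	fmt.Printf("%q %v\n", s, err) // "hi" <nil>

	// A varint value of 150 (0x96 0x01) misread as a length prefix:
	// 150 bytes are claimed but none remain.
	_, err = lengthDelimited([]byte{0x96, 0x01})
	fmt.Println(err) // invalid length
}
```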

Functions

This section is empty.

Types

type Iterator

type Iterator struct {
	// contains filtered or unexported fields
}

An Iterator allows for moving across a packed repeated field in a 'controlled' fashion.

func (*Iterator) Bool

func (b *Iterator) Bool() (bool, error)

Bool reads a varint-encoded boolean, normally the single byte 0x01 (true) or 0x00 (false).

func (*Iterator) Count

func (i *Iterator) Count(wireType int) int

Count returns the total number of values in this repeated field. The answer depends on the type/encoding of the field: double, fixed64 and sfixed64 are WireType64bit; float, fixed32 and sfixed32 are WireType32bit; all other int, uint, sint, bool and enum types are WireTypeVarint. The function will panic for any other value.
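
One way to count values without decoding them: every varint ends with exactly one byte whose high bit is clear, and fixed-width values can be counted by dividing the payload length. A stdlib-only sketch under that approach; `countPacked` is hypothetical and not necessarily how protoscan implements Count, and the wire type values follow the constants above.

```go
package main

import "fmt"

// countPacked returns how many values a packed payload holds without
// decoding them. wireType is 0 (varint), 1 (64-bit) or 5 (32-bit);
// any other value panics, matching the behavior documented above.
func countPacked(payload []byte, wireType int) int {
	switch wireType {
	case 0:
		// Each varint ends with exactly one byte whose high bit is clear.
		n := 0
		for _, b := range payload {
			if b < 0x80 {
				n++
			}
		}
		return n
	case 1:
		return len(payload) / 8
	case 5:
		return len(payload) / 4
	default:
		panic(fmt.Sprintf("unsupported wire type %d", wireType))
	}
}

func main() {
	// packed varints 1, 2, 300 (300 is 0xac 0x02)
	fmt.Println(countPacked([]byte{0x01, 0x02, 0xac, 0x02}, 0)) // 3
	fmt.Println(countPacked(make([]byte, 16), 1))              // 2
}
```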

func (*Iterator) Double

func (b *Iterator) Double() (float64, error)

Double values are encoded as a fixed length of 8 bytes in their IEEE-754 format.

func (*Iterator) FieldNumber

func (i *Iterator) FieldNumber() int

FieldNumber returns the number for the current repeated field. These numbers are defined in the protobuf definition file used to encode the message.

func (*Iterator) Fixed32

func (b *Iterator) Fixed32() (uint32, error)

Fixed32 reads a fixed 4 byte value as a uint32. This proto type is more efficient than uint32 if values are often greater than 2^28.

func (*Iterator) Fixed64

func (b *Iterator) Fixed64() (uint64, error)

Fixed64 reads a fixed 8-byte value as a uint64. This proto type is more efficient than uint64 if values are often greater than 2^56.

func (*Iterator) Float

func (b *Iterator) Float() (float32, error)

Float values are encoded as a fixed length of 4 bytes in their IEEE-754 format.

func (*Iterator) HasNext

func (i *Iterator) HasNext() bool

HasNext is used in a 'for' loop to read through all the elements. It returns false when all the items have been read. This method does NOT need to be called before each read; reading a value automatically moves the index forward. This behavior is different from Message.Next().

func (*Iterator) Int32

func (b *Iterator) Int32() (int32, error)

Int32 reads a varint-encoded 32-bit integer. This field type is best used if the field only has positive numbers; otherwise use sint32. Note that this field can also be read as an Int64.

func (*Iterator) Int64

func (b *Iterator) Int64() (int64, error)

Int64 reads a varint-encoded 64-bit integer. This field type is best used if the field only has positive numbers; otherwise use sint64.

func (*Iterator) Sfixed32

func (b *Iterator) Sfixed32() (int32, error)

Sfixed32 reads a fixed 4-byte signed value.

func (*Iterator) Sfixed64

func (b *Iterator) Sfixed64() (int64, error)

Sfixed64 reads a fixed 8 byte signed value.

func (*Iterator) Sint32

func (b *Iterator) Sint32() (int32, error)

Sint32 uses variable-length encoding with zig-zag encoding for signed values. This field type more efficiently encodes negative numbers than regular int32s.

func (*Iterator) Sint64

func (b *Iterator) Sint64() (int64, error)

Sint64 uses variable-length encoding with zig-zag encoding for signed values. This field type more efficiently encodes negative numbers than regular int64s.
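
Zig-zag encoding maps signed values to unsigned ones so that numbers with a small magnitude stay small: as a plain int64 varint, -1 takes 10 bytes, while its zig-zag form is 1 and fits in a single byte. A stdlib-only sketch of the mapping; the function names are hypothetical, not protoscan's API.

```go
package main

import "fmt"

// zigzagEncode64 maps signed integers to unsigned ones so that values with a
// small magnitude stay small: 0->0, -1->1, 1->2, -2->3, ...
func zigzagEncode64(v int64) uint64 {
	return uint64(v<<1) ^ uint64(v>>63)
}

// zigzagDecode64 reverses zigzagEncode64.
func zigzagDecode64(u uint64) int64 {
	return int64(u>>1) ^ -int64(u&1)
}

func main() {
	var encoded []uint64
	for _, v := range []int64{0, -1, 1, -2, 2} {
		encoded = append(encoded, zigzagEncode64(v))
	}
	fmt.Println(encoded)           // [0 1 2 3 4]
	fmt.Println(zigzagDecode64(3)) // -2
}
```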

func (*Iterator) Skip added in v0.2.0

func (i *Iterator) Skip(wireType int, count int)

Skip moves the iterator forward 'count' values without reading them. The correct wireType must be provided. On a new iterator, Skip(wireType, count) positions it so the next value read is the one at zero-based index 'count'. double, fixed64 and sfixed64 are WireType64bit; float, fixed32 and sfixed32 are WireType32bit; all other int, uint, sint, bool and enum types are WireTypeVarint. The function will panic for any other value.
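
Skipping packed varints does not require decoding them: each value ends at the first byte whose high bit is clear, so it is enough to count those bytes. For the 32-bit and 64-bit wire types the skip is simply count*4 or count*8 bytes. A stdlib-only sketch of the varint case; `skipVarints` is hypothetical and not protoscan's implementation.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// skipVarints advances past count varints in a packed payload starting at
// offset i and returns the new offset. A varint ends at the first byte whose
// high bit is clear, so nothing needs to be decoded. It panics (index out of
// range) if the payload holds fewer than count values.
func skipVarints(payload []byte, i, count int) int {
	for ; count > 0; i++ {
		if payload[i] < 0x80 {
			count--
		}
	}
	return i
}

func main() {
	payload := []byte{0x01, 0xac, 0x02, 0x07} // packed 1, 300, 7
	i := skipVarints(payload, 0, 2)          // skip two values
	v, _ := binary.Uvarint(payload[i:])
	fmt.Println(v) // 7, the value at zero-based index 2
}
```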

func (*Iterator) Uint32

func (b *Iterator) Uint32() (uint32, error)

Uint32 reads a varint-encoded 32-bit unsigned integer.

func (*Iterator) Uint64

func (b *Iterator) Uint64() (uint64, error)

Uint64 reads a varint-encoded 64-bit unsigned integer.

func (*Iterator) Varint32

func (b *Iterator) Varint32() (uint32, error)

Varint32 reads up to 32 bits of variable-length encoded data. Note that negative int32 values are sign-extended to 64 bits before encoding, so they are written as 10-byte, 64-bit varints whose upper bits are all 1s and may not fit in 32 bits.

func (*Iterator) Varint64

func (b *Iterator) Varint64() (uint64, error)

Varint64 reads up to 64-bits of variable-length encoded data.

type Message

type Message struct {
	// contains filtered or unexported fields
}

Message is a container for a protobuf message type that is ready for scanning.

func New

func New(data []byte) *Message

New creates a new Message scanner for the given encoded protobuf data.

func (*Message) Bool

func (b *Message) Bool() (bool, error)

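Bool reads a varint-encoded boolean: 0x01 for true or 0x00 for false, following the field's tag. For field numbers 1 through 15 the tag is a single byte, so the field takes 2 bytes total.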
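
A stdlib-only sketch (Go 1.19 or later; `appendBool` is a hypothetical helper) of how a bool field is laid out, and why the 2-byte total only holds while the tag fits in one byte:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendBool appends a bool field: the varint tag (field<<3 | 0) followed by
// a single byte, 0x01 for true or 0x00 for false.
func appendBool(b []byte, field uint64, v bool) []byte {
	b = binary.AppendUvarint(b, field<<3)
	if v {
		return append(b, 0x01)
	}
	return append(b, 0x00)
}

func main() {
	fmt.Printf("% x\n", appendBool(nil, 2, true))  // 10 01
	fmt.Printf("% x\n", appendBool(nil, 16, true)) // 80 01 01
}
```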

func (*Message) Bytes

func (m *Message) Bytes() ([]byte, error)

Bytes returns the encoded sequence of bytes. NOTE: this value is NOT copied; the returned slice shares memory with the data being scanned.

func (*Message) Double

func (b *Message) Double() (float64, error)

Double values are encoded as a fixed length of 8 bytes in their IEEE-754 format.

func (*Message) Err

func (m *Message) Err() error

Err returns any error encountered during scanning. Errors could be due to reading the incorrect types or forgetting to skip an unused value.

func (*Message) FieldNumber

func (m *Message) FieldNumber() int

FieldNumber returns the number for the current value being scanned. These numbers are defined in the protobuf definition file used to encode the message.

func (*Message) Fixed32

func (b *Message) Fixed32() (uint32, error)

Fixed32 reads a fixed 4 byte value as a uint32. This proto type is more efficient than uint32 if values are often greater than 2^28.

func (*Message) Fixed64

func (b *Message) Fixed64() (uint64, error)

Fixed64 reads a fixed 8-byte value as a uint64. This proto type is more efficient than uint64 if values are often greater than 2^56.

func (*Message) Float

func (b *Message) Float() (float32, error)

Float values are encoded as a fixed length of 4 bytes in their IEEE-754 format.

func (*Message) Int32

func (b *Message) Int32() (int32, error)

Int32 reads a varint-encoded 32-bit integer. This field type is best used if the field only has positive numbers; otherwise use sint32. Note that this field can also be read as an Int64.

func (*Message) Int64

func (b *Message) Int64() (int64, error)

Int64 reads a varint-encoded 64-bit integer. This field type is best used if the field only has positive numbers; otherwise use sint64.

func (*Message) Iterator

func (m *Message) Iterator(iter *Iterator) (*Iterator, error)

Iterator will use the current field. The field must be a packed repeated field.

func (*Message) Message

func (m *Message) Message(msg *Message) (*Message, error)

Message returns a pointer to an embedded message that can then be scanned recursively. The provided Message object is reused if one is passed in.

func (*Message) MessageData

func (m *Message) MessageData() ([]byte, error)

MessageData returns the encoded data of an embedded message. This data can then be decoded using conventional tools, such as proto.Unmarshal.

func (*Message) Next

func (m *Message) Next() bool

Next will move the scanner to the next value. This function should be used in a for loop.

for msg.Next() {
  switch msg.FieldNumber() {
  case 1:
    v, err := msg.Float()
    if err != nil {
      // handle
    }
    _ = v // use the value
  default:
    msg.Skip()
  }
}

func (*Message) RepeatedBool

func (m *Message) RepeatedBool(buf []bool) ([]bool, error)

RepeatedBool will append the repeated value(s) to the buffer. This method supports packed or unpacked encoding.

func (*Message) RepeatedDouble

func (m *Message) RepeatedDouble(buf []float64) ([]float64, error)

RepeatedDouble will append the repeated value(s) to the buffer. This method supports packed or unpacked encoding.

func (*Message) RepeatedFixed32

func (m *Message) RepeatedFixed32(buf []uint32) ([]uint32, error)

RepeatedFixed32 will append the repeated value(s) to the buffer. This method supports packed or unpacked encoding.

func (*Message) RepeatedFixed64

func (m *Message) RepeatedFixed64(buf []uint64) ([]uint64, error)

RepeatedFixed64 will append the repeated value(s) to the buffer. This method supports packed or unpacked encoding.

func (*Message) RepeatedFloat

func (m *Message) RepeatedFloat(buf []float32) ([]float32, error)

RepeatedFloat will append the repeated value(s) to the buffer. This method supports packed or unpacked encoding.

func (*Message) RepeatedInt32

func (m *Message) RepeatedInt32(buf []int32) ([]int32, error)

RepeatedInt32 will append the repeated value(s) to the buffer. This method supports packed or unpacked encoding.

func (*Message) RepeatedInt64

func (m *Message) RepeatedInt64(buf []int64) ([]int64, error)

RepeatedInt64 will append the repeated value(s) to the buffer. This method supports packed or unpacked encoding.

func (*Message) RepeatedSfixed32

func (m *Message) RepeatedSfixed32(buf []int32) ([]int32, error)

RepeatedSfixed32 will append the repeated value(s) to the buffer. This method supports packed or unpacked encoding.

func (*Message) RepeatedSfixed64

func (m *Message) RepeatedSfixed64(buf []int64) ([]int64, error)

RepeatedSfixed64 will append the repeated value(s) to the buffer. This method supports packed or unpacked encoding.

func (*Message) RepeatedSint32

func (m *Message) RepeatedSint32(buf []int32) ([]int32, error)

RepeatedSint32 will append the repeated value(s) to the buffer. This method supports packed or unpacked encoding.

func (*Message) RepeatedSint64

func (m *Message) RepeatedSint64(buf []int64) ([]int64, error)

RepeatedSint64 will append the repeated value(s) to the buffer. This method supports packed or unpacked encoding.

func (*Message) RepeatedUint32

func (m *Message) RepeatedUint32(buf []uint32) ([]uint32, error)

RepeatedUint32 will append the repeated value(s) to the buffer. This method supports packed or unpacked encoding.

func (*Message) RepeatedUint64

func (m *Message) RepeatedUint64(buf []uint64) ([]uint64, error)

RepeatedUint64 will append the repeated value(s) to the buffer. This method supports packed or unpacked encoding.

func (*Message) Reset

func (m *Message) Reset(newData []byte)

Reset will set the index to 0 so the message can be read again. Optionally pass in new data to reuse the Message object.

func (*Message) Sfixed32

func (b *Message) Sfixed32() (int32, error)

Sfixed32 reads a fixed 4-byte signed value.

func (*Message) Sfixed64

func (b *Message) Sfixed64() (int64, error)

Sfixed64 reads a fixed 8 byte signed value.

func (*Message) Sint32

func (b *Message) Sint32() (int32, error)

Sint32 uses variable-length encoding with zig-zag encoding for signed values. This field type more efficiently encodes negative numbers than regular int32s.

func (*Message) Sint64

func (b *Message) Sint64() (int64, error)

Sint64 uses variable-length encoding with zig-zag encoding for signed values. This field type more efficiently encodes negative numbers than regular int64s.

func (*Message) Skip

func (m *Message) Skip()

Skip will move the scanner past the current value if it is not needed. If a value is not parsed this method must be called to move the decoder past the value.

func (*Message) String

func (m *Message) String() (string, error)

String reads a string type. Per the protobuf specification, string fields contain UTF-8 encoded or 7-bit ASCII text.
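
Whether String() validates its input is not stated here, so if your code depends on the UTF-8 guarantee you may want to check it yourself, for example on the bytes returned by Bytes(). A minimal sketch with the standard library's unicode/utf8 package; `checkedString` is a hypothetical helper, not part of protoscan.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// checkedString converts raw string-field bytes to a string and returns an
// error if they are not valid UTF-8.
func checkedString(raw []byte) (string, error) {
	if !utf8.Valid(raw) {
		return "", errors.New("string field is not valid UTF-8")
	}
	return string(raw), nil
}

func main() {
	s, err := checkedString([]byte("héllo"))
	fmt.Println(s, err) // héllo <nil>

	_, err = checkedString([]byte{0xff, 0xfe})
	fmt.Println(err != nil) // true
}
```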

func (*Message) Uint32

func (b *Message) Uint32() (uint32, error)

Uint32 reads a varint-encoded 32-bit unsigned integer.

func (*Message) Uint64

func (b *Message) Uint64() (uint64, error)

Uint64 reads a varint-encoded 64-bit unsigned integer.

func (*Message) Varint32

func (b *Message) Varint32() (uint32, error)

Varint32 reads up to 32 bits of variable-length encoded data. Note that negative int32 values are sign-extended to 64 bits before encoding, so they are written as 10-byte, 64-bit varints whose upper bits are all 1s and may not fit in 32 bits.

func (*Message) Varint64

func (b *Message) Varint64() (uint64, error)

Varint64 reads up to 64-bits of variable-length encoded data.

func (*Message) WireType

func (m *Message) WireType() int

WireType returns the 'type' of the data at the current location.
