Documentation ¶
Overview ¶
Package ogórek(*) is a library for decoding/encoding Python's pickle format.
Use Decoder to decode a pickle from an input stream, for example:
    d := ogórek.NewDecoder(r)
    obj, err := d.Decode() // obj is interface{} representing decoded Python object
Use Encoder to encode an object as a pickle into an output stream, for example:
    e := ogórek.NewEncoder(w)
    err := e.Encode(obj)
The following table summarizes the mapping of basic types between Python and Go:
    Python        Go
    ------        --

    None       ↔  ogórek.None
    bool       ↔  bool
    int        ↔  int64
    int        ←  int, intX, uintX
    long       ↔  *big.Int
    float      ↔  float64
    float      ←  floatX
    list       ↔  []interface{}
    tuple      ↔  ogórek.Tuple
    dict       ↔  map[interface{}]interface{}

    str        ↔  string        (+)
    bytes      ↔  ogórek.Bytes  (~)
    bytearray  ↔  []byte
Python classes and instances are mapped to Class and Call, for example:
    Python                      Go
    ------                      --

    decimal.Decimal          ↔  ogórek.Class{"decimal", "Decimal"}
    decimal.Decimal("3.14")  ↔  ogórek.Call{
                                    ogórek.Class{"decimal", "Decimal"},
                                    ogórek.Tuple{"3.14"},
                                }
Because classes and calls are decoded into these inert placeholder values rather than being executed, it is by default safe on the Go side to decode pickles from untrusted sources(^).
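Since Decode returns interface{}, application code usually inspects the result with a type switch that follows the mapping table above. The following is a minimal sketch of that pattern. It is not part of ogórek: Tuple is a local stand-in for ogórek.Tuple (assumed here to be a slice of values) so that the example compiles on its own, and describe is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"math/big"
)

// Tuple is a local stand-in for ogórek.Tuple so that this sketch compiles
// without the library. It is assumed here to be a slice of values.
type Tuple []interface{}

// describe names the Python type that a decoded Go value corresponds to,
// following the mapping table in the package overview.
func describe(obj interface{}) string {
	switch v := obj.(type) {
	case bool:
		return "bool"
	case int64:
		return "int"
	case *big.Int:
		return "long"
	case float64:
		return "float"
	case string:
		return "str"
	case []byte:
		return "bytearray"
	case []interface{}:
		return fmt.Sprintf("list of %d items", len(v))
	case Tuple:
		return fmt.Sprintf("tuple of %d items", len(v))
	case map[interface{}]interface{}:
		return fmt.Sprintf("dict of %d items", len(v))
	default:
		// None, Bytes, Class, Call and Ref would need their own cases
		// when the real library types are available.
		return fmt.Sprintf("unhandled Go type %T", obj)
	}
}

func main() {
	fmt.Println(describe(int64(42)))
	fmt.Println(describe(Tuple{"3.14"}))
	fmt.Println(describe(map[interface{}]interface{}{"a": int64(1)}))
}
```

With the real library, the value passed to describe would be the obj returned by d.Decode().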
Pickle protocol versions ¶
The pickle stream format has evolved over time. The original protocol version 0 is human-readable; versions 1 and 2 extend it in a backward-compatible way with binary encodings for efficiency. Protocol version 2 is the highest version understood by the standard pickle module of Python2. Protocol version 3 added ways to represent Python bytes objects from Python3(~). Protocol version 4 builds on version 3 and switches completely to binary-only encoding. Protocol version 5 added support for out-of-band data(%). Please see https://docs.python.org/3/library/pickle.html#data-stream-format for details.
On decoding ogórek detects which protocol is being used and automatically handles all necessary details.
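To illustrate what a decoder can learn from the stream itself: pickles of protocol 2 and later start with the PROTO opcode (byte 0x80) followed by one byte holding the protocol version, while protocol 0 and 1 pickles carry no such header. The sketch below reads that header using only the standard library. It is an illustration of the stream format, not ogórek's actual detection logic, and protoHeader is a hypothetical helper name.

```go
package main

import "fmt"

// opProto is the PROTO opcode that opens pickles of protocol 2 and later.
const opProto = 0x80

// protoHeader reports the protocol version announced by a pickle's PROTO
// header. ok is false when the stream has no such header, which is the
// case for protocol 0 and 1 pickles.
func protoHeader(data []byte) (version int, ok bool) {
	if len(data) < 2 || data[0] != opProto {
		return 0, false
	}
	return int(data[1]), true
}

func main() {
	// PROTO 2, NONE, STOP: Python's None pickled with protocol 2.
	v, ok := protoHeader([]byte{0x80, 2, 'N', '.'})
	fmt.Println(v, ok)

	// NONE, STOP: the same object without a PROTO header.
	v, ok = protoHeader([]byte("N."))
	fmt.Println(v, ok)
}
```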
On encoding, for compatibility with Python2, ogórek produces pickles with protocol 2 by default. Bytes will thus, by default, be unpickled as str on Python2 and as bytes on Python3. If an earlier protocol is desired or, on the other hand, if Bytes needs to be encoded efficiently (the protocol 2 encoding for bytes is far from optimal) and compatibility with pure Python2 is not an issue, the protocol to use for encoding can be specified explicitly, for example:
    e := ogórek.NewEncoderWithConfig(w, &ogórek.EncoderConfig{
        Protocol: 3,
    })
    err := e.Encode(obj)
See EncoderConfig.Protocol for details.
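As a rough illustration of why protocol 3 suits bytes well: it has dedicated opcodes, SHORT_BINBYTES ('C', with a 1-byte length) and BINBYTES ('B', with a 4-byte little-endian length), that store the payload verbatim. The sketch below builds such a pickle by hand with the standard library only. It is not ogórek's encoder, it omits memoization, and encodeBytesProto3 is a hypothetical helper name.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeBytesProto3 hand-builds a protocol 3 pickle holding one Python
// bytes object. Payloads of 4 GiB or more are not handled by this sketch.
func encodeBytesProto3(b []byte) []byte {
	out := []byte{0x80, 3} // PROTO 3
	if len(b) < 256 {
		out = append(out, 'C', byte(len(b))) // SHORT_BINBYTES + 1-byte length
	} else {
		var n [4]byte
		binary.LittleEndian.PutUint32(n[:], uint32(len(b)))
		out = append(out, 'B') // BINBYTES + 4-byte little-endian length
		out = append(out, n[:]...)
	}
	out = append(out, b...)  // raw payload, stored as-is
	return append(out, '.') // STOP
}

func main() {
	p := encodeBytesProto3([]byte("abc"))
	fmt.Printf("%q (%d bytes for a 3-byte payload)\n", p, len(p))
}
```

The framing overhead is a handful of bytes regardless of the payload content, which is the efficiency the paragraph above refers to.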
Persistent references ¶
Pickle was originally created for serialization in the ZODB (http://zodb.org) object database, where on-disk objects can reference each other similarly to how one in-RAM object can have a reference to another in-RAM object.
When a pickle with such a persistent reference is decoded, ogórek represents the reference with a Ref placeholder, similarly to Class and Call. However, it is possible to hook into decoding and process such references in an application-specific way, for example by loading the referenced object from the database:
    d := ogórek.NewDecoderWithConfig(r, &ogórek.DecoderConfig{
        PersistentLoad: ...
    })
    obj, err := d.Decode()
Similarly, for encoding, an application can hook into the serialization process and turn pointers to some in-RAM objects into persistent references.
Please see DecoderConfig.PersistentLoad and EncoderConfig.PersistentRef for details.
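A decoding hook of this kind could look roughly like the sketch below. To stay self-contained it redeclares Ref locally with the same shape as the Ref type documented in this package. Account, db and loadRef are hypothetical names, and the in-memory map merely stands in for a real database lookup.

```go
package main

import (
	"errors"
	"fmt"
)

// Ref mirrors the shape of the package's Ref type so that this sketch
// compiles without the library.
type Ref struct {
	Pid interface{}
}

// Account is a hypothetical application object kept in a database.
type Account struct {
	Owner string
}

// db is a hypothetical in-memory stand-in for an object database,
// keyed by persistent ID.
var db = map[string]*Account{
	"acc-1": {Owner: "alice"},
}

// loadRef has the signature expected by DecoderConfig.PersistentLoad.
// It resolves string persistent IDs against db and rejects anything else.
func loadRef(ref Ref) (interface{}, error) {
	pid, ok := ref.Pid.(string)
	if !ok {
		return nil, fmt.Errorf("unsupported persistent ID type %T", ref.Pid)
	}
	obj, ok := db[pid]
	if !ok {
		return nil, errors.New("no object with persistent ID " + pid)
	}
	return obj, nil
}

func main() {
	obj, err := loadRef(Ref{Pid: "acc-1"})
	fmt.Println(obj.(*Account).Owner, err)

	_, err = loadRef(Ref{Pid: "missing"})
	fmt.Println(err)
}
```

With the real library, a function of this shape would be assigned to the PersistentLoad field of the decoder configuration.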
--------
(*) ogórek is Polish for "pickle".
(+) for Python2, both str and unicode are decoded into string, with Python str being considered as UTF-8 encoded. Correspondingly, for protocol ≤ 2 a Go string is encoded as UTF-8 encoded Python str, and for protocol ≥ 3 as unicode.
(~) bytes can be produced only by Python3 or zodbpickle (https://pypi.org/project/zodbpickle), not by standard Python2. Correspondingly, for protocol ≤ 2, what ogórek produces is unpickled as bytes by Python3 or zodbpickle, and as str by Python2.
(^) in contrast to the Python implementation, where a malicious pickle can cause the decoder to run arbitrary code, including e.g. os.system("rm -rf /").
(%) ogórek currently does not support out-of-band data.
Variables ¶
var ErrInvalidPickleVersion = errors.New("invalid pickle version")
Types ¶
type Decoder ¶
type Decoder struct {
// contains filtered or unexported fields
}
Decoder is a decoder for pickle streams.
func NewDecoder ¶
func NewDecoder(r io.Reader) *Decoder
NewDecoder constructs a new Decoder which will decode the pickle stream in r.
func NewDecoderWithConfig ¶ added in v1.1.0
func NewDecoderWithConfig(r io.Reader, config *DecoderConfig) *Decoder
NewDecoderWithConfig is similar to NewDecoder, but allows specifying decoder configuration.
type DecoderConfig ¶ added in v1.1.0
type DecoderConfig struct {
    // PersistentLoad, if non-nil, will be used by the decoder to handle
    // persistent references.
    //
    // Whenever the decoder finds an object reference in the pickle stream
    // it will call PersistentLoad. If PersistentLoad returns a non-nil object
    // without error, the decoder will use that object instead of Ref in
    // the resulting Go object.
    //
    // An example use-case for PersistentLoad is to transform persistent
    // references in a ZODB database of form (type, oid) tuple, into
    // equivalent-to-type Go ghost object, e.g. equivalent to zodb.BTree.
    //
    // See Ref documentation for more details.
    PersistentLoad func(ref Ref) (interface{}, error)
}
DecoderConfig allows tuning the Decoder.
type Encoder ¶
type Encoder struct {
// contains filtered or unexported fields
}
An Encoder encodes Go data structures into a pickle byte stream.
func NewEncoder ¶
func NewEncoder(w io.Writer) *Encoder
NewEncoder returns a new Encoder with default values.
func NewEncoderWithConfig ¶ added in v1.1.0
func NewEncoderWithConfig(w io.Writer, config *EncoderConfig) *Encoder
NewEncoderWithConfig is similar to NewEncoder, but allows specifying the encoder configuration.
type EncoderConfig ¶ added in v1.1.0
type EncoderConfig struct {
    // Protocol specifies which pickle protocol version should be used.
    Protocol int

    // PersistentRef, if non-nil, will be used by the encoder to encode
    // objects as persistent references.
    //
    // Whenever the encoder sees a pointer to a Go struct object, it will call
    // PersistentRef to find out how to encode that object. If PersistentRef
    // returns nil, the object is encoded regularly. If it returns non-nil,
    // the object will be encoded as an object reference.
    //
    // See Ref documentation for more details.
    PersistentRef func(obj interface{}) *Ref
}
EncoderConfig allows tuning the Encoder.
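The PersistentRef contract described in the struct comment (return nil to encode the object regularly, non-nil to encode it as a reference) can be sketched as follows. Ref is redeclared locally with the shape documented in this package so that the example compiles on its own. Account and refFor are hypothetical names.

```go
package main

import "fmt"

// Ref mirrors the shape of the package's Ref type so that this sketch
// compiles without the library.
type Ref struct {
	Pid interface{}
}

// Account is a hypothetical application object that lives in a database
// under the persistent ID stored in its ID field.
type Account struct {
	ID    string
	Owner string
}

// refFor has the signature expected by EncoderConfig.PersistentRef.
// Pointers to Account become persistent references. Every other value
// yields nil, which asks the encoder to encode it regularly.
func refFor(obj interface{}) *Ref {
	if acc, ok := obj.(*Account); ok && acc != nil {
		return &Ref{Pid: acc.ID}
	}
	return nil
}

func main() {
	fmt.Println(refFor(&Account{ID: "acc-1", Owner: "alice"}).Pid)
	fmt.Println(refFor("plain string") == nil)
}
```

With the real library, a function of this shape would be assigned to the PersistentRef field of the encoder configuration.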
type OpcodeError ¶
OpcodeError is the error that Decode returns when it sees an unknown pickle opcode.
func (OpcodeError) Error ¶
func (e OpcodeError) Error() string
type Ref ¶
type Ref struct {
    // persistent ID of referenced object.
    //
    // used to be string for protocol 0, but "upgraded" to be arbitrary
    // object for later protocols.
    Pid interface{}
}
Ref is the default representation for a Python persistent reference.
Such references are used when one pickle somehow references another pickle in e.g. a database.
See https://docs.python.org/3/library/pickle.html#pickle-persistent for details.
See DecoderConfig.PersistentLoad and EncoderConfig.PersistentRef for ways to tune Decoder and Encoder to handle persistent references with user-specified application logic.