collection

package module
v0.0.0-...-8983f9b Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 30, 2020 License: Apache-2.0 Imports: 0 Imported by: 0

README

= Collections in Go
Matt Nicholls <transientvariable@gmail.com>
:keywords: Golang, Abstract Data Types, Data Structures
:sectanchors: true
:source-highlighter: prettify
:icons: font

ifdef::env-github[]
:important-caption: :heavy_exclamation_mark:
:caution-caption: :fire:
:warning-caption: :warning:
endif::[]

CAUTION: This is still very much a WIP and is not currently recommended for use in production.

In my experience thus far with https://golang.org/[Golang] there has been a prevailing idiom of "the standard library has everything you need" a.k.a. batteries included, which has held up in most use cases. However, collections in Go lack a sizable chunk of functionality when compared to other popular (or _beloved_ 🦀) programming languages currently in use (refer to your favorite top language list). I suspect this is due large part by https://blog.golang.org/why-generics[Go's lack of generics], but that horse has already been savagely beaten. This is not to say that Go's standard collections are completely useless, there are some https://github.com/golang/go/wiki/SliceTricks[pretty nifty slice tricks] that I use on a regular basis. So instead of engaging in some long diatribe on a blog or forum, I've tried to address some of the gaps in Go's collection story here.

== Usage

=== Prerequisites

- The link:https://git-scm.com/[Git] version management tool
- The link:https://golang.org/dl/[Golang Runtime], version 1.13 or later

=== Fetch the Source

....
$ go get -u github.com/2speed/go-collection
....

=== Example: Functional-style Collection Operations

When consuming Windows event log data from a streaming data platform, a common task might be parsing and normalizing file hashes from each event to better serve downstream processes. The raw format of this data might be in JSON, and look something like this:

.raw JSON event data
[source,json]
----
{
  "AccountName": "SYSTEM",
  "AccountType": "User",
  "DestinationHostname": "bad.host.com",
  "DestinationIp": "...",
  "DestinationIsIpv6": "false",
  "DestinationPort": "53",
  "EventReceivedTime": "...",
  "EventTime": "...",
  "EventType": "INFO",
  "Hashes": "md5=37a6259cc0c1dae299a7866489dff0bd,sha1=2be88ca4242c76e8253ac62474851065032d6833,sha256=74234e98afe7498fb5daf1f36ac2d78acc339464f950703b8c019892f982b90b",
  "Hostname": "badactor",
  "Image": "...",
  "Opcode": "Info",
  "Protocol": "udp",
  "SourceHostname": "badactor.inside.your.network",
  "SourceIp": "...",
  "SourceIsIpv6": "false",
  "SourceModuleName": "eventlog",
  "SourceName": "Microsoft-Windows-Sysmon"
}
----

Everything looks okay except that pesky `Hashes` field. What we really want is for the `Hashes` field to be a JSON object and not a string:

[source,json]
----
{

  "Hashes": {
    "md5": "37a6259cc0c1dae299a7866489dff0bd",
    "sha1": "2be88ca4242c76e8253ac62474851065032d6833",
    "sha256": "74234e98afe7498fb5daf1f36ac2d78acc339464f950703b8c019892f982b90b"
  }

}
----

Depending on the programming language at your disposal, parsing the data into the required format is pretty straightforward (almost) using functional-style collection operations provided the language's standard library:

.Python
[source,python]
----
hashes = "md5=37a6259cc0c1dae299a7866489dff0bd,sha1=2be88ca4242c76e8253ac62474851065032d6833,sha256=74234e98afe7498fb5daf1f36ac2d78acc339464f950703b8c019892f982b90b"
file_hashes = {p[0]: p[1] for p in [s.split('=') for s in hashes.split(',')] if len(p) == 2}
----

.JS
[source,javascript]
----
const hashes = "md5=37a6259cc0c1dae299a7866489dff0bd,sha1=2be88ca4242c76e8253ac62474851065032d6833,sha256=74234e98afe7498fb5daf1f36ac2d78acc339464f950703b8c019892f982b90b";
const fileHashes =
  hashes.split(',')
    .map(h => h.split('='))
    .filter(h => h.length === 2)
    .reduce((fh, h) => { fh[h[0]] = h[1]; return fh; }, {});
----

.Java
[source,java]
----
import static java.util.stream.Arrays.stream;
import static java.util.stream.Collectors.toMap;

...

final var hashes = "md5=37a6259cc0c1dae299a7866489dff0bd,sha1=2be88ca4242c76e8253ac62474851065032d6833,sha256=74234e98afe7498fb5daf1f36ac2d78acc339464f950703b8c019892f982b90b";
final var fileHashes =
        stream(hashes.split(","))
            .map(h -> h.split("="))
            .filter(h -> h.length == 2)
            .collect(toMap(h -> h[0], h -> h[1]));

...
----

.Rust
[source,rust]
----
use std::collections::HashMap;

...

let hashes = "md5=37a6259cc0c1dae299a7866489dff0bd,sha1=2be88ca4242c76e8253ac62474851065032d6833,sha256=74234e98afe7498fb5daf1f36ac2d78acc339464f950703b8c019892f982b90b";
let file_hashes =
    hashes.split(',')
        .map(|h| h.split('=').collect::<Vec<_>>())
        .filter(|h| h.len() == 2)
        .map(|h| (h[0], h[1]))
        .collect::<HashMap<_, _>>();

...
----

.Golang
[source,golang]
----
package main

import (
    "strings"
)

func MapPairs(initial []string, f func(string) []string) [][]string {
    mapped := make([][]string, len(initial))
    for i, v := range initial {
        mapped[i] = f(v)
    }
    return mapped
}

func FilterPairs(pairs [][]string, f func([]string) bool) [][]string {
    filtered := make([][]string, 0)
    for _, v := range pairs {
        if f(v) {
            filtered = append(filtered, v)
        }
    }
    return filtered
}

func ReducePairs(pairs [][]string, f func(interface{}, []string), initial interface{}) interface{} {
    for _, v := range pairs {
        f(initial, v)
    }
    return initial
}

func main() {
    hashes := "md5=37a6259cc0c1dae299a7866489dff0bd,sha1=2be88ca4242c76e8253ac62474851065032d6833,sha256=74234e98afe7498fb5daf1f36ac2d78acc339464f950703b8c019892f982b90b"

    hashPairs :=
        MapPairs(strings.Split(hashes, ","), func(v string) []string {
            return strings.Split(v, "=")
        })

    filteredPairs :=
        FilterPairs(hashPairs, func(p []string) bool {
            return len(p) == 2
        })

    fileHashes :=
        ReducePairs(filteredPairs, func(acc interface{}, p []string) {
            acc.(map[string]string)[p[0]] = p[1]
        }, make(map[string]string))
}
----

Hmmm, it would seem that Go implementation is a bit verbose. Let's see if we can shorten that up using `ArrayList`:

.Golang (revised using `ArrayList`)
[source,golang]
----
package main

import (
    "strings"

    "github.com/2speed/go-collection/list"
)

func main() {
    hashes     := "md5=37a6259cc0c1dae299a7866489dff0bd,sha1=2be88ca4242c76e8253ac62474851065032d6833,sha256=74234e98afe7498fb5daf1f36ac2d78acc339464f950703b8c019892f982b90b"
    fileHashes := make(map[string]interface{})

    list.NewArrayListOf(strings.Split(hashes, ",")).
        Map(func(e interface{}) interface{} { // split each element into hash and value pairs
            return strings.Split(e.(string), "=")
        }).
        Filter(func(e interface{}) bool {     // ignore invalid hash and value pairs
            return len(e.([]string)) == 2
        }).
        ForEach(func(e interface{}) {         // add each hash and value pair to the map of file hashes
            p := e.([]string)
            fileHashes[p[0]] = p[1]
        })
}
----

...variant with better readability:

[source,golang]
----
package main

import (
    "strings"

    "github.com/2speed/go-collection/list"
)

func main() {
    hashes     := "md5=37a6259cc0c1dae299a7866489dff0bd,sha1=2be88ca4242c76e8253ac62474851065032d6833,sha256=74234e98afe7498fb5daf1f36ac2d78acc339464f950703b8c019892f982b90b"
    fileHashes := make(map[string]interface{})

    toPair := func(e interface{}) interface{} {
        return strings.Split(e.(string), "=")
    }

    byLength := func(e interface{}) bool {
        return len(e.([]string)) == 2
    }

    collectToMap := func(e interface{}) {
        p := e.([]string)
        fileHashes[p[0]] = p[1]
    }

    list.NewArrayListOf(strings.Split(hashes, ",")).
        Map(toPair).           // split each element into hash and value pairs
        Filter(byLength).      // ignore invalid hash and value pairs
        ForEach(collectToMap)  // add each hash and value pair to the map of file hashes
}
----

Yes, there is quite a bit of type assertion boilerplate here due to lack of generics. I'll leave the investigation of alternative techniques like code generation as an exercise for the reader.

Documentation

Index

Constants

View Source
const ElementNotFound = -1
View Source
const (
	ErrorElementNotFound = CollectionError("the requested element could not be found")
)

Variables

This section is empty.

Functions

This section is empty.

Types

type Collection

type Collection interface {

	// Add inserts the provided element into the Collection. The returned error will be non-nil for bounded Collection
	// implementations that have reached capacity and cannot hold any further elements.
	Add(element interface{}) error

	// AddAll inserts all elements from the provided collection into the Collection. The returned error will be non-nil
	// for bounded Collection implementations that have reached capacity and cannot hold any further elements.
	AddAll(collection Collection) error

	// Remove removes the first occurrence (if any) of an element equivalent to the provided element. If an element was
	// removed, the return value will be true, otherwise false will be returned.
	Remove(element interface{}) bool

	// Size returns the number of elements in the Collection.
	Size() int

	// IsEmpty returns true if the Collection contains no elements, otherwise false is returned.
	IsEmpty() bool

	// Clear removes all elements from the Collection.
	Clear()

	// Contains returns true if an element equivalent to the provided element exists in the Collection, otherwise false
	// is returned.
	Contains(value interface{}) bool

	// Values returns a slice containing the elements in the Collection in the iteration order.
	Values() []interface{}
}

Collection defines the behavior for maintaining a collection of elements.

type CollectionError

type CollectionError string

func (CollectionError) Error

func (e CollectionError) Error() string

type Ordered

type Ordered interface {
	Collection

	// Min returns the element with the lowest position in the Collection. More specifically, the first element in the
	// iteration order is returned.
	Min() interface{}

	// Max returns the element with the highest position in the Collection. More specifically, the last element in the
	// iteration order is returned.
	Max() interface{}

	// Predecessor returns the element (if any) from the Collection that is less than the provided element. More
	// specifically, the element before the first occurrence of the provided element in iteration order is returned.
	Predecessor(element interface{}) interface{}

	// Successor returns the element (if any) from the Collection that is greater than the provided element. More
	// specifically, the element after the first occurrence of the provided element in iteration order is returned.
	Successor(element interface{}) interface{}
}

Ordered defines the behavior for a Collection whose elements are algorithmically positioned.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL