ais

package module

v1.0.0 Published: Jan 2, 2019 License: MIT Imports: 17 Imported by: 0

Repository

github.com/FATHOM5/ais

README

Jump straight to Usage

Package AIS - Beta Release

Note: This repo is actively maintained with focus on increasing code test coverage to 100%. Until then it is appropriate for use in research, but should not be used for navigation or other mission critical applications.

In September 2018 the United States Navy hosted the annual HACKtheMACHINE Navy Digital Experience in Seattle, Washington. The three-day prototyping, public engagement and educational experience is designed to generate insights into maritime cybersecurity, data science, and rapid prototyping. Track 2, the data science track, focused on collision avoidance between ships.

The U.S. Navy is the largest international operator of unmanned and autonomous systems sailing on and under the world's oceans. Developing algorithms that contribute to safe navigation by autonomous and manned vessels is in the best interest of the Navy and the public. To support the development of such AI-driven navigational systems the Navy sponsored HACKtheMACHINE Seattle Track 2 to create collision avoidance training data from publicly available maritime shipping data. Read the full challenge description here.

This repository is a Go language package, open-source release, of a software toolkit built from insights gained during HACKtheMACHINE Seattle. Over the course of the multi-day challenge teams tried several approaches using a variety of software languages and geospatial information systems (GIS). Ultimately, the complexity of the challenge prevented any single team from providing a complete solution, but all of the winners (see acknowledgements below) provided some useful ideas that are captured in this package. The decision by the Navy to open source release the prototype data science tools built on the ideas generated at HACKtheMACHINE is meant to continue building a vibrant community of practice for maritime data science hobbyists and professionals. Please use this code, submit issues to improve it, and join in. Our community is organized here and on LinkedIn. Please reach out with questions about the code, suggestions to make the usage documentation better, maritime data science in general, or just to ask a few questions about the community.

What's in the package?

Package FATHOM5/ais contains tools for creating machine learning datasets for navigation systems based on open data released by the U.S. Government.

The largest and most comprehensive public data source for maritime domain awareness is the Automatic Identification System (AIS) data collected and released to the public by the U.S. Government on the marinecadastre.gov website. These comma separated value (CSV) data files average more than 25,000,000 records per file, and a single month of data is a set of 20 files totalling over 60 GB of information. Therefore, the first hurdle to building a machine learning dataset from these files is a big-data challenge: finding interesting interactions in this large corpus of records.

The ais package contains tools for abstracting the process of opening, reading and manipulating these large files, and additional tools that support algorithm development to identify interesting interactions. The primary goal of ais is to provide high performance abstractions for dealing with large AIS datasets. In this Beta release, high performance means processing a full day of data to identify potential two-ship interactions in about 17 seconds. We know this can be improved upon and are eager to get the Beta into use within the community to make it better. Note that the 17-second performance is inspired by ideas from HACKtheMACHINE but outperforms any approach demonstrated at the competition by several orders of magnitude.

Installation

Package FATHOM5/ais is a standard Go language library installed in the typical fashion.

go get github.com/FATHOM5/ais

Include the package in your code with

import "github.com/FATHOM5/ais"

Usage

The package contains many facilities for abstracting the use of large AIS csv files, creating subsets of those files, sorting large AIS datasets, appending data to records, and implementing time convolution algorithms. This usage guide introduces many of these ideas with more detailed guidelines available in the godocs.

Basic Operations on RecordSets
Basic Operations on Records
Subsets
Sorting
Appending Fields to Records
Convolution Algorithms

Basic Operations on RecordSets

The first requirement of the package is to reduce the complexities of working with large CSV files and allow algorithm developers to focus on the type RecordSet which is an abstraction to the on-disk CSV files.

func OpenRecordSet(filename string) (*RecordSet, error)
func (rs *RecordSet) Save(filename string) error

The facilities OpenRecordSet and Save allow users to open a CSV file downloaded from marinecadastre.gov into a RecordSet, and to save a RecordSet to disk after completing other operations. Since the RecordSet often manages a *os.File object that requires closing, it is a best practice to call defer rs.Close() right after opening a RecordSet.

The typical workflow is to open a RecordSet from disk, analyze the data using other tools in the package, then save the modified set back to disk.

rs, err := ais.OpenRecordSet("data.csv")
if err != nil {
    panic(err)
}
defer rs.Close() // defer only after the open is known to have succeeded

// analyze the recordset...

err = rs.Save("result.csv")
if err != nil {
    panic(err)
}

To create an empty RecordSet the package provides

func NewRecordSet() *RecordSet

The *RecordSet returned from this function maintains its data in memory until the Save function is called to write the set to disk.

A RecordSet is comprised of two parts. First, there is a Headers object derived from the first row of CSV data in the file that was opened. The set of Headers can be associated with a JSON Dictionary that provides Definitions for all of the data fields. For any production use the data Dictionary should be considered a mandatory addition to the project, but is often omitted in early data analysis work. The Dictionary should be a JSON file with multiple ais.Definition objects serialized into the file. Loading and assigning the dictionary is demonstrated in this code snippet.

// Most error handling omitted for brevity, but should definitely be
// included in package use.
rs, _ := ais.OpenRecordSet("data.csv")
defer rs.Close()
j, _ := os.Open("dataDictionary.json")
defer j.Close()
jsonBlob, _ := ioutil.ReadAll(j)
if err := rs.SetDictionary(jsonBlob); err != nil {
	panic(err)
}

h := rs.Headers()
fmt.Println(h)

The final call to fmt.Println(h) will call the Stringer interface for Headers and pretty print the index, header name, and definition for all of the column names contained in the underlying csv file that rs now accesses.

Second, in addition to Headers the RecordSet contains an unexported data store of the AIS reports in the set. Each line of data in the underlying CSV files is a single Record that can be accessed through calls to the Read() method. Each call to Read() advances the file pointer in the underlying CSV file until reaching io.EOF. The idiomatic way to process through each Record in the RecordSet is

// Some error handling omitted for brevity, but should definitely be
// included in package use.
rs, err := ais.OpenRecordSet("data.csv")
if err != nil {
	panic(err)
}
defer rs.Close()

for {
	rec, err := rs.Read()
	if err == io.EOF {
		break
	}
	if err != nil {
		panic(err)
	}

	// Do something with rec, which is a *ais.Record
	_ = rec
}

A RecordSet also supports Write(rec Record) error calls. This allows users to create new RecordSet objects. As previously stated, high performance is an important goal of the package and therefore slow IO operations to disk are minimized through buffering. So after completing a series of Write(...) operations package users must call Flush() to flush out any remaining contents of the buffer.

rs := ais.NewRecordSet()
defer rs.Close()

h := strings.Split("MMSI,BaseDateTime,LAT,LON,SOG,COG,Heading,VesselName,IMO,CallSign,VesselType,Status,Length,Width,Draft,Cargo", ",")
data := strings.Split("477307900,2017-12-01T00:00:03,36.90512,-76.32652,0.0,131.0,352.0,FIRST,IMO9739666,VRPJ6,1004,moored,337,,,", ",")

rs.SetHeaders(ais.NewHeaders(h, nil))  // note dictionary is not assigned

rec1 := ais.Record(data)

if err := rs.Write(rec1); err != nil {
    panic(err)
}
if err := rs.Flush(); err != nil {
    panic(err)
}

rs.Save("test.csv")
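To see why the call to Flush matters, here is a small sketch using only the standard library's buffered csv.Writer. It is an analogy, not the package's implementation: writeRows is a hypothetical helper, and the assumption is simply that RecordSet buffers writes in a comparable way. The helper reports how many bytes had reached the destination before and after Flush.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// writeRows is a hypothetical helper. It writes rows through a buffered
// csv.Writer and returns the destination size before and after Flush.
func writeRows(rows [][]string) (before, after int, err error) {
	var dst bytes.Buffer
	w := csv.NewWriter(&dst)
	for _, row := range rows {
		if werr := w.Write(row); werr != nil {
			return 0, 0, werr
		}
	}
	before = dst.Len() // small writes are still held in the writer's buffer
	w.Flush()
	after = dst.Len()
	return before, after, w.Error()
}

func main() {
	rows := [][]string{
		{"MMSI", "LAT"},
		{"477307900", "36.90512"},
	}
	before, after, err := writeRows(rows)
	if err != nil {
		panic(err)
	}
	fmt.Printf("bytes before Flush: %d, after Flush: %d\n", before, after)
}
```

Skipping Flush in this sketch would leave the destination empty, which is the same class of bug as saving a RecordSet before its buffer has been flushed.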

In many of the examples that follow, error handling is omitted for brevity. However, in use error handling should never be omitted since IO operations and large data set manipulation are error prone activities.

One example of an algorithm against a complete RecordSet is finding all of the unique vessels in a file. This particular algorithm is provided as a method on a RecordSet and returns the type ais.VesselSet.

rs, _ := ais.OpenRecordSet("data.csv")
defer rs.Close()

var vessels ais.VesselSet
vessels, _ = rs.UniqueVessels()

From this point, you can query the vessels map to determine if a particular vessel is present in the RecordSet or count the number of unique vessels in the set with len(vessels).
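The idea behind a unique-vessel query can be sketched with a plain Go map used as a set. This is an illustration only: uniqueVessels is a hypothetical function, records are modeled as [][]string, and the key and value types of the real ais.VesselSet are not assumed here.

```go
package main

import "fmt"

// uniqueVessels is a hypothetical stand-in for the idea behind
// RecordSet.UniqueVessels: collect each distinct MMSI into a set.
func uniqueVessels(records [][]string, mmsiIndex int) map[string]struct{} {
	set := make(map[string]struct{})
	for _, rec := range records {
		if mmsiIndex >= 0 && mmsiIndex < len(rec) {
			set[rec[mmsiIndex]] = struct{}{}
		}
	}
	return set
}

func main() {
	records := [][]string{
		{"477307900", "2017-12-01T00:00:03"},
		{"477307902", "2017-12-01T00:00:03"},
		{"477307900", "2017-12-01T00:01:03"}, // same vessel, later report
	}
	vessels := uniqueVessels(records, 0)
	_, present := vessels["477307902"]
	fmt.Println("unique vessels:", len(vessels), "contains 477307902:", present)
}
```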

Basic Operations on Records

Most data science tasks for an AIS RecordSet deal with comparisons on individual lines of data. Package ais abstracts individual lines as Record objects. In order to make comparisons between data fields in a Record it is sometimes necessary to convert the string representation of the data in the underlying csv file into an int, float or time type. The package provides utility functions for this purpose.

func (r Record) ParseFloat(index int) (float64, error)
func (r Record) ParseInt(index int) (int64, error)
func (r Record) ParseTime(index int) (time.Time, error)

The index argument for the functions is the index of the header value that you are trying to parse. The idiomatic way to use these functions is

h := strings.Split("MMSI,BaseDateTime,LAT,LON,SOG,COG,Heading,VesselName,IMO,CallSign,VesselType,Status,Length,Width,Draft,Cargo", ",")
data := strings.Split("477307900,2017-12-01T00:00:03,36.90512,-76.32652,0.0,131.0,352.0,FIRST,IMO9739666,VRPJ6,1004,moored,337,,,", ",")

headers := ais.NewHeaders(h, nil)
rec := ais.Record(data)

timeIndex, _ := headers.Contains("BaseDateTime")

var t time.Time
t, err := rec.ParseTime(timeIndex)
if err != nil {
	panic(err)
}
fmt.Printf("The record timestamp is at %s\n", t.Format(ais.TimeLayout))
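For readers who want to see the index-then-parse pattern without the package, the sketch below reproduces it with the standard library. The names indexOf and parseFloatField are hypothetical; they mirror the roles of Headers.Contains and Record.ParseFloat but are not their implementations.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// indexOf is a hypothetical helper that plays the role of Headers.Contains:
// it returns the position of name in headers and whether it was found.
func indexOf(headers []string, name string) (int, bool) {
	for i, h := range headers {
		if h == name {
			return i, true
		}
	}
	return 0, false
}

// parseFloatField is a hypothetical helper that converts one string field
// of a record to float64, with a bounds check on the index.
func parseFloatField(rec []string, index int) (float64, error) {
	if index < 0 || index >= len(rec) {
		return 0, fmt.Errorf("parseFloatField: index %d out of range", index)
	}
	return strconv.ParseFloat(rec[index], 64)
}

func main() {
	headers := strings.Split("MMSI,BaseDateTime,LAT,LON", ",")
	rec := strings.Split("477307900,2017-12-01T00:00:03,36.90512,-76.32652", ",")

	latIndex, ok := indexOf(headers, "LAT")
	if !ok {
		panic("missing LAT header")
	}
	lat, err := parseFloatField(rec, latIndex)
	if err != nil {
		panic(err)
	}
	fmt.Printf("latitude is %.5f\n", lat)
}
```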

Another common operation is to measure the distance between two Record reports. The package provides a Record method to compute this directly.

func (r Record) Distance(r2 Record, latIndex, lonIndex int) (nm float64, err error)

The calculated distance is computed using the haversine formula implemented in FATHOM5/haversine. For users unfamiliar with computing great circle distance see this package for an explanation of great circles and the haversine formula.

h := strings.Split("MMSI,BaseDateTime,LAT,LON,SOG,COG,Heading,VesselName,IMO,CallSign,VesselType,Status,Length,Width,Draft,Cargo", ",")
headers := ais.NewHeaders(h, nil)
latIndex, _ := headers.Contains("LAT") // !ok checking omitted for brevity
lonIndex, _ := headers.Contains("LON")

data1 := strings.Split("477307900,2017-12-01T00:00:03,36.90512,-76.32652,0.0,131.0,352.0,FIRST,IMO9739666,VRPJ6,1004,moored,337,,,", ",")
data2 := strings.Split("477307902,2017-12-01T00:00:03,36.91512,-76.22652,2.3,311.0,182.0,SECOND,IMO9739800,XHYSF,,underway using engines,337,,,", ",")
rec1 := ais.Record(data1)
rec2 := ais.Record(data2)

nm, err := rec1.Distance(rec2, latIndex, lonIndex)
if err != nil {
    panic(err)
}
fmt.Printf("The ships are %.1fnm away from one another.\n", nm)
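As a rough illustration of the haversine formula itself, here is a standalone sketch. It is not the FATHOM5/haversine implementation; the function name haversineNM and the Earth radius constant (a mean radius of 6371 km expressed in nautical miles) are assumptions made for this example.

```go
package main

import (
	"fmt"
	"math"
)

// earthRadiusNM is an assumed mean Earth radius in nautical miles
// (6371 km divided by 1.852 km per nautical mile).
const earthRadiusNM = 3440.065

// haversineNM is a hypothetical helper that returns the great circle
// distance in nautical miles between two points given in decimal degrees.
func haversineNM(lat1, lon1, lat2, lon2 float64) float64 {
	rad := func(deg float64) float64 { return deg * math.Pi / 180 }
	dLat := rad(lat2 - lat1)
	dLon := rad(lon2 - lon1)
	a := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(rad(lat1))*math.Cos(rad(lat2))*math.Sin(dLon/2)*math.Sin(dLon/2)
	return 2 * earthRadiusNM * math.Asin(math.Sqrt(a))
}

func main() {
	// Positions taken from the two records in the example above.
	nm := haversineNM(36.90512, -76.32652, 36.91512, -76.22652)
	fmt.Printf("The ships are %.1fnm away from one another.\n", nm)
}
```

A quick sanity check on the result: 0.01 degrees of latitude is about 0.6nm and 0.1 degrees of longitude at this latitude is a little under 4.8nm, so the combined distance should be just under 5nm.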

This example and the one above it create Record objects directly instead of reading them from a RecordSet opened with OpenRecordSet like previous examples. This usage can also come into play when writing data to a new RecordSet. For example, in the previous snippet, the variable rec1 could be written to a dataset like this:

// Record and Headers created per the previous example
rs := ais.NewRecordSet()
defer rs.Close()
rs.SetHeaders(headers)

_ = rs.Write(rec1) // error checking omitted for brevity
_ = rs.Flush()
_ = rs.Save("newData.csv")

Many more ways of working with RecordSet and Record objects follow in the more advanced uses of the package in the next few sections.

Subsets

The most common operation on multi-gigabyte files downloaded from marinecadastre.gov is to create subsets of about one million records. The original datafiles are a one-month set covering a single UTM zone. The natural subset is to break this into single-day files and then perform analysis on these one-day subsets. To accomplish this operation the package provides the interface Matching and two functions provided by RecordSet that take arguments implementing the Matching interface in order to return a subset.

type Matching interface {
    Match(*Record) (bool, error)
}

func (rs *RecordSet) SubsetLimit(m Matching, n int) (*RecordSet, error)
func (rs *RecordSet) Subset(m Matching) (*RecordSet, error)

Package clients define a type that implements the Matching interface and then pass this type as an argument to Subset or SubsetLimit. The returned *RecordSet contains only those lines from the original RecordSet that return true from the Match function of m.

type subsetOneDay struct {
	rs        *ais.RecordSet
	d1        time.Time // date we want to match
	timeIndex int       //index value of BaseDateTime in the record
}

func (sod *subsetOneDay) Match(rec *ais.Record) (bool, error) {
	d2, err := time.Parse(ais.TimeLayout, (*rec)[sod.timeIndex])
	if err != nil {
		return false, fmt.Errorf("subsetOneDay: %v", err)
	}
	d2 = d2.Truncate(24 * time.Hour)
	return sod.d1.Equal(d2), nil
}

func main(){
	rs, _ := ais.OpenRecordSet("largeData.csv")
	defer rs.Close()

	// Create a concrete value of type subsetOneDay to return records
	// from 25 Dec 2017.
	timeIndex, ok := rs.Headers().Contains("BaseDateTime")
	if !ok {
		panic("recordset does not contain the header BaseDateTime")
	}
	targetDate, _ := time.Parse("2006-01-02", "2017-12-25")
	sod := &subsetOneDay{
		rs:        rs,
		d1:        targetDate,
		timeIndex: timeIndex,
	}

	matches, _ := rs.Subset(sod)
	//matches.Save("newSet.csv")
	subsetRec, _ := matches.Read()
	subsetDate := (*subsetRec)[timeIndex]
	date, _ := time.Parse(ais.TimeLayout, subsetDate)
	fmt.Printf("The first record in the subset has BaseDateTime %v\n", date.Format("2006-01-02"))

	// Output:
	// The first record in the subset has BaseDateTime 2017-12-25
}

This example introduces two additional features of the package. First, the call to rs.Headers().Contains(headerName) is the idiomatic way to get the index value of a header used in a later function call. Always check the ok value returned from this call to ensure the RecordSet includes the necessary Header entry. Second, the package includes the constant TimeLayout = "2006-01-02T15:04:05", which represents the timestamp format in the marinecadastre.gov files and is designed to be passed to the time.Parse function as the layout string argument.

During algorithm development it is sometimes desirable to create a RecordSet with only a few dozen or a few hundred data lines in order to avoid long computation times between successive iterations of the program. Therefore, the package also provides SubsetLimit(m Matching, n int) where the resulting *RecordSet will only contain the first n matches.

Sorting

The package uses the Go standard library sort capabilities for high performance sorting. The most common operation is to sort a single day of data into chronological order by the BaseDateTime header. This operation is implemented within the package and is exposed to users with a single call to SortByTime().

rs, _ := ais.OpenRecordSet("oneDay.csv")
defer rs.Close()
rs, err := rs.SortByTime()
if err != nil {
    log.Fatalf("unable to sort the recordset: %v", err)
}
rs.Save("oneDaySorted.csv")

In this example, note that the original *RecordSet, named rs, created from the OpenRecordSet call is reused to hold the return value from SortByTime. This presents no issues and avoids declaring a second variable. The automatic garbage collection in Go (...yeah...automatic garbage collection in a high-performance language) will deal with the pointer reference abandoned by reusing rs.

Package users are encouraged to use the idiomatic sorting method presented above, but sorting is an important operation for AIS data science. So the implementation details are presented here for community discussion to improve the interface to allow more generic sorting. Issue #19 deals with this needed enhancement. The key challenge is that sorting large AIS files presents a big-data issue because a RecordSet is a pointer to an on-disk file or in-memory buffer. In order to sort the data it must be loaded into a []*Record. This requires reading every Record in a set and loading them all into memory...an expensive operation. To accomplish this only when needed the package introduces two new types: ByGeohash and ByTimestamp. In this section we will explain sorting ByTimestamp.

A new ByTimestamp object must read all of the underlying records and load them into a []*Record. This is accomplished in the implementation of NewByTimestamp() by calling the unexported method loadRecords(). Users should not create a ByTimestamp object using the builtin new(Type) command. The example below demonstrates incorrect and correct use of the ByTimestamp type.

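// Wrong: new(ais.ByTimestamp) does not load any records, so passing the
// result to sort.Sort will panic.
//   bt := new(ais.ByTimestamp)
//   sort.Sort(bt)

rs, _ := ais.OpenRecordSet("oneDay.csv")
defer rs.Close()

// Correct: NewByTimestamp loads the records before sorting.
bt2, _ := ais.NewByTimestamp(rs)
sort.Sort(bt2)

// Write the data from the ByTimestamp object into a RecordSet
// NOTE: Headers are written only when the RecordSet is saved to disk
rsSorted := ais.NewRecordSet()
defer rsSorted.Close()
rsSorted.SetHeaders(rs.Headers())

// NOTE: data is an unexported field, so this loop only compiles inside
// package ais. Outside the package, use rs.SortByTime() instead.
for _, rec := range *bt2.data {
	rsSorted.Write(rec)
}
err := rsSorted.Flush()
if err != nil {
	log.Fatalf("flush error writing to new recordset: %v", err)
}
rsSorted.Save("oneDaySorted.csv")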

The ByTimestamp type implements the Len, Swap and Less methods required by sort.Interface. So bt2 can be passed directly to sort.Sort(bt2) in the example. Admittedly, the sort.Interface could be implemented better in package ais and a draft design is suggested in Issue #19 for community comment.

This example also introduces another new piece of syntax. Note the way the output was created with NewRecordSet() and, specifically, the way the Headers of the new set were assigned from the existing set in the line rsSorted.SetHeaders(rs.Headers()).

Appending Fields to Records

Often a new field is needed for every Record to capture some derived or computed element about the vessel in the Record. This new field can come from a cross-source lookup. For example, marinetraffic.com offers a vessel lookup service by MMSI. More commonly, new fields come from computed results derived from data already in the Record. In this example we are adding a geohash to each Record.

Package ais provides the RecordSet method

 func (rs *RecordSet) AppendField(newField string, requiredHeaders []string, gen Generator) (*RecordSet, error)

Arguments to this function are the new field name passed as a string and two additional arguments that bear a little explanation. The second argument, requiredHeaders, is a []string of the header names in the Record that will be used to derive the new Field. In our example we will be passing the "LAT" and "LON" fields so we verify they exist before calling AppendField. The final argument is a type that implements the ais.Generator interface.

type Generator interface {
    Generate(rec Record, index ...int) (Field, error)
}

Types that implement the Generator interface will have the Generate method called for every record in the RecordSet. The package provides one implementation of a Generator called Geohasher to append a geohash to every Record. Putting this all together in an example we get

rs, err := ais.OpenRecordSet("oneDay.csv")
if err != nil {
	panic(err)
}
defer rs.Close()

// Verify that rs contains "LAT" and "LON" Headers
_, ok := rs.Headers().Contains("LAT")
if !ok {
    panic("recordset does not contain 'LAT' header")
}
_, ok = rs.Headers().Contains("LON") // !ok omitted for brevity

// Append the field
requiredHeaders := []string{"LAT", "LON"}
gen := ais.NewGeohasher(rs)
rs, err = rs.AppendField("Geohash", requiredHeaders, gen)
if err != nil {
	panic(err)
}

rs.Save("oneDayGeo.csv")
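To make the generator idea concrete, here is a self-contained sketch of a geohash generator. It is not the package's Geohasher: the geohasher type, the geohash and bisect functions, the precision parameter, and the use of string in place of the package's Field type are all assumptions for illustration. The encoding follows the standard geohash scheme of alternately bisecting longitude and latitude and emitting one base-32 character per five bits.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const base32 = "0123456789bcdefghjkmnpqrstuvwxyz"

// bisect halves the interval [lo, hi] around v and returns the bit chosen.
func bisect(v float64, lo, hi *float64) int {
	mid := (*lo + *hi) / 2
	if v >= mid {
		*lo = mid
		return 1
	}
	*hi = mid
	return 0
}

// geohash is a hypothetical helper that encodes a position as a geohash
// string with the requested number of characters.
func geohash(lat, lon float64, precision int) string {
	latLo, latHi := -90.0, 90.0
	lonLo, lonHi := -180.0, 180.0
	var sb strings.Builder
	bits, ch := 0, 0
	useLon := true // bits alternate, starting with longitude
	for sb.Len() < precision {
		if useLon {
			ch = ch<<1 | bisect(lon, &lonLo, &lonHi)
		} else {
			ch = ch<<1 | bisect(lat, &latLo, &latHi)
		}
		useLon = !useLon
		bits++
		if bits == 5 {
			sb.WriteByte(base32[ch])
			bits, ch = 0, 0
		}
	}
	return sb.String()
}

// geohasher is a local stand-in for a type implementing the Generator idea.
type geohasher struct{ precision int }

// Generate expects two indices: latitude first, then longitude.
func (g geohasher) Generate(rec []string, index ...int) (string, error) {
	if len(index) != 2 {
		return "", fmt.Errorf("geohasher: want 2 indices, got %d", len(index))
	}
	for _, i := range index {
		if i < 0 || i >= len(rec) {
			return "", fmt.Errorf("geohasher: index %d out of range", i)
		}
	}
	lat, err := strconv.ParseFloat(rec[index[0]], 64)
	if err != nil {
		return "", fmt.Errorf("geohasher: latitude: %v", err)
	}
	lon, err := strconv.ParseFloat(rec[index[1]], 64)
	if err != nil {
		return "", fmt.Errorf("geohasher: longitude: %v", err)
	}
	return geohash(lat, lon, g.precision), nil
}

func main() {
	rec := strings.Split("477307900,2017-12-01T00:00:03,36.90512,-76.32652", ",")
	gh, err := geohasher{precision: 6}.Generate(rec, 2, 3)
	if err != nil {
		panic(err)
	}
	rec = append(rec, gh) // the new field is appended to the record
	fmt.Println(strings.Join(rec, ","))
}
```

Shorter geohashes describe larger boxes, so the chosen precision controls how close two vessels must be to share a value.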

Convolution Algorithms

The last set of facilities discussed in the usage guidelines relates to creating algorithms that pass a time window over a chronologically sorted RecordSet and apply an analysis or algorithm to the Record data in the Window. From a data science point of view this applies a time convolution to the underlying Record data and can be visualized with the animated figure on the Wikipedia page for convolution, in which a red window slides across a blue function to produce a black result line.

In package ais the red window from the figure is implemented by the type Window created with a call to

func NewWindow(rs *RecordSet, width time.Duration) (*Window, error)

The Width of the red Window and the rate that it Slides are configurable parameters of a Window. The blue function in the figure represents the Record data that is analyzed as it comes into the Window. Users should call SortByTime() on the RecordSet before applying the convolution so that Window is in fact sliding down in time. The resulting data represented by the black line in the figure is usually written to a new RecordSet and saved when the convolution is complete. One way to configure a window from a RecordSet is used in this snippet.

rs, _ := ais.OpenRecordSet("data.csv")
defer rs.Close()

win, err := ais.NewWindow(rs, 10*time.Minute)
if err != nil {
    panic(err)
}

The call to NewWindow sets the left marker for the Window equal to the time in the next call to Read on rs, and the Width is set to ten minutes in this example. Once the window is created it is used by successive calls to Slide. The idiomatic way to implement this is

for {
	rec, err := rs.Read()
	if err == io.EOF {
		break
	}
	if err != nil {
		panic(err)
	}

	ok, _ := win.RecordInWindow(rec)
	if ok {
		win.AddRecord(*rec)
	} else {
		rs.Stash(rec)

		// Do something with the Records in the Window

		// windowSlide is a time.Duration, for example 5 * time.Minute
		win.Slide(windowSlide)
	}
}

The first part of this, the RecordSet traversal, should begin to look familiar by this point of the tutorial. This is the idiomatic way to process a RecordSet, repeated here for emphasis. The new parts come with the call to RecordInWindow(rec), where the newly read Record is tested to see whether it is in the time window. If ok, then the Record is added to the data held by win. The internal data structure for this recordkeeping is a standard Go map, but the key is a fast fnv hash of the Record. This hash returns a uint64 for the key, which provides a low probability of hash collision and results in a performant data structure with approximately O(1) complexity on lookup and insertion.
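The map-keyed-by-hash idea can be sketched with the standard library's hash/fnv package. This is an illustration under assumptions: recordHash is a hypothetical function, it uses the FNV-1a 64-bit variant, and the zero-byte separator between fields is a choice made here, none of which is confirmed to match the package's Record.Hash.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// recordHash is a hypothetical helper that returns a 64-bit FNV-1a hash of
// a record's fields, suitable for use as a map key.
func recordHash(rec []string) uint64 {
	h := fnv.New64a()
	for _, field := range rec {
		h.Write([]byte(field))
		// A separator keeps ["ab", "c"] distinct from ["a", "bc"].
		h.Write([]byte{0})
	}
	return h.Sum64()
}

func main() {
	window := make(map[uint64][]string)

	rec := []string{"477307900", "2017-12-01T00:00:03", "36.90512", "-76.32652"}
	window[recordHash(rec)] = rec

	// Adding the same record again overwrites the same key, so the map
	// holds each distinct record once.
	window[recordHash(rec)] = rec
	fmt.Println("records held:", len(window))
}
```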

The next interesting feature of a RecordSet that has not been addressed yet is the call to rs.Stash(rec) if RecordInWindow returns false. This is critical because the most recent call to Read() provided a Record that was not in the window; however, it may be in the Window after a Slide. So this Record must be stashed so that we get to compare it again after the window slides down. The call to rs.Stash puts the record back on the metaphorical shelf and the next loop call to Read will return this same Record for the next comparison.

Finally, after the call to Stash the algorithm has reached a point where all the data that is in the Window has been loaded. When sliding down a RecordSet that is already sorted chronologically, finding a Record that is not in the Window means that all Records within that window of time have already been found. So now we can process the Record data to find whatever relationship the time dependent algorithm is trying to identify.

For example, HACKtheMACHINE Seattle challenged participants to find two-vessel interactions that indicate potential maneuvering situations between ships close to one another in time and space. The Window in this case guarantees that vessels are close to one another in time. By adding a geohash to each record in the file clean_data.csv before running this code, sliding the Window can be implemented to find ships that are within the same geohash box. In the worked example that follows, these boxes in time and space are each a Cluster. When there are two or more vessels in a Cluster, an Interaction is a two-vessel pair that is in the Window and shares the same geohash.

// Interaction completes the workflow to write a RecordSet that uniquely
// identifies two-ship interactions that occur closely separated in time and
// share a geohash that ensures the vessels are within about 4nm of one another.
package main

import (
	"fmt"
	"io"
	"time"

	"github.com/FATHOM5/ais"
)

// Use a negative number to slide over the full file.  A positive integer will
// break out of the iteration loop after the specified number of slides.
const maxSlides = -1

const filename = `clean_data.csv`
const outFilename = `twoShipInteractions.csv`
const windowWidth time.Duration = 10 * time.Minute
const windowSlide time.Duration = 5 * time.Minute

func main() {
	rs, _ := ais.OpenRecordSet(filename)
	defer rs.Close()

	win, _ := ais.NewWindow(rs, windowWidth)
	fmt.Print(win.Config())

	inter, _ := ais.NewInteractions(rs.Headers())
	geoIndex, _ := rs.Headers().Contains("Geohash")

	for slides := 0; slides != maxSlides; {
		rec, err := rs.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}

		ok, _ := win.RecordInWindow(rec)
		if ok {
			win.AddRecord(*rec)
		} else {
			rs.Stash(rec)

			cm := win.FindClusters(geoIndex)
			for _, cluster := range cm {
				if cluster.Size() > 1 {
					_ = inter.AddCluster(cluster) // error ignored for brevity
				}
			}
			win.Slide(windowSlide)
			slides++
		}
	}

	// Save the interactions to a File
	fmt.Println("Saving the interactions to:", outFilename)
	inter.Save(outFilename)
}
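The clustering step can be pictured with a short standalone sketch: group the reports in a window by geohash, then emit every two-vessel pair from any group with more than one vessel. The report type and the interactions function are hypothetical, and the choice to count each vessel once per cluster is an assumption of this sketch rather than documented behavior of Cluster or Interactions.

```go
package main

import (
	"fmt"
	"sort"
)

// report is a hypothetical, reduced view of a record in the window.
type report struct {
	MMSI    string
	Geohash string
}

// interactions groups reports by geohash and returns each distinct
// two-vessel pair that shares a geohash, in a deterministic order.
func interactions(reports []report) [][2]string {
	clusters := make(map[string]map[string]struct{})
	for _, r := range reports {
		if clusters[r.Geohash] == nil {
			clusters[r.Geohash] = make(map[string]struct{})
		}
		clusters[r.Geohash][r.MMSI] = struct{}{}
	}

	hashes := make([]string, 0, len(clusters))
	for gh := range clusters {
		hashes = append(hashes, gh)
	}
	sort.Strings(hashes)

	var pairs [][2]string
	for _, gh := range hashes {
		ids := make([]string, 0, len(clusters[gh]))
		for id := range clusters[gh] {
			ids = append(ids, id)
		}
		sort.Strings(ids)
		// A cluster with a single vessel produces no pairs.
		for i := 0; i < len(ids); i++ {
			for j := i + 1; j < len(ids); j++ {
				pairs = append(pairs, [2]string{ids[i], ids[j]})
			}
		}
	}
	return pairs
}

func main() {
	reports := []report{
		{"477307900", "dq9hs"},
		{"477307902", "dq9hs"},
		{"477307900", "dq9hs"}, // repeat report from the same vessel
		{"367000001", "dq9k2"},
	}
	for _, p := range interactions(reports) {
		fmt.Println(p[0], "<->", p[1])
	}
}
```

The geohash strings in main are made-up sample values, used only to show two vessels landing in the same box.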

This last example provides a full use case of applying many of the facilities in package ais to build a dataset of potential two-ship interactions that can train a navigation system artificial intelligence. For the complete example that includes all REQUIRED error handling, some timing parameters for performance measurement and a few pretty printing additions, see the solution posted to the HACKtheMACHINE Track 2 repository. There are a few new methods presented in this example, like win.Config() and win.FindClusters, but they are well-documented in the online package documentation along with other facilities and methods that did not get discussed in the tutorial. Check out the full package documentation at godoc.org for more examples and additional explanations.

More importantly, if you have read to this point you are more than casually interested in maritime data science, so give the repo a star, try some of the examples and reach out. You have now read a few thousand lines, so let's hear from you. We are actively growing the community and want you to be a part of it!

Acknowledgements

The solutions presented in this repo were made possible by the idea generation and execution that occurred among the contestant teams over the weekend. Competitors came from government, academia, and across industries to collaboratively develop solutions to a series of critical and challenging problems. In a single weekend teams:

Developed data quality indicators and tools,
Identified key inconsistencies in the data,
Improved dataset quality,
Created algorithms that worked on small subsets of the data, and
Suggested and prototyped methods for extending the analysis to larger datasets.

Maintenance

FATHOM5 is a proud partner of the U.S. Navy in creating a community of practice for maritime digital security. The code developed in this repo released under MIT License is an important contribution to growing the HACKtheMACHINE community and part of our corporate commitment to creating a new wave of maritime technology innovation. And oh yeah...we are hiring!

Community

AIS will only increase in importance over the next couple of years with improved accuracy and reduced time latency. With the right algorithms, real-time tracking and predictive modeling of ships' behavior and position will be possible for the first time. With techniques developed by the community, AIS data will assist in developing safe autonomous ships, help prevent collisions, reduce environmental impacts, and make the waterways safer and more enjoyable for all.

We want to create a vibrant and thriving maritime innovation community around the potential of large AIS datasets. Please consider joining us. Open issues for bugs, provide an experience report for your use of the package, or just give the repo a star because we are trying to create algorithms that serve the greater good, not just advertisers!

Documentation

Overview

Package ais provides types and methods for conducting data science on signals generated by maritime entities radiating from an Automatic Identification System (AIS) transponder as mandated by the International Maritime Organization (IMO) for all vessels over 300 gross tons and all passenger vessels.

Example

This example shows the basic usage of creating a new RecordSet and then using it to write a Record and finally saving the RecordSet to a csv file.

package main

import (
	"strings"

	"github.com/FATHOM5/ais"
)

func main() {
	rs := ais.NewRecordSet()
	defer rs.Close()

	h := ais.Headers{
		Fields: strings.Split("MMSI,BaseDateTime,LAT,LON,SOG,COG,Heading,VesselName,IMO,CallSign,VesselType,Status,Length,Width,Draft,Cargo", ","),
	}
	data := strings.Split("477307900,2017-12-01T00:00:03,36.90512,-76.32652,0.0,131.0,352.0,FIRST,IMO9739666,VRPJ6,1004,moored,337,,,", ",")

	rs.SetHeaders(h)

	rec1 := ais.Record(data)
	err := rs.Write(rec1)
	if err != nil {
		panic(err)
	}
	err = rs.Flush()
	if err != nil {
		panic(err)
	}

	err = rs.Save("test.csv")
	if err != nil {
		panic(err)
	}

}

Output:

Example (Distance)

This example demonstrates how to construct two ais.Record types and compute the haversine distance between them.

package main

import (
	"fmt"
	"strings"

	"github.com/FATHOM5/ais"
)

func main() {

	h := ais.Headers{
		Fields: strings.Split("MMSI,BaseDateTime,LAT,LON,SOG,COG,Heading,VesselName,IMO,CallSign,VesselType,Status,Length,Width,Draft,Cargo", ","),
	}
	idxMap, ok := h.ContainsMulti("LAT", "LON")
	if !ok {
		panic("missing one or more required headers LAT and LON")
	}

	data1 := strings.Split("477307900,2017-12-01T00:00:03,36.90512,-76.32652,0.0,131.0,352.0,FIRST,IMO9739666,VRPJ6,1004,moored,337,,,", ",")
	data2 := strings.Split("477307902,2017-12-01T00:00:03,36.91512,-76.22652,2.3,311.0,182.0,SECOND,IMO9739800,XHYSF,,underway using engines,337,,,", ",")
	rec1 := ais.Record(data1)
	rec2 := ais.Record(data2)

	nm, err := rec1.Distance(rec2, idxMap["LAT"].Idx, idxMap["LON"].Idx)
	if err != nil {
		panic(err)
	}
	fmt.Printf("The ships are %.1fnm away from one another.\n", nm)

}

Output:

The ships are 4.8nm away from one another.

Index

Constants
Variables
func PairHash64(rec1, rec2 *Record, indices [4]int) (uint64, error)
type Box
- func (b *Box) Match(rec *Record) (bool, error)
type ByTimestamp
- func NewByTimestamp(rs *RecordSet) (*ByTimestamp, error)
- func (bt ByTimestamp) Len() int
- func (bt ByTimestamp) Less(i, j int) bool
- func (bt ByTimestamp) Swap(i, j int)
type Cluster
- func (c *Cluster) Append(rec *Record)
- func (c *Cluster) Data() []*Record
- func (c *Cluster) Size() int
- func (c *Cluster) String() string
type ClusterMap
type Field
type Generator
type Geohasher
- func NewGeohasher(rs *RecordSet) *Geohasher
- func (g *Geohasher) Generate(rec Record, index ...int) (Field, error)
type HeaderMap
type Headers
- func (h Headers) Contains(field string) (i int, ok bool)
- func (h Headers) ContainsMulti(fields ...string) (idxMap map[string]HeaderMap, ok bool)
- func (h Headers) Equals(h2 Headers) bool
- func (h Headers) String() string
type Interactions
- func NewInteractions(h Headers) (*Interactions, error)
- func (inter *Interactions) AddCluster(c *Cluster) error
- func (inter *Interactions) Len() int
- func (inter *Interactions) Save(filename string) error
type Matching
type Record
- func (r Record) Data() []byte
- func (r Record) Distance(r2 Record, latIndex, lonIndex int) (nm float64, err error)
- func (r Record) Hash() uint64
- func (r Record) ParseFloat(index int) (float64, error)
- func (r Record) ParseInt(index int) (int64, error)
- func (r Record) ParseTime(index int) (time.Time, error)
- func (r *Record) Value(idx int) (val string, ok bool)
- func (r *Record) ValueFrom(hm HeaderMap) (val string, ok bool)
type RecordPair
type RecordSet
- func NewRecordSet() *RecordSet
- func OpenRecordSet(filename string) (*RecordSet, error)
- func (rs *RecordSet) AppendField(newField string, requiredHeaders []string, gen Generator) (*RecordSet, error)
- func (rs *RecordSet) Close() error
- func (rs *RecordSet) Flush() error
- func (rs *RecordSet) Headers() Headers
- func (rs *RecordSet) Read() (*Record, error)
- func (rs *RecordSet) Save(name string) error
- func (rs *RecordSet) SetHeaders(h Headers)
- func (rs *RecordSet) SortByTime() (*RecordSet, error)
- func (rs *RecordSet) Stash(rec *Record)
- func (rs *RecordSet) Subset(m Matching) (*RecordSet, error)
- func (rs *RecordSet) SubsetLimit(m Matching, n int, multipass bool) (*RecordSet, error)
- func (rs *RecordSet) UniqueVessels() (VesselSet, error)
- func (rs *RecordSet) UniqueVesselsMulti(multipass bool) (VesselSet, error)
- func (rs *RecordSet) Write(rec Record) error
type Vessel
type VesselSet
type Window
- func NewWindow(rs *RecordSet, width time.Duration) (*Window, error)
- func (win *Window) AddRecord(rec Record)
- func (win *Window) Config() string
- func (win *Window) FindClusters(geohashIndex int) ClusterMap
- func (win *Window) InWindow(t time.Time) bool
- func (win *Window) Left() time.Time
- func (win *Window) Len() int
- func (win *Window) RecordInWindow(rec *Record) (bool, error)
- func (win *Window) Right() time.Time
- func (win *Window) SetIndex(index int)
- func (win *Window) SetLeft(marker time.Time)
- func (win *Window) SetRight(marker time.Time)
- func (win *Window) SetWidth(dur time.Duration)
- func (win *Window) Slide(dur time.Duration)
- func (win *Window) String() string
- func (win *Window) Width() time.Duration

Constants ¶

const InteractionFields = "InteractionHash,Distance(nm)," +
	"MMSI_1,BaseDateTime_1,LAT_1,LON_1,SOG_1,COG_1,Heading_1,VesselName_1,IMO_1,CallSign_1,VesselType_1,Status_1,Length_1,Width_1,Draft_1,Cargo_1,Geohash_1," +
	"MMSI_2,BaseDateTime_2,LAT_2,LON_2,SOG_2,COG_2,Heading_2,VesselName_2,IMO_2,CallSign_2,VesselType_2,Status_2,Length_2,Width_2,Draft_2,Cargo_2,Geohash_2"

InteractionFields are the default column headers used to write a csv file of two-vessel interactions. The first field, InteractionHash, is a PairHash64 return value that uniquely identifies this interaction, and Distance(nm) is the haversine distance between the two vessels.

const TimeLayout = `2006-01-02T15:04:05`

TimeLayout is the timestamp format for the MarineCadastre.gov AIS data available from the U.S. Government. An example timestamp from the data set is `2017-12-05T00:01:14`. This layout is designed to be passed to the time.Parse function as the layout string.

Variables ¶

var ErrEmptySet = errors.New("ErrEmptySet")

ErrEmptySet is the error returned by Subset variants when there are no records in the returned *RecordSet because nothing matched the selection criteria. Functions should only return ErrEmptySet when all processing occurred successfully, but the subset criteria provided no matches to return.
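The sentinel-error pattern described above can be sketched without the package itself. In this minimal example, errEmptySet stands in for ais.ErrEmptySet, and filterEven and describe are hypothetical helpers that are not part of the package. It shows how a caller can separate "nothing matched" from a real failure.

```go
package main

import (
	"errors"
	"fmt"
)

// errEmptySet stands in for ais.ErrEmptySet in this self-contained sketch.
var errEmptySet = errors.New("ErrEmptySet")

// filterEven is a hypothetical subset function. It returns the even values
// in nums, or errEmptySet when processing succeeded but nothing matched.
func filterEven(nums []int) ([]int, error) {
	var out []int
	for _, n := range nums {
		if n%2 == 0 {
			out = append(out, n)
		}
	}
	if len(out) == 0 {
		return nil, errEmptySet
	}
	return out, nil
}

// describe shows how a caller can tell "no matches" apart from a failure.
func describe(nums []int) string {
	out, err := filterEven(nums)
	switch {
	case errors.Is(err, errEmptySet):
		return "no matches"
	case err != nil:
		return "failed: " + err.Error()
	default:
		return fmt.Sprintf("%d matches", len(out))
	}
}

func main() {
	fmt.Println(describe([]int{1, 3, 5}))
	fmt.Println(describe([]int{1, 2, 4}))
}
```

With the real package the same check would presumably compare the error returned by Subset or SubsetLimit against ais.ErrEmptySet.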

Functions ¶

func PairHash64 ¶

func PairHash64(rec1, rec2 *Record, indices [4]int) (uint64, error)

PairHash64 returns a 64 bit fnv hash from two AIS records based on the string values of MMSI, BaseDateTime, LAT, and LON for each vessel. Indices must contain the index values in rec1 and rec2 for MMSI, BaseDateTime, LAT and LON.
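A rough sketch of hashing two records with the standard library's hash/fnv package follows. The function pairHash, the choice of the FNV-1a variant, and the zero-byte separator are assumptions made for illustration. The package's actual byte layout is not documented here, so the values produced by this sketch should not be expected to match PairHash64.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pairHash is a hypothetical stand-in for PairHash64. indices lists the
// positions of MMSI, BaseDateTime, LAT and LON in both records.
func pairHash(rec1, rec2 []string, indices [4]int) (uint64, error) {
	h := fnv.New64a() // assumption: the FNV-1a variant
	for _, rec := range [][]string{rec1, rec2} {
		for _, i := range indices {
			if i < 0 || i >= len(rec) {
				return 0, fmt.Errorf("pairHash: index %d out of range", i)
			}
			// A separator keeps "12"+"3" distinct from "1"+"23".
			h.Write([]byte(rec[i]))
			h.Write([]byte{0})
		}
	}
	return h.Sum64(), nil
}

func main() {
	a := []string{"477307900", "2017-12-01T00:00:03", "36.90512", "-76.32652"}
	b := []string{"477307902", "2017-12-01T00:00:03", "36.91512", "-76.22652"}
	h, err := pairHash(a, b, [4]int{0, 1, 2, 3})
	if err != nil {
		panic(err)
	}
	fmt.Printf("interaction hash: %#x\n", h)
}
```

Note that this sketch is order dependent: swapping rec1 and rec2 gives a different hash.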

Types ¶

type Box ¶

type Box struct {
	MinLat, MaxLat, MinLon, MaxLon float64
	LatIndex, LonIndex             int
}

Box provides a type with min and max values for latitude and longitude, and Box implements the Matching interface. This provides a convenient way to create a Box and pass the new object to Subset in order to get a *RecordSet defined with a geographic boundary. Box includes records that are on the border and at the vertices of the geographic boundary. Constructing a Box also requires the index values for latitude and longitude in a *Record. These index values are passed to *Record.ParseFloat(index) from the Match method of a Box in order to see if the Record is in the Box.

func (*Box) Match ¶

func (b *Box) Match(rec *Record) (bool, error)

Match implements the Matching interface for a Box. Errors in the Match function can be caused by parse errors when converting string Record values into their typed values. When Match returns a non-nil error the bool value will be false.
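The inclusive boundary test and the error behavior described for Box can be illustrated with a small stand-alone sketch. The box type below copies the exported field names of ais.Box, but the match method is a hypothetical reimplementation over a plain []string, not the package's code.

```go
package main

import (
	"fmt"
	"strconv"
)

// box mirrors the exported fields of ais.Box for this sketch.
type box struct {
	MinLat, MaxLat, MinLon, MaxLon float64
	LatIndex, LonIndex             int
}

// match reports whether rec lies inside b, borders and vertices included.
// On any error the returned bool is false.
func (b *box) match(rec []string) (bool, error) {
	if b.LatIndex < 0 || b.LonIndex < 0 || b.LatIndex >= len(rec) || b.LonIndex >= len(rec) {
		return false, fmt.Errorf("box: index out of range for record of length %d", len(rec))
	}
	lat, err := strconv.ParseFloat(rec[b.LatIndex], 64)
	if err != nil {
		return false, fmt.Errorf("box: latitude: %v", err)
	}
	lon, err := strconv.ParseFloat(rec[b.LonIndex], 64)
	if err != nil {
		return false, fmt.Errorf("box: longitude: %v", err)
	}
	inLat := lat >= b.MinLat && lat <= b.MaxLat
	inLon := lon >= b.MinLon && lon <= b.MaxLon
	return inLat && inLon, nil
}

func main() {
	b := &box{MinLat: 36.5, MaxLat: 37.0, MinLon: -76.5, MaxLon: -76.0, LatIndex: 2, LonIndex: 3}
	rec := []string{"477307900", "2017-12-01T00:00:03", "36.90512", "-76.32652"}
	ok, err := b.match(rec)
	if err != nil {
		panic(err)
	}
	fmt.Println("in box:", ok)
}
```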

type ByTimestamp ¶

type ByTimestamp struct {
	// contains filtered or unexported fields
}

ByTimestamp implements the sort.Interface for creating a RecordSet sorted by BaseDateTime. The ByTimestamp struct and its Len, Swap, and Less methods are exported in order to serve as examples for how to implement the sort.Interface for a RecordSet. If you want to sort a RecordSet by time you do not need to call these methods. Just call RecordSet.SortByTime() directly to take advantage of the implementation provided in the package.
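For readers who want to see the sort.Interface pattern in isolation, here is a minimal sketch over plain [][]string rows. The names byTime, newByTime and sortByTime are hypothetical, and caching the parsed timestamps alongside the rows is a design choice of this sketch rather than a description of how ByTimestamp is implemented.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// layout has the same value as ais.TimeLayout.
const layout = "2006-01-02T15:04:05"

// byTime pairs each row with its parsed timestamp so Less stays cheap.
type byTime struct {
	rows  [][]string
	times []time.Time
}

func newByTime(rows [][]string, timeIndex int) (*byTime, error) {
	bt := &byTime{rows: rows, times: make([]time.Time, len(rows))}
	for i, row := range rows {
		if timeIndex < 0 || timeIndex >= len(row) {
			return nil, fmt.Errorf("byTime: row %d has no field %d", i, timeIndex)
		}
		t, err := time.Parse(layout, row[timeIndex])
		if err != nil {
			return nil, fmt.Errorf("byTime: row %d: %v", i, err)
		}
		bt.times[i] = t
	}
	return bt, nil
}

func (bt *byTime) Len() int           { return len(bt.rows) }
func (bt *byTime) Less(i, j int) bool { return bt.times[i].Before(bt.times[j]) }
func (bt *byTime) Swap(i, j int) {
	bt.rows[i], bt.rows[j] = bt.rows[j], bt.rows[i]
	bt.times[i], bt.times[j] = bt.times[j], bt.times[i]
}

// sortByTime sorts rows in place in ascending timestamp order.
func sortByTime(rows [][]string, timeIndex int) error {
	bt, err := newByTime(rows, timeIndex)
	if err != nil {
		return err
	}
	sort.Stable(bt)
	return nil
}

func main() {
	rows := [][]string{
		{"C", "2017-12-01T00:02:00"},
		{"A", "2017-12-01T00:00:03"},
		{"B", "2017-12-01T00:01:00"},
	}
	if err := sortByTime(rows, 1); err != nil {
		panic(err)
	}
	for _, r := range rows {
		fmt.Println(r[0], r[1])
	}
}
```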

func NewByTimestamp ¶

func NewByTimestamp(rs *RecordSet) (*ByTimestamp, error)

NewByTimestamp returns a data structure suitable for sorting using the sort.Interface tools.

func (ByTimestamp) Len ¶

func (bt ByTimestamp) Len() int

Len function to implement the sort.Interface.

func (ByTimestamp) Less ¶

func (bt ByTimestamp) Less(i, j int) bool

Less function to implement the sort.Interface.

func (ByTimestamp) Swap ¶

func (bt ByTimestamp) Swap(i, j int)

Swap function to implement the sort.Interface.

type Cluster ¶

type Cluster struct {
	// contains filtered or unexported fields
}

Cluster is an abstraction for a []*Record. The intent is that the Records in a Cluster are vessels that share the same geohash.

func (*Cluster) Append ¶

func (c *Cluster) Append(rec *Record)

Append adds a *Record to the underlying slice managed by the Cluster.

func (*Cluster) Data ¶

func (c *Cluster) Data() []*Record

Data returns the encapsulated data in the Cluster.

func (*Cluster) Size ¶

func (c *Cluster) Size() int

Size returns the length of the underlying slice managed by the Cluster.

func (*Cluster) String ¶

func (c *Cluster) String() string

String implements the Stringer interface for Cluster.

type ClusterMap ¶

type ClusterMap map[uint64]*Cluster

ClusterMap is an abstraction for a map[geohash]*Cluster.

type Field ¶

type Field string

Field is an abstraction for string values that are read from and written to AIS Records.

type Generator ¶

type Generator interface {
	Generate(rec Record, index ...int) (Field, error)
}

Generator is the interface that is implemented to create a new Field from the index values of existing Fields in a Record. The receiver for Generator should be a pointer in order to avoid creating a copy of the Record when Generate is called millions of times iterating over a large RecordSet. Concrete implementations of the Generator interface are required arguments to RecordSet.AppendField(...).
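As an illustration of the interface, the sketch below declares local copies of Field, Record and Generator with the signatures shown above and implements a hypothetical movingFlag generator that derives a "true"/"false" Field from the SOG column. The movingFlag type and its threshold are invented for this example and are not part of the package.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Local copies of the package types so the sketch is self-contained.
type Field string
type Record []string
type Generator interface {
	Generate(rec Record, index ...int) (Field, error)
}

// movingFlag is a hypothetical Generator that reports whether the speed
// over ground exceeds a threshold in knots.
type movingFlag struct {
	threshold float64
}

// Generate expects exactly one index: the position of SOG in rec.
func (m *movingFlag) Generate(rec Record, index ...int) (Field, error) {
	if len(index) != 1 {
		return "", errors.New("movingFlag: expected exactly one index (SOG)")
	}
	i := index[0]
	if i < 0 || i >= len(rec) {
		return "", fmt.Errorf("movingFlag: index %d out of range", i)
	}
	sog, err := strconv.ParseFloat(rec[i], 64)
	if err != nil {
		return "", fmt.Errorf("movingFlag: %v", err)
	}
	return Field(strconv.FormatBool(sog > m.threshold)), nil
}

func main() {
	var gen Generator = &movingFlag{threshold: 0.5}
	for _, rec := range []Record{{"477307900", "0.0"}, {"477307902", "2.3"}} {
		f, err := gen.Generate(rec, 1)
		if err != nil {
			panic(err)
		}
		fmt.Println(rec[0], f)
	}
}
```

With the real package, a value like this would presumably be passed as the gen argument of RecordSet.AppendField together with []string{"SOG"} as the required headers.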

type Geohasher ¶

type Geohasher RecordSet

Geohasher is the base type for implementing the Generator interface to append a github.com/mmcloughlin/geohash to each Record in the RecordSet. Pass NewGeohasher(rs *RecordSet) as the gen argument of RecordSet.AppendField to add a geohash to a RecordSet.

func NewGeohasher ¶

func NewGeohasher(rs *RecordSet) *Geohasher

NewGeohasher returns a pointer to a new Geohasher.

func (*Geohasher) Generate ¶

func (g *Geohasher) Generate(rec Record, index ...int) (Field, error)

Generate implements the Generator interface to create a geohash Field. The returned geohash is accurate to 22 bits of precision, which corresponds to about 0.1 degree differences in latitude and longitude. The index values for the variadic function on a *Geohasher must be the index of "LAT" and "LON" in the rec. Field will come back nil for any non-nil error returned.

type HeaderMap ¶

type HeaderMap struct {
	Present bool
	Idx     int
}

HeaderMap is the returned map value for ContainsMulti. See the distance example for using the HeaderMap.

type Headers ¶

type Headers struct {
	// Fields is an encapsulated []string . It is initialized from the first
	// non-comment line of an AIS .csv file when ais.OpenRecordSet(filename string)
	// is called.
	Fields []string
}

Headers are the field names for AIS data elements in a Record.

func (Headers) Contains ¶

func (h Headers) Contains(field string) (i int, ok bool)

Contains returns the index of a specific header. This provides a nice syntax, rs.Headers().Contains("LAT"), to ensure an ais.Record contains a specific field. If the Headers do not contain the requested field ok is false.

func (Headers) ContainsMulti ¶

func (h Headers) ContainsMulti(fields ...string) (idxMap map[string]HeaderMap, ok bool)

ContainsMulti returns a map[string]HeaderMap where the map keys are the field names and each HeaderMap value holds the index position of that field in the Headers set. If there is an error determining an index position for any field then idxMap returns nil and ok is false. Users should always check for !ok and handle accordingly.
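The lookup behavior can be sketched as follows. The function containsMulti and the lowercase headerMap type are hypothetical stand-ins written for this example. Keeping the first occurrence of a repeated header name is an assumption of the sketch, not documented package behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// headerMap mirrors the fields of ais.HeaderMap.
type headerMap struct {
	Present bool
	Idx     int
}

// containsMulti returns the position of every requested name, or nil and
// false as soon as one of them is missing from fields.
func containsMulti(fields []string, want ...string) (map[string]headerMap, bool) {
	pos := make(map[string]int, len(fields))
	for i, f := range fields {
		if _, seen := pos[f]; !seen {
			pos[f] = i // keep the first occurrence of a repeated name
		}
	}
	idxMap := make(map[string]headerMap, len(want))
	for _, name := range want {
		i, ok := pos[name]
		if !ok {
			return nil, false
		}
		idxMap[name] = headerMap{Present: true, Idx: i}
	}
	return idxMap, true
}

func main() {
	fields := strings.Split("MMSI,BaseDateTime,LAT,LON,SOG", ",")
	idxMap, ok := containsMulti(fields, "LAT", "LON")
	if !ok {
		panic("missing one or more required headers LAT and LON")
	}
	fmt.Println(idxMap["LAT"].Idx, idxMap["LON"].Idx)
}
```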

func (Headers) Equals ¶

func (h Headers) Equals(h2 Headers) bool

Equals supports comparison testing of two Headers sets.

func (Headers) String ¶

func (h Headers) String() string

String satisfies the fmt.Stringer interface for Headers. It pretty prints each index value and header, one line per header.

type Interactions ¶

type Interactions struct {
	RecordHeaders Headers // for the Records that will be used to create interactions
	OutputHeaders Headers // for an output RecordSet that may be written from the 2-ship interactions
	// contains filtered or unexported fields
}

Interactions is an abstraction for two-vessel interactions. It requires a set of Headers that correspond to the Record slices being compared and it requires a set of Headers for the output. The default for OutputHeaders is the const InteractionFields with a nil dictionary. The data held by interactions is a map[hash]*RecordPair. This guarantees a non-duplicative set of interactions in the output.

func NewInteractions ¶

func NewInteractions(h Headers) (*Interactions, error)

NewInteractions creates a new set of interactions. It requires a set of Headers from the RecordSet that will be searched for Interactions. These Headers are required to contain "MMSI", "BaseDateTime", "LAT", and "LON" in order to uniquely identify an interaction. The returned *Interactions has its output file Headers set to ais.InteractionFields by default.

func (*Interactions) AddCluster ¶

func (inter *Interactions) AddCluster(c *Cluster) error

AddCluster adds all of the interactions in a given cluster to the set of Interactions.

func (*Interactions) Len ¶

func (inter *Interactions) Len() int

Len returns the number of Interactions in the set.

func (*Interactions) Save ¶

func (inter *Interactions) Save(filename string) error

Save the interactions to a CSV file.

type Matching ¶

type Matching interface {
	Match(*Record) (bool, error)
}

Matching provides an interface to pass into the Subset and SubsetLimit functions of a RecordSet.

type Record ¶

type Record []string

Record wraps the return value from a csv.Reader because many publicly available data sources provide AIS records in large csv files. The Record type and its associated methods allow clients of the package to deal directly with the abstraction of individual AIS records and handle the csv file read/write operations internally.

func (Record) Data ¶

func (r Record) Data() []byte

Data returns the underlying []string in a Record as a []byte.

func (Record) Distance ¶

func (r Record) Distance(r2 Record, latIndex, lonIndex int) (nm float64, err error)

Distance calculates the haversine distance between two AIS records that contain a latitude and longitude measurement identified by their index number in the Record slice.

Example ¶

Example demonstrates a simple use of the Distance function.

package main

import (
	"fmt"
	"strings"

	"github.com/FATHOM5/ais"
)

func main() {
	h := strings.Split("MMSI,BaseDateTime,LAT,LON,SOG,COG,Heading,VesselName,IMO,CallSign,VesselType,Status,Length,Width,Draft,Cargo", ",")
	headers := ais.Headers{Fields: h}
	latIndex, _ := headers.Contains("LAT")
	lonIndex, _ := headers.Contains("LON")

	data1 := strings.Split("477307900,2017-12-01T00:00:03,36.90512,-76.32652,0.0,131.0,352.0,FIRST,IMO9739666,VRPJ6,1004,moored,337,,,", ",")
	data2 := strings.Split("477307902,2017-12-01T00:00:03,36.91512,-76.22652,2.3,311.0,182.0,SECOND,IMO9739800,XHYSF,,underway using engines,337,,,", ",")
	rec1 := ais.Record(data1)
	rec2 := ais.Record(data2)

	nm, err := rec1.Distance(rec2, latIndex, lonIndex)
	if err != nil {
		panic(err)
	}
	fmt.Printf("The ships are %.1fnm away from one another.\n", nm)

}

Output:

The ships are 4.8nm away from one another.
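For reference, the haversine formula itself can be written with only the standard library. In the sketch below, haversineNM is a hypothetical function and earthRadiusNM is an assumed mean Earth radius (6371 km divided by 1.852 km per nautical mile). The package may use a slightly different constant, so results can differ in the later decimal places.

```go
package main

import (
	"fmt"
	"math"
)

// earthRadiusNM is an assumed mean Earth radius in nautical miles.
const earthRadiusNM = 3440.065

// haversineNM returns the great-circle distance in nautical miles between
// two points given in decimal degrees.
func haversineNM(lat1, lon1, lat2, lon2 float64) float64 {
	toRad := func(deg float64) float64 { return deg * math.Pi / 180 }
	phi1, phi2 := toRad(lat1), toRad(lat2)
	dPhi := toRad(lat2 - lat1)
	dLambda := toRad(lon2 - lon1)

	a := math.Sin(dPhi/2)*math.Sin(dPhi/2) +
		math.Cos(phi1)*math.Cos(phi2)*math.Sin(dLambda/2)*math.Sin(dLambda/2)
	return 2 * earthRadiusNM * math.Asin(math.Sqrt(a))
}

func main() {
	// Same coordinates as the two records in the example above.
	nm := haversineNM(36.90512, -76.32652, 36.91512, -76.22652)
	fmt.Printf("%.1fnm\n", nm)
}
```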

func (Record) Hash ¶

func (r Record) Hash() uint64

Hash returns a 64 bit hash/fnv of the Record.

func (Record) ParseFloat ¶

func (r Record) ParseFloat(index int) (float64, error)

ParseFloat wraps strconv.ParseFloat with a method to return a float64 from the index value of a field in the AIS Record. Useful for getting a LAT, LON, SOG or other numeric value from an ais.Record.

func (Record) ParseInt ¶

func (r Record) ParseInt(index int) (int64, error)

ParseInt wraps strconv.ParseInt with a method to return an int64 from the index value of a field in the AIS Record. Useful for getting int values from the Records such as MMSI and IMO number.

func (Record) ParseTime ¶

func (r Record) ParseTime(index int) (time.Time, error)

ParseTime wraps time.Parse with a method to return a time.Time from the index value of a field in the AIS Record. Useful for converting the BaseDateTime from the Record. NOTE: FUTURE VERSIONS OF THIS METHOD SHOULD NOT RELY ON A PACKAGE CONSTANT FOR THE LAYOUT FIELD. THIS FIELD SHOULD BE INFERRED FROM A LIST OF FORMATS SEEN IN COMMON DATASOURCES.

Example ¶

package main

import (
	"fmt"
	"strings"

	"github.com/FATHOM5/ais"
)

func main() {
	h := strings.Split("MMSI,BaseDateTime,LAT,LON,SOG,COG,Heading,VesselName,IMO,CallSign,VesselType,Status,Length,Width,Draft,Cargo", ",")
	data := strings.Split("477307900,2017-12-01T00:00:03,36.90512,-76.32652,0.0,131.0,352.0,FIRST,IMO9739666,VRPJ6,1004,moored,337,,,", ",")

	headers := ais.Headers{Fields: h}
	rec := ais.Record(data)

	timeIndex, _ := headers.Contains("BaseDateTime")

	t, err := rec.ParseTime(timeIndex)
	if err != nil {
		panic(err)
	}
	fmt.Printf("The record timestamp is at %s\n", t.Format(ais.TimeLayout))

}

Output:

The record timestamp is at 2017-12-01T00:00:03

func (*Record) Value ¶

func (r *Record) Value(idx int) (val string, ok bool)

Value returns the record value for the []string index. For out of bounds idx arguments or other errors Value returns an empty string for val and false for ok.

func (*Record) ValueFrom ¶

func (r *Record) ValueFrom(hm HeaderMap) (val string, ok bool)

ValueFrom returns the record value described in the HeaderMap. The argument is a HeaderMap and normal usage has the nice syntax rec.ValueFrom(idxMap["LAT"]), where idxMap is the returned value from ContainsMulti(...). Returns an empty string and false when hm.Present == false.

type RecordPair ¶

type RecordPair struct {
	// contains filtered or unexported fields
}

RecordPair holds pointers to two Records.

type RecordSet ¶

type RecordSet struct {
	// contains filtered or unexported fields
}

RecordSet is the high-level interface to deal with comma separated value files of AIS records. A RecordSet is not usually constructed from the struct. Use NewRecordSet() to create an empty set, or OpenRecordSet(filename) to read a file on disk.

func NewRecordSet ¶

func NewRecordSet() *RecordSet

NewRecordSet returns a *RecordSet that has an in-memory data buffer for the underlying Records that may be written to it. Additionally, the new *RecordSet is configured so that the encoding/csv objects it uses internally have LazyQuotes = true and Comment = '#'.

func OpenRecordSet ¶

func OpenRecordSet(filename string) (*RecordSet, error)

OpenRecordSet takes the filename of an AIS data file as its input. It returns a pointer to the RecordSet and a nil error upon successfully validating that the file can be read by an encoding/csv Reader. It returns a nil *RecordSet on any non-nil error.

func (*RecordSet) AppendField ¶

func (rs *RecordSet) AppendField(newField string, requiredHeaders []string, gen Generator) (*RecordSet, error)

AppendField calls the Generator on each Record in the RecordSet and adds the resulting Field to each record under the newField provided as the argument. The requiredHeaders argument is a []string of the required Headers that must be present in the RecordSet in order for Generator to be successful. If no errors are encountered it returns a pointer to a new *RecordSet and a nil value for error. If there is an error it will return a nil value for the *RecordSet and an error.

func (*RecordSet) Close ¶

func (rs *RecordSet) Close() error

Close calls close on the unexported RecordSet data handle. It is the responsibility of the RecordSet user to call close. This is usually accomplished by a call to

defer rs.Close()

immediately after creating a NewRecordSet.

func (*RecordSet) Flush ¶

func (rs *RecordSet) Flush() error

Flush empties the buffer in the underlying csv.Writer held by the RecordSet and returns any error that has occurred in a previous write or flush.

func (*RecordSet) Headers ¶

func (rs *RecordSet) Headers() Headers

Headers returns the encapsulated headers data of the RecordSet.

func (*RecordSet) Read ¶

func (rs *RecordSet) Read() (*Record, error)

Read calls Read() on the csv.Reader held by the RecordSet and returns a Record. The idiomatic way to iterate over a RecordSet follows the same pattern used to read a file with encoding/csv: call Read in a loop until it returns io.EOF.
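The encoding/csv idiom referred to above looks like the following when written directly against the standard library. The helper readRecords is hypothetical. It configures the reader with LazyQuotes and a '#' comment character, the settings the NewRecordSet documentation describes, and treats the first non-comment line as the header row.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// readRecords returns the header row and all data rows found in data.
func readRecords(data string) ([]string, [][]string, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.LazyQuotes = true
	r.Comment = '#'

	headers, err := r.Read()
	if err != nil {
		return nil, nil, err
	}

	var records [][]string
	for {
		rec, err := r.Read()
		if err == io.EOF {
			break // normal end of input
		}
		if err != nil {
			return nil, nil, err
		}
		records = append(records, rec)
	}
	return headers, records, nil
}

func main() {
	data := "# sample file\n" +
		"MMSI,LAT,LON\n" +
		"477307900,36.90512,-76.32652\n" +
		"477307902,36.91512,-76.22652\n"
	headers, records, err := readRecords(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(headers, len(records))
}
```

A loop over a RecordSet would presumably have the same shape, with rs.Read() in place of r.Read().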

func (*RecordSet) Save ¶

func (rs *RecordSet) Save(name string) error

Save writes the RecordSet to disk in the filename provided.

func (*RecordSet) SetHeaders ¶

func (rs *RecordSet) SetHeaders(h Headers)

SetHeaders sets the Headers of the RecordSet.

func (*RecordSet) SortByTime ¶

func (rs *RecordSet) SortByTime() (*RecordSet, error)

SortByTime returns a pointer to a new RecordSet sorted in ascending order by BaseDateTime.

func (*RecordSet) Stash ¶

func (rs *RecordSet) Stash(rec *Record)

Stash allows a client to take a Record that has been previously retrieved through Read() and ensure the next call to Read() returns this same Record.
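The push-back behavior can be pictured with a tiny in-memory reader. The stashReader type below is a hypothetical illustration that holds at most one stashed record. How the package stores the stashed Record internally is not documented here.

```go
package main

import (
	"fmt"
	"io"
)

// stashReader serves rows in order and supports a one-record push-back.
type stashReader struct {
	rows    [][]string
	next    int
	stashed []string
}

// Read returns the stashed record if there is one, otherwise the next row.
func (s *stashReader) Read() ([]string, error) {
	if s.stashed != nil {
		rec := s.stashed
		s.stashed = nil
		return rec, nil
	}
	if s.next >= len(s.rows) {
		return nil, io.EOF
	}
	rec := s.rows[s.next]
	s.next++
	return rec, nil
}

// Stash makes rec the result of the next call to Read.
func (s *stashReader) Stash(rec []string) { s.stashed = rec }

func main() {
	s := &stashReader{rows: [][]string{{"A"}, {"B"}}}
	first, _ := s.Read()
	s.Stash(first) // peeked at the record, now put it back
	again, _ := s.Read()
	next, _ := s.Read()
	fmt.Println(first[0], again[0], next[0])
}
```

This is the pattern that lets a caller inspect the first record, for example to set a Window's left marker, without losing it from the stream.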

func (*RecordSet) Subset ¶

func (rs *RecordSet) Subset(m Matching) (*RecordSet, error)

Subset returns a pointer to a new *RecordSet that contains all of the records that return true from calls to Match(*Record) (bool, error) on the provided argument m that implements the Matching interface. Returns nil for the *RecordSet when error is non-nil.

Example ¶

package main

import (
	"fmt"
	"time"

	"github.com/FATHOM5/ais"
)

type subsetOneDay struct {
	rs        *ais.RecordSet
	d1        time.Time
	timeIndex int
}

func (sod *subsetOneDay) Match(rec *ais.Record) (bool, error) {
	d2, err := time.Parse(ais.TimeLayout, (*rec)[sod.timeIndex])
	if err != nil {
		return false, fmt.Errorf("subsetOneDay: %v", err)
	}
	d2 = d2.Truncate(24 * time.Hour)
	return sod.d1.Equal(d2), nil
}

func main() {
	rs, _ := ais.OpenRecordSet("testdata/ten.csv")
	defer rs.Close()

	// Implement a concrete type of subsetOneDay to return records
	// from 25 Dec 2017.
	timeIndex, ok := rs.Headers().Contains("BaseDateTime")
	if !ok {
		panic("recordset does not contain the header BaseDateTime")
	}
	targetDate, _ := time.Parse("2006-01-02", "2017-12-25")
	sod := &subsetOneDay{
		rs:        rs,
		d1:        targetDate,
		timeIndex: timeIndex,
	}

	matches, _ := rs.Subset(sod)
	//matches.Save("newSet.csv")
	subsetRec, _ := matches.Read()
	subsetDate := (*subsetRec)[timeIndex]
	date, _ := time.Parse(ais.TimeLayout, subsetDate)
	fmt.Printf("The first record in the subset has BaseDateTime %v\n", date.Format("2006-01-02"))

}

Output:

The first record in the subset has BaseDateTime 2017-12-25

func (*RecordSet) SubsetLimit ¶

func (rs *RecordSet) SubsetLimit(m Matching, n int, multipass bool) (*RecordSet, error)

SubsetLimit returns a pointer to a new RecordSet with the first n records that return true from calls to Match(*Record) (bool, error) on the provided argument m that implements the Matching interface. Returns nil for the *RecordSet when error is non-nil. For n values less than zero, SubsetLimit will return all matches in the set.

SubsetLimit also takes a bool argument, multipass, that resets the read pointer in the RecordSet to the beginning of the data when set to true. This has two important impacts. First, it allows the same rs receiver to be used multiple times in a row because the read pointer is reset each time after hitting EOF. Second, it has a significant performance penalty when dealing with a RecordSet of about one million or more records. When the convenience of avoiding additional boilerplate code outweighs the performance impact, setting multipass to true is quite helpful. In situations where the penalty is causing an issue, use rs.Close() and then OpenRecordSet(filename) to get a fresh copy of the data.

func (*RecordSet) UniqueVessels ¶

func (rs *RecordSet) UniqueVessels() (VesselSet, error)

UniqueVessels returns a VesselSet, map[Vessel]int, that includes a unique key for each Vessel in the RecordSet. The value of each key is the number of Records for that Vessel in the data.
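Counting records per vessel with a struct-keyed map can be sketched as below. The vessel type copies the two fields of ais.Vessel, while uniqueVessels and its explicit index arguments are hypothetical, since the package resolves the column positions from the RecordSet headers itself.

```go
package main

import "fmt"

// vessel mirrors the fields of ais.Vessel. Structs of comparable fields
// can be used directly as map keys.
type vessel struct {
	MMSI       string
	VesselName string
}

// uniqueVessels counts how many records belong to each vessel.
func uniqueVessels(recs [][]string, mmsiIdx, nameIdx int) (map[vessel]int, error) {
	set := make(map[vessel]int)
	for n, rec := range recs {
		if mmsiIdx < 0 || nameIdx < 0 || mmsiIdx >= len(rec) || nameIdx >= len(rec) {
			return nil, fmt.Errorf("uniqueVessels: record %d is too short", n)
		}
		set[vessel{MMSI: rec[mmsiIdx], VesselName: rec[nameIdx]}]++
	}
	return set, nil
}

func main() {
	recs := [][]string{
		{"477307900", "FIRST"},
		{"477307902", "SECOND"},
		{"477307900", "FIRST"},
	}
	set, err := uniqueVessels(recs, 0, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(set), set[vessel{MMSI: "477307900", VesselName: "FIRST"}])
}
```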

func (*RecordSet) UniqueVesselsMulti ¶

func (rs *RecordSet) UniqueVesselsMulti(multipass bool) (VesselSet, error)

UniqueVesselsMulti provides an option to control whether the RecordSet read pointer is returned to the top of the file. Using this option has a significant performance cost and is not recommended for any RecordSet with more than one million records. However, setting this option to true is valuable when the returned VesselSet is going to be used for additional queries on the same receiver. For example, ranging over the returned VesselSet to create a Subset of data for each ship requires reusing the rs receiver in most cases.

func (*RecordSet) Write ¶

func (rs *RecordSet) Write(rec Record) error

Write calls Write() on the csv.Writer held by the RecordSet and returns an error. The error is nil on a successful write. Flush() should be called at the end of necessary Write() calls to ensure the IO buffer is flushed.
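The write-then-flush sequence is the standard encoding/csv pattern. The sketch below uses csv.Writer directly with an in-memory buffer, and writeCSV is a hypothetical helper. It shows why the flush matters: rows are buffered until Flush is called, and any deferred write error is only visible through Error afterwards.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// writeCSV writes the header row and records, then flushes the writer.
func writeCSV(headers []string, records [][]string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)

	if err := w.Write(headers); err != nil {
		return "", err
	}
	for _, rec := range records {
		if err := w.Write(rec); err != nil {
			return "", err
		}
	}

	// Without Flush the buffered rows may never reach buf.
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := writeCSV(
		[]string{"MMSI", "LAT"},
		[][]string{{"477307900", "36.90512"}},
	)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```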

type Vessel ¶

type Vessel struct {
	MMSI       string
	VesselName string
}

Vessel is a struct for the identifying information about a specific ship in an AIS dataset. NOTE: REFINEMENT OF PACKAGE AIS WILL INCORPORATE MORE OF THE SHIP IDENTIFYING DATA COMMENTED OUT IN THIS MINIMALLY VIABLE IMPLEMENTATION.

type VesselSet ¶

type VesselSet map[Vessel]int

VesselSet is a set of unique vessels, usually obtained as the return value of RecordSet.UniqueVessels(). For each Record of a Vessel in the set the int value of the VesselSet is incremented.

type Window ¶

type Window struct {
	Data map[uint64]*Record
	// contains filtered or unexported fields
}

Window is used to create a convolution algorithm that slides down a RecordSet and performs analysis on Records that are within a time window.

func NewWindow ¶

func NewWindow(rs *RecordSet, width time.Duration) (*Window, error)

NewWindow returns a *Window with the left marker set to the time in the next record read from the RecordSet. The Window width will be set from the argument provided and the right marker will be derived from left and width. When creating a Window right after opening a RecordSet, the Window will be set to the first Record in the set, but that first record will still be available to the client's first call to rs.Read(). For any non-nil error NewWindow returns nil and the error.

func (*Window) AddRecord ¶

func (win *Window) AddRecord(rec Record)

AddRecord appends a new Record to the data in the Window.

func (*Window) Config ¶

func (win *Window) Config() string

Config provides a pretty print of the Window's configuration.

func (*Window) FindClusters ¶

func (win *Window) FindClusters(geohashIndex int) ClusterMap

FindClusters returns a ClusterMap that groups Records in the window into common Clusters that share the same geohash. It requires that the RecordSet Window it is operating on has a 'Geohash' field stored as a Uint64 with the proper prefix for the hash (i.e. 0x for hex representation).
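Grouping records by a prefixed geohash string can be sketched with strconv.ParseUint, whose base 0 mode honors a leading "0x". The function findClusters below is a hypothetical stand-in that works on plain [][]string rows and returns an error for unparsable values. The real method returns only a ClusterMap, so its handling of bad values is not shown here.

```go
package main

import (
	"fmt"
	"strconv"
)

// findClusters groups records by the integer geohash found at geohashIndex.
func findClusters(recs [][]string, geohashIndex int) (map[uint64][][]string, error) {
	clusters := make(map[uint64][][]string)
	for n, rec := range recs {
		if geohashIndex < 0 || geohashIndex >= len(rec) {
			return nil, fmt.Errorf("findClusters: record %d has no field %d", n, geohashIndex)
		}
		// Base 0 lets ParseUint pick the base from the prefix, e.g. "0x".
		gh, err := strconv.ParseUint(rec[geohashIndex], 0, 64)
		if err != nil {
			return nil, fmt.Errorf("findClusters: record %d: %v", n, err)
		}
		clusters[gh] = append(clusters[gh], rec)
	}
	return clusters, nil
}

func main() {
	recs := [][]string{
		{"477307900", "0x1a2b"},
		{"477307902", "0x1a2b"},
		{"477307999", "0x1a2c"},
	}
	clusters, err := findClusters(recs, 1)
	if err != nil {
		panic(err)
	}
	for gh, members := range clusters {
		fmt.Printf("%#x: %d record(s)\n", gh, len(members))
	}
}
```

Clusters with two or more members are the candidates for two-vessel interactions.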

func (*Window) InWindow ¶

func (win *Window) InWindow(t time.Time) bool

InWindow tests if a time is in the Window.

func (*Window) Left ¶

func (win *Window) Left() time.Time

Left returns the left marker.

func (*Window) Len ¶

func (win *Window) Len() int

Len returns the length of the slice holding the Records in the Window.

func (*Window) RecordInWindow ¶

func (win *Window) RecordInWindow(rec *Record) (bool, error)

RecordInWindow returns true if the record is in the Window. Errors are possible from parsing the BaseDateTime field of the Record.

func (*Window) Right ¶

func (win *Window) Right() time.Time

Right returns the right marker.

func (*Window) SetIndex ¶

func (win *Window) SetIndex(index int)

SetIndex provides the integer index of the BaseDateTime field in the Records stored in the Window.

func (*Window) SetLeft ¶

func (win *Window) SetLeft(marker time.Time)

SetLeft defines the left marker of the Window.

func (*Window) SetRight ¶

func (win *Window) SetRight(marker time.Time)

SetRight defines the right marker of the Window.

func (*Window) SetWidth ¶

func (win *Window) SetWidth(dur time.Duration)

SetWidth provides the block of time covered by the Window.

func (*Window) Slide ¶

func (win *Window) Slide(dur time.Duration)

Slide moves the window down by the time provided in the argument dur. Slide also removes any data from the Window that would no longer return true from InWindow for the new left and right markers after the Slide.
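A minimal sketch of the slide-and-evict behavior follows. The window type, its string keys, and the helpers mustTime and minutes are hypothetical. Treating the left marker as inclusive and the right marker as exclusive is an assumption of this sketch, since the boundary handling of InWindow is not spelled out above.

```go
package main

import (
	"fmt"
	"time"
)

// layout has the same value as ais.TimeLayout.
const layout = "2006-01-02T15:04:05"

// window keeps the timestamp of each record currently inside [left, right).
type window struct {
	left, right time.Time
	data        map[string]time.Time
}

func newWindow(left time.Time, width time.Duration) *window {
	return &window{left: left, right: left.Add(width), data: make(map[string]time.Time)}
}

// inWindow assumes an inclusive left marker and an exclusive right marker.
func (w *window) inWindow(t time.Time) bool {
	return !t.Before(w.left) && t.Before(w.right)
}

// add stores the record only if its timestamp falls inside the window.
func (w *window) add(key string, t time.Time) bool {
	if !w.inWindow(t) {
		return false
	}
	w.data[key] = t
	return true
}

// slide shifts both markers by dur and evicts records left behind.
func (w *window) slide(dur time.Duration) {
	w.left = w.left.Add(dur)
	w.right = w.right.Add(dur)
	for key, t := range w.data {
		if !w.inWindow(t) {
			delete(w.data, key) // deleting while ranging over a map is safe in Go
		}
	}
}

func mustTime(s string) time.Time {
	t, err := time.Parse(layout, s)
	if err != nil {
		panic(err)
	}
	return t
}

func minutes(n int) time.Duration { return time.Duration(n) * time.Minute }

func main() {
	w := newWindow(mustTime("2017-12-01T00:00:00"), minutes(10))
	w.add("a", mustTime("2017-12-01T00:01:00"))
	w.add("b", mustTime("2017-12-01T00:08:00"))
	fmt.Println("before slide:", len(w.data))
	w.slide(minutes(5))
	fmt.Println("after slide:", len(w.data))
}
```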

func (*Window) String ¶

func (win *Window) String() string

String implements the Stringer interface for Window.

func (*Window) Width ¶

func (win *Window) Width() time.Duration

Width returns the width of the Window.
