Documentation ¶
Index ¶
- Variables
- func Checkpoint(ctx context.Context, checkpoint []byte)
- func FakeCloser(r io.Reader) io.ReadCloser
- func MarshalGob(v interface{}) ([]byte, error)
- func RegisterDataSource(ds DataSource) error
- func UnmarshalGob(data []byte, v interface{}) error
- type Account
- type AuthenticateFn
- type CheckpointFn
- type Client
- type Collection
- type CollectionItem
- type DataSource
- type Item
- type ItemClass
- type ItemGraph
- type ItemRow
- type ListingOptions
- type Location
- type MergeOptions
- type Metadata
- type NewClientFn
- type OAuth2
- type Person
- type PersonIdentity
- type ProcessingOptions
- type RateLimit
- type RawRelation
- type Relation
- type Timeframe
- type Timeline
- type WrappedClient
- func (wc *WrappedClient) DataSourceID() string
- func (wc *WrappedClient) DataSourceName() string
- func (wc *WrappedClient) GetAll(ctx context.Context, procOpt ProcessingOptions) error
- func (wc *WrappedClient) GetLatest(ctx context.Context, procOpt ProcessingOptions) error
- func (wc *WrappedClient) Import(ctx context.Context, filename string, procOpt ProcessingOptions) error
- func (wc *WrappedClient) UserID() string
Constants ¶
This section is empty.
Variables ¶
var (
	RelReplyTo  = Relation{Label: "reply_to", Bidirectional: false}      // "&lt;from&gt; is in reply to &lt;to&gt;"
	RelAttached = Relation{Label: "attached", Bidirectional: true}       // "&lt;to|from&gt; is attached to &lt;from|to&gt;"
	RelQuotes   = Relation{Label: "quotes", Bidirectional: false}        // "&lt;from&gt; quotes &lt;to&gt;"
	RelCCed     = Relation{Label: "carbon_copied", Bidirectional: false} // "&lt;from_item&gt; is carbon-copied to &lt;to_person&gt;"
)
These are the standard relationships that Timeliner recognizes. Using these known relationships is not required, but it makes it easier to translate them to human-friendly phrases when visualizing the timeline.
var OAuth2AppSource func(providerID string, scopes []string) (oauth2client.App, error)
OAuth2AppSource returns an oauth2client.App for the OAuth2 provider with the given ID. Programs using data sources that authenticate with OAuth2 MUST set this variable, or the program will panic.
Functions ¶
func Checkpoint ¶
Checkpoint saves a checkpoint for the processing associated with the provided context. It overwrites any previous checkpoint. Any errors are logged.
func FakeCloser ¶
func FakeCloser(r io.Reader) io.ReadCloser
FakeCloser turns an io.Reader into an io.ReadCloser where the Close() method does nothing.
func MarshalGob ¶
MarshalGob is a convenient way to gob-encode v.
func RegisterDataSource ¶
func RegisterDataSource(ds DataSource) error
RegisterDataSource registers ds as a data source.
func UnmarshalGob ¶
UnmarshalGob is a convenient way to gob-decode data into v.
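Helpers like these typically wrap the standard encoding/gob encoder and decoder around a byte buffer. A self-contained sketch of the likely shape (marshalGob/unmarshalGob here are local illustrations, not the package's code):

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// marshalGob gob-encodes v into a byte slice.
func marshalGob(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(v); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// unmarshalGob gob-decodes data into v, which must be a pointer.
func unmarshalGob(data []byte, v interface{}) error {
	return gob.NewDecoder(bytes.NewReader(data)).Decode(v)
}

// creds is an example of the kind of credentials value an
// AuthenticateFn might byte-encode for storage in the DB.
type creds struct{ User, Token string }

func main() {
	data, err := marshalGob(creds{User: "alice", Token: "t0k3n"})
	if err != nil {
		panic(err)
	}
	var c creds
	if err := unmarshalGob(data, &c); err != nil {
		panic(err)
	}
	fmt.Println(c.User, c.Token) // alice t0k3n
}
```

This round-trip is exactly the use case the AuthenticateFn documentation suggests: gob-encode arbitrary credential types so they can be stored and later reused.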
Types ¶
type Account ¶
type Account struct {
	ID           int64
	DataSourceID string
	UserID       string
	// contains filtered or unexported fields
}
Account represents an account with a service.
func (Account) NewHTTPClient ¶
NewHTTPClient returns an HTTP client that is suitable for use with an API associated with the account's data source. If OAuth2 is configured for the data source, the client has OAuth2 credentials. If a rate limit is configured, this client is rate limited. A sane default timeout is set, and any fields on the returned Client value can be modified as needed.
func (Account) NewOAuth2HTTPClient ¶
NewOAuth2HTTPClient returns a new HTTP client which performs HTTP requests that are authenticated with an oauth2.Token stored with the account acc.
func (Account) NewRateLimitedRoundTripper ¶
func (acc Account) NewRateLimitedRoundTripper(rt http.RoundTripper) http.RoundTripper
NewRateLimitedRoundTripper adds rate limiting to rt based on the rate limiting policy registered by the data source associated with acc.
type AuthenticateFn ¶
AuthenticateFn is a function that authenticates userID with a service. It returns the authorization or credentials needed to operate. The return value should be byte-encoded so it can be stored in the DB to be reused. To store arbitrary types, encode the value as a gob, for example.
type CheckpointFn ¶
CheckpointFn is a function that saves a checkpoint.
type Client ¶
type Client interface {
	// ListItems lists the items on the account. Items should be
	// sent on itemChan as they are discovered, but related items
	// should be combined onto a single ItemGraph so that their
	// relationships can be stored. If the relationships are not
	// discovered until later, that's OK: item processing is
	// idempotent, so repeating an item from earlier will have no
	// adverse effects (this is possible because a unique ID is
	// required for each item).
	//
	// Implementations must honor the context's cancellation. If
	// ctx.Done() is closed, the function should return. Typically,
	// this is done by having an outer loop select over ctx.Done()
	// and default, where the next page or set of items is handled
	// in the default case.
	//
	// ListItems MUST close itemChan when returning. A
	// `defer close(itemChan)` will usually suffice. Closing
	// this channel signals to the processing goroutine that
	// no more items are coming.
	//
	// Further options for listing items may be passed in opt.
	//
	// If opt.Filename is specified, the implementation is expected
	// to open and list items from that file. If this is not
	// supported, an error should be returned. Conversely, if a
	// filename is not specified but required, an error should be
	// returned.
	//
	// opt.Timeframe consists of two optional timestamp and/or item
	// ID values. If set, item listings should be bounded in the
	// respective direction by that timestamp / item ID. (Items
	// are assumed to be part of a chronology; both timestamp and
	// item ID *may be* provided, when possible, to accommodate
	// data sources which do not constrain by timestamp but which
	// do by item ID instead.) The respective time and item ID
	// fields, if set, will not be in conflict, so either may be
	// used if both are present. While it should be documented if
	// timeframes are not supported, an error need not be returned
	// if they cannot be honored.
	//
	// opt.Checkpoint consists of the last checkpoint for this
	// account if the last call to ListItems did not finish and
	// if a checkpoint was saved. If not nil, the checkpoint
	// should be used to resume the listing instead of starting
	// over from the beginning. Checkpoint values usually consist
	// of page tokens or whatever state is required to resume. Call
	// timeliner.Checkpoint to set a checkpoint. Checkpoints are not
	// required, but if the implementation sets checkpoints, it
	// should be able to resume from one, too.
	ListItems(ctx context.Context, itemChan chan<- *ItemGraph, opt ListingOptions) error
}
Client is a type that can interact with a data source.
type Collection ¶
type Collection struct {
	// The ID of the collection as given
	// by the service; for example, the
	// album ID. If the service does not
	// provide an ID for the collection,
	// invent one such that the next time
	// the collection is encountered and
	// processed, its ID will be the same.
	// An ID is necessary here to ensure
	// uniqueness.
	//
	// REQUIRED.
	OriginalID string

	// The name of the collection as
	// given by the service; for example,
	// the album title.
	//
	// Optional.
	Name *string

	// The description, caption, or any
	// other relevant text describing
	// the collection.
	//
	// Optional.
	Description *string

	// The items for the collection;
	// if ordering is significant,
	// specify each item's Position
	// field; the order of elements
	// of this slice will not be
	// considered important.
	Items []CollectionItem
}
Collection represents a group of items, like an album.
type CollectionItem ¶
type CollectionItem struct {
	// The item to add to the collection.
	Item Item

	// Specify if ordering is important.
	Position int
	// contains filtered or unexported fields
}
CollectionItem represents an item stored in a collection.
type DataSource ¶
type DataSource struct {
	// A snake_cased name of the service
	// that uniquely identifies it from
	// all others.
	ID string

	// The human-readable or brand name of
	// the service.
	Name string

	// If the service authenticates with
	// OAuth2, fill out this field.
	OAuth2 OAuth2

	// Otherwise, if the service uses some
	// other form of authentication,
	// Authenticate is a function which
	// returns the credentials needed to
	// access an account on the service.
	Authenticate AuthenticateFn

	// If the service enforces a rate limit,
	// specify it here. You can abide it by
	// getting an http.Client from the
	// Account passed into NewClient.
	RateLimit RateLimit

	// NewClient is a function which takes
	// information about the account and
	// returns a type which can facilitate
	// transactions with the service.
	NewClient NewClientFn
}
DataSource has information about a data source that can be registered.
type Item ¶
type Item interface {
	// The unique ID of the item assigned by the service.
	// If the service does not assign one, then invent
	// one such that the ID is unique to the content or
	// substance of the item (for example, an ID derived
	// from timestamp or from the actual content of the
	// item -- whatever makes it unique). The ID need
	// only be unique for the account it is associated
	// with, although more unique is, of course, acceptable.
	//
	// REQUIRED.
	ID() string

	// The originating timestamp of the item, which
	// may be different from when the item was posted
	// or created. For example, a photo may be taken
	// one day but uploaded a week later. Prefer the
	// time when the original item content was captured.
	//
	// REQUIRED.
	Timestamp() time.Time

	// A classification of the item's kind.
	//
	// REQUIRED.
	Class() ItemClass

	// The user/account ID of the owner or
	// originator of the content, along with their
	// username or real name. The ID is used to
	// relate the item with the person behind it;
	// the name is used to make the person
	// recognizable to the human reader. If the
	// ID is nil, the current account owner will
	// be assumed. (Use the ID as given by the
	// data source.) If the data source only
	// provides a name but no ID, you may return
	// the name as the ID with the understanding
	// that a different name will be counted as a
	// different person. You may also return the
	// name as the name and leave the ID nil and
	// have correct results if it is safe to assume
	// the name belongs to the current account owner.
	Owner() (id *string, name *string)

	// Returns the text of the item, if any.
	// This field is indexed in the DB, so don't
	// use for unimportant metadata or huge
	// swaths of text; if there is a large
	// amount of text, use an item file instead.
	DataText() (*string, error)

	// For primary content which is not text or
	// which is too large to be stored well in a
	// database, the content can be downloaded
	// into a file. If so, the following methods
	// should return the necessary information,
	// if available from the service, so that a
	// data file can be obtained, stored, and
	// later read successfully.
	//
	// DataFileName returns the filename (NOT full
	// path or URL) of the file; prefer the original
	// filename if it originated as a file. If the
	// filename is not unique on disk when downloaded,
	// it will be made unique by modifying it. If
	// this value is nil/empty, a filename will be
	// generated from the item's other data.
	//
	// DataFileReader returns a way to read the data.
	// It will be closed when the read is completed.
	//
	// DataFileHash returns the checksum of the
	// content as provided by the service. If the
	// service (or data source) does not provide a
	// hash, leave this field empty, but note that
	// later it will be impossible to efficiently
	// know whether the content has changed on the
	// service from what is stored locally.
	//
	// DataFileMIMEType returns the MIME type of
	// the data file, if known.
	DataFileName() *string
	DataFileReader() (io.ReadCloser, error)
	DataFileHash() []byte
	DataFileMIMEType() *string

	// Metadata returns any optional metadata.
	// Feel free to leave as many fields empty
	// as you'd like: the fewer fields that are
	// filled out, the smaller the storage size.
	// Metadata is not indexed by the DB but is
	// rendered in projections and queries
	// according to the item's classification.
	Metadata() (*Metadata, error)

	// Location returns an item's location,
	// if known. For now, only Earth
	// coordinates are accepted, but we can
	// improve this later.
	Location() (*Location, error)
}
Item is the central concept of a piece of content from a service or data source. Take note of which methods are required to return non-empty values.
The actual content of an item is stored either in the database or on disk as a file. Generally, content that is text-encoded can and should be stored in the database, where it will be indexed. However, if the item's content (for example, the bytes of a photo or video) is not text, or if the text is too large to store well in a database (for example, an entire novel), it should be stored on disk; this interface has methods to accommodate both. Note that an item may have both text and non-text content: for example, photos and videos may have descriptions that are as much "content" as the media itself. One part of an item is not mutually exclusive with any other.
type ItemGraph ¶
type ItemGraph struct {
	// The node item. This can be nil, but note that
	// Edges will not be traversed if Node is nil,
	// because there must be a node on both ends of
	// an edge.
	//
	// Optional.
	Node Item

	// Edges are represented as 1:many relations
	// to other "graphs" (nodes in the graph).
	// Fill this out to add multiple items to the
	// timeline at once, while drawing the
	// designated relationships between them.
	// Useful when processing related items in
	// batches.
	//
	// Directional relationships go from Node to
	// the map key.
	//
	// If the items involved in a relationship are
	// not efficiently available at the same time
	// (i.e. if loading both items involved in the
	// relationship would take a non-trivial amount
	// of time or API calls), you can use the
	// Relations field instead, but only after the
	// items have been added to the timeline.
	//
	// Optional.
	Edges map[*ItemGraph][]Relation

	// If items in the graph belong to a collection,
	// specify them here. If the collection does not
	// exist (by row ID or AccountID+OriginalID), it
	// will be created. If it already exists, the
	// collection in the DB will be unioned with the
	// collection specified here. Collections are
	// processed regardless of Node and Edges.
	//
	// Optional.
	Collections []Collection

	// Relationships between existing items in the
	// timeline can be represented here in a list
	// of item IDs that are connected by a label.
	// This field is useful when relationships and
	// the items involved in them are not discovered
	// at the same time. Relations in this list will
	// be added to the timeline, joined by the item
	// IDs described in the RawRelations, only if
	// the items having those IDs (as provided by
	// the data source; we're not talking about DB
	// row IDs here) already exist in the timeline.
	// In other words, this is a best-effort field;
	// useful for forming relationships of existing
	// items, but without access to the actual items
	// themselves. If you have the items involved in
	// the relationships, use Edges instead.
	//
	// Optional.
	Relations []RawRelation
}
ItemGraph is an item with optional connections to other items. All ItemGraph values should be pointers to ensure consistency. The usual weird/fun thing about representing graph data structures in memory is that a graph is a node, and a node is a graph. 🤓
func (*ItemGraph) Add ¶
Add adds item to the graph ig by making an edge described by rel from the node ig to a new node for item.
This method is for simple inserts, where the only thing to add to the graph at this moment is a single item, since the graph it inserts contains only a single node populated by item. To add a full graph with multiple items (i.e. a graph with edges), call ig.Connect directly.
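The "a node is a graph" idea behind Add can be illustrated with a small, self-contained model. These types are deliberately simplified stand-ins (string nodes, label-only relations), not the package's actual ItemGraph or Relation:

```go
package main

import "fmt"

// relation is a simplified stand-in for the package's Relation type.
type relation struct{ Label string }

// itemGraph is a simplified stand-in for the package's ItemGraph:
// a node plus edges to other nodes, which are themselves graphs.
type itemGraph struct {
	Node  string
	Edges map[*itemGraph][]relation
}

// add wraps item in a new single-node graph and connects it to ig
// with an edge described by rel, mirroring what ItemGraph.Add does.
func (ig *itemGraph) add(item string, rel relation) {
	if ig.Edges == nil {
		ig.Edges = make(map[*itemGraph][]relation)
	}
	node := &itemGraph{Node: item} // a node is itself a (sub)graph
	ig.Edges[node] = append(ig.Edges[node], rel)
}

func main() {
	root := &itemGraph{Node: "post"}
	root.add("reply", relation{Label: "reply_to"})
	root.add("photo", relation{Label: "attached"})
	fmt.Println(len(root.Edges)) // 2
}
```

Because each map key is itself an *itemGraph, edges can be chained arbitrarily deep, which is why the package documentation notes that a graph is a node and a node is a graph.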
type ItemRow ¶
type ItemRow struct {
	ID         int64
	AccountID  int64
	OriginalID string
	PersonID   int64
	Timestamp  time.Time
	Stored     time.Time
	Modified   *time.Time
	Class      ItemClass
	MIMEType   *string
	DataText   *string
	DataFile   *string
	DataHash   *string // base64-encoded SHA-256
	Metadata   *Metadata
	Location
	// contains filtered or unexported fields
}
ItemRow has the structure of an item's row in our DB.
type ListingOptions ¶
type ListingOptions struct {
	// A file from which to read the data.
	Filename string

	// Time bounds on which data to retrieve.
	// The respective time and item ID fields
	// which are set must never conflict.
	Timeframe Timeframe

	// A checkpoint from which to resume
	// item retrieval.
	Checkpoint []byte

	// Enable verbose output (logs).
	Verbose bool
}
ListingOptions specifies parameters for listing items from a data source. Some data sources might not be able to honor all fields.
type MergeOptions ¶
type MergeOptions struct {
	// Enables "soft" merging.
	//
	// If true, an item may be merged if it is likely
	// to be the same as an existing item, even if the
	// item IDs are different. For example, if a
	// service has multiple ways of listing items, but
	// does not provide a consistent ID for the same
	// item across listings, a soft merge will allow the
	// processing to treat them as the same as long as
	// other fields match: timestamp, and either data text
	// or data filename.
	SoftMerge bool

	// Overwrite existing (old) item's ID with the ID
	// provided by the current (new) item.
	PreferNewID bool

	// Overwrite existing item's text data.
	PreferNewDataText bool

	// Overwrite existing item's data file.
	PreferNewDataFile bool

	// Overwrite existing item's metadata.
	PreferNewMetadata bool
}
MergeOptions configures how items are merged. By default, items are not merged; if an item with a duplicate ID is encountered, it will be replaced with the new item (see the "reprocess" flag). Merging has to be explicitly enabled.
Currently, the only way to perform a merge is to enable "soft" merging: finding an item with the same timestamp and either text data or filename. Then, one of the item's IDs is updated to match the other. These merge options configure how the items are then combined.
As it is possible and likely for both items to have non-empty values for the same fields, these "conflicts" must be resolved non-interactively. By default, a merge conflict prefers existing values (old item's field) over the new one, and the new one only fills in missing values. (This seems safest.) However, these merge options allow you to customize that behavior and overwrite existing values with the new item's fields (only happens if new item's field is non-empty, i.e. a merge will never delete existing data).
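The "prefer existing values unless a PreferNew flag is set, and never delete data" policy can be sketched for a single field. This is a simplified illustration of the rule described above, not the package's actual merge code:

```go
package main

import "fmt"

func strPtr(s string) *string { return &s }

// mergeText resolves a conflict for one text field: by default the
// existing (old) value wins and the new value only fills a gap; with
// preferNew set, a non-empty new value overwrites the old one. A
// merge never replaces data with nothing.
func mergeText(oldV, newV *string, preferNew bool) *string {
	if preferNew && newV != nil && *newV != "" {
		return newV
	}
	if oldV != nil && *oldV != "" {
		return oldV
	}
	return newV // old value was missing; fill it in
}

func main() {
	older, newer := strPtr("original"), strPtr("updated")
	fmt.Println(*mergeText(older, newer, false)) // original (old wins)
	fmt.Println(*mergeText(older, newer, true))  // updated (flag set)
	fmt.Println(*mergeText(nil, newer, false))   // updated (fills gap)
}
```

The same shape of decision applies per field for ID, data file, and metadata, controlled by the corresponding PreferNew* options.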
type Metadata ¶
type Metadata struct {
	// A hash or etag provided by the service to
	// make it easy to know if it has changed
	ServiceHash []byte

	// Locations
	LocationAccuracy int
	Altitude         int // meters
	AltitudeAccuracy int
	Heading          int // degrees
	Velocity         int

	GeneralArea string // natural language description of a location

	// Photos and videos
	EXIF map[string]interface{}

	Width  int
	Height int

	// TODO: Google Photos (how many of these belong in EXIF?)
	CameraMake      string
	CameraModel     string
	FocalLength     float64
	ApertureFNumber float64
	ISOEquivalent   int
	ExposureTime    time.Duration

	FPS float64 // Frames Per Second

	// Posts (Facebook so far)
	Link        string
	Description string
	Name        string
	ParentID    string
	StatusType  string
	Type        string

	Likes int
}
Metadata is a unified structure for storing item metadata in the DB.
type NewClientFn ¶
NewClientFn is a function that returns a client which, given the account passed in, can interact with a service provider.
type OAuth2 ¶
type OAuth2 struct {
	// The ID of the service must be recognized
	// by the OAuth2 app configuration.
	ProviderID string

	// The list of scopes to ask for during auth.
	Scopes []string
}
OAuth2 defines which OAuth2 provider a service uses and which scopes it requires.
type Person ¶
type Person struct {
	ID         int64
	Name       string
	Identities []PersonIdentity
}
Person represents a person.
type PersonIdentity ¶
PersonIdentity is a way to map a user ID on a service to a person.
type ProcessingOptions ¶
type ProcessingOptions struct {
	Reprocess bool
	Prune     bool
	Integrity bool
	Timeframe Timeframe
	Merge     MergeOptions
	Verbose   bool
}
ProcessingOptions configures how item processing is carried out.
type RateLimit ¶
type RateLimit struct {
	RequestsPerHour int
	BurstSize       int
	// contains filtered or unexported fields
}
RateLimit describes a rate limit.
type RawRelation ¶
type RawRelation struct {
	FromItemID       string
	ToItemID         string
	FromPersonUserID string
	ToPersonUserID   string
	Relation
}
RawRelation represents a relationship between two items or people (or both) from the same data source (but not necessarily the same accounts; we assume that a data source's item IDs are globally unique across accounts). The item IDs should be those which are assigned/provided by the data source, NOT a database row ID. Likewise, the persons' user IDs should be the IDs of the user as associated with the data source, NOT their row IDs.
type Relation ¶
Relation describes how two nodes in a graph are related. It's essentially an edge on a graph.
type Timeframe ¶
Timeframe represents a start and end time and/or a start and end item, where either value could be nil, which means unbounded in that direction. When items are used as the timeframe boundaries, the ItemID fields will be populated. It is not guaranteed that any particular field will be set or unset just because other fields are set or unset. However, if both Since fields (or both Until fields) are set, the timestamp and item are correlated; i.e. the Since timestamp is (approximately) that of the item ID. Put another way: there will never be conflicts among the fields which are non-nil.
type Timeline ¶
type Timeline struct {
// contains filtered or unexported fields
}
Timeline represents an opened timeline repository. The zero value is NOT valid; use Open() to obtain a valid value.
func Open ¶
Open creates/opens a timeline at the given repository directory. Timelines should always be Close()'d for a clean shutdown when done.
func (*Timeline) AddAccount ¶
AddAccount authenticates userID with the service identified within the application by dataSourceID, and then stores the new account in the database. The account must not yet exist.
func (*Timeline) Authenticate ¶
Authenticate gets authentication for userID with dataSourceID. If the account already exists in the database, it will be updated with the latest authorization.
func (*Timeline) NewClient ¶
func (t *Timeline) NewClient(dataSourceID, userID string) (WrappedClient, error)
NewClient returns a new Client that is ready to interact with the data source for the account uniquely specified by the data source ID and the user ID for that data source. The Client is actually wrapped by a type with unexported fields that are necessary for internal use.
type WrappedClient ¶
type WrappedClient struct {
	Client
	// contains filtered or unexported fields
}
WrappedClient wraps a Client instance with unexported fields that contain necessary state for performing data collection operations. Do not craft this type manually; use Timeline.NewClient() to obtain one.
func (*WrappedClient) DataSourceID ¶
func (wc *WrappedClient) DataSourceID() string
DataSourceID returns the ID of the data source wc was created from.
func (*WrappedClient) DataSourceName ¶
func (wc *WrappedClient) DataSourceName() string
DataSourceName returns the name of the data source wc was created from.
func (*WrappedClient) GetAll ¶
func (wc *WrappedClient) GetAll(ctx context.Context, procOpt ProcessingOptions) error
GetAll gets all the items using wc. If procOpt.Reprocess is true, items that are already in the timeline will be re-processed. If procOpt.Prune is true, items that are not listed on the data source by wc will be removed from the timeline at the end of the listing. If procOpt.Integrity is true, all items that are listed by wc that exist in the timeline and which consist of a data file will be opened and checked for integrity; if the file has changed, it will be reprocessed.
func (*WrappedClient) GetLatest ¶
func (wc *WrappedClient) GetLatest(ctx context.Context, procOpt ProcessingOptions) error
GetLatest gets the most recent items from wc. It does not prune or reprocess; it is only meant for a quick pull (an error will be returned if procOpt is not compatible). If no items have been pulled yet, all items will be pulled. If procOpt.Timeframe.Until is not nil, only the latest items up to that timestamp will be pulled, and if Until is after the latest item, no items will be pulled.
func (*WrappedClient) Import ¶
func (wc *WrappedClient) Import(ctx context.Context, filename string, procOpt ProcessingOptions) error
Import is like GetAll but for a locally-stored archive or export file that can simply be opened and processed, rather than needing to run over a network. See the godoc for GetAll. This is only for data sources that support Import.
func (*WrappedClient) UserID ¶
func (wc *WrappedClient) UserID() string
UserID returns the ID of the user associated with this client.
Source Files ¶
Directories ¶
Path | Synopsis
---|---
cmd |
datasources |
facebook | Package facebook implements the Facebook service using the Graph API: https://developers.facebook.com/docs/graph-api
googlelocation | Package googlelocation implements a Timeliner data source for importing data from the Google Location History (aka Google Maps Timeline).
googlephotos | Package googlephotos implements the Google Photos service using its API, documented at https://developers.google.com/photos/.
instagram | Package instagram implements a Timeliner data source for importing data from Instagram archive files.
smsbackuprestore | Package smsbackuprestore implements a Timeliner data source for the Android SMS Backup & Restore app by SyncTech: https://synctech.com.au/sms-backup-restore/
twitter | Package twitter implements a Timeliner service for importing and downloading data from Twitter.