Documentation
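Overview
Package datastore provides a datastore that keeps track of the information the application needs: references to documents, links between documents, entities, and the mentions of those entities in documents.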
Index
- func DocLinkKey(link DocLink) string
- func DriveKey(id string) string
- func EntityMentionKey(m EntityMention) string
- func GormIgnored(typ interface{}) cmp.Option
- type Datastore
- func (d *Datastore) Close() error
- func (d *Datastore) FindEntity(q EntityQuery) ([]*Entity, error)
- func (d *Datastore) ListDocLinks(destId string) ([]*DocLink, error)
- func (d *Datastore) ListDocReferences() ([]*DocReference, error)
- func (d *Datastore) ListEntities() ([]*Entity, error)
- func (d *Datastore) ListEntityMentions(docId string) ([]*EntityMention, error)
- func (d *Datastore) ToBeIndexed() ([]*DocReference, error)
- func (d *Datastore) UpdateDocLink(l *DocLink) error
- func (d *Datastore) UpdateDocReference(r *DocReference) error
- func (d *Datastore) UpdateEntity(m *Entity) error
- func (d *Datastore) UpdateEntityMention(m *EntityMention) error
- type DocLink
- type DocReference
- type DocReferenceIter
- type Entity
- type EntityMention
- type EntityQuery
Constants
This section is empty.
Variables
This section is empty.
Functions
func DocLinkKey
func DocLinkKey(link DocLink) string
DocLinkKey generates the primary key for the given DocLink. WARNING: Changing this code will break existing databases since existing data will not have compatible keys.
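The DocLink type documents its key convention as sourceId-destId-startIndex-endIndex. The sketch below illustrates that convention only; the exact formatting (a "-" separator and decimal indices) is an assumption, and because of the warning above it should not be used to generate keys for an existing database. The local DocLink type keeps only the fields the key depends on.

```go
package main

import "fmt"

// DocLink keeps only the fields of datastore.DocLink that the key uses.
type DocLink struct {
	SourceID   string
	DestID     string
	StartIndex int64
	EndIndex   int64
}

// docLinkKey is a hypothetical re-creation of the documented convention
// sourceId-destId-startIndex-endIndex. The real DocLinkKey may differ.
func docLinkKey(l DocLink) string {
	return fmt.Sprintf("%s-%s-%d-%d", l.SourceID, l.DestID, l.StartIndex, l.EndIndex)
}

func main() {
	// Two hyperlinks from the same source to the same destination get
	// distinct keys because their text ranges differ.
	a := DocLink{SourceID: "src", DestID: "dst", StartIndex: 10, EndIndex: 20}
	b := DocLink{SourceID: "src", DestID: "dst", StartIndex: 42, EndIndex: 50}
	fmt.Println(docLinkKey(a)) // src-dst-10-20
	fmt.Println(docLinkKey(b)) // src-dst-42-50
}
```

Including the text range is what lets one document hold several links to the same destination. If an id can itself contain "-", a key built this way cannot be split back into its parts unambiguously.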
func DriveKey
func DriveKey(id string) string
DriveKey generates the primary key for the Google Drive file with the given id. WARNING: Changing this code will break existing databases since existing data will not have compatible keys.
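DocReference documents its primary key as $namespace.id, where the namespace is typically the source system such as Google Drive. Assuming DriveKey produces that key for Drive files, a minimal sketch looks like the following; the "drive" and "github" namespace strings are hypothetical, not necessarily what the package uses.

```go
package main

import "fmt"

// namespacedKey builds an id following the $namespace.id convention
// documented on DocReference.ID.
func namespacedKey(namespace, id string) string {
	return namespace + "." + id
}

// driveKey is a hypothetical sketch of DriveKey; the "drive" namespace
// string is an assumption.
func driveKey(id string) string {
	return namespacedKey("drive", id)
}

func main() {
	fmt.Println(driveKey("1a2b3c"))                // drive.1a2b3c
	fmt.Println(namespacedKey("github", "1a2b3c")) // github.1a2b3c
}
```

The namespace prefix keeps the same raw id from colliding when it comes from two different systems.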
func EntityMentionKey
func EntityMentionKey(m EntityMention) string
EntityMentionKey generates the primary key for the given EntityMention. WARNING: Changing this code will break existing databases since existing data will not have compatible keys.
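EntityMention documents its key convention as docId-startIndex-endIndex, with the assumption that a given range refers to a single entity. The hypothetical sketch below assumes a "-" separator and shows the consequence: the entity id is not part of the key, so two mentions of the same range collide, while a second mention of the same entity elsewhere in the doc gets its own key.

```go
package main

import "fmt"

// EntityMention keeps only the fields this sketch needs.
type EntityMention struct {
	DocID      string
	EntityID   string
	StartIndex int64
	EndIndex   int64
}

// entityMentionKey is a hypothetical re-creation of the documented
// docId-startIndex-endIndex convention; the real EntityMentionKey may differ.
func entityMentionKey(m EntityMention) string {
	return fmt.Sprintf("%s-%d-%d", m.DocID, m.StartIndex, m.EndIndex)
}

func main() {
	a := EntityMention{DocID: "doc1", EntityID: "e1", StartIndex: 0, EndIndex: 7}
	b := EntityMention{DocID: "doc1", EntityID: "e2", StartIndex: 0, EndIndex: 7}
	c := EntityMention{DocID: "doc1", EntityID: "e1", StartIndex: 30, EndIndex: 37}

	// Same range, different entity: same key.
	fmt.Println(entityMentionKey(a) == entityMentionKey(b)) // true
	// Same entity, different range: different key.
	fmt.Println(entityMentionKey(a) == entityMentionKey(c)) // false
}
```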
func GormIgnored
func GormIgnored(typ interface{}) cmp.Option
GormIgnored returns an IgnoreFields cmp.Option for the given type, e.g. GormIgnored(DocReference{}).
Types
type Datastore
type Datastore struct {
	// contains filtered or unexported fields
}
func (*Datastore) FindEntity
func (d *Datastore) FindEntity(q EntityQuery) ([]*Entity, error)
FindEntity is a primitive form of entity linking.
TODO(jeremy): How should we handle the case where we could potentially have multiple entries in the database that would match?
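The fields of EntityQuery are not shown on this page, so the sketch below invents a hypothetical query with Name and MID fields to illustrate a primitive matching rule: match on the Knowledge Graph MID when one is given, otherwise on the name. It returns every match rather than picking one, which surfaces the ambiguity the TODO describes. The MID values are made-up placeholders.

```go
package main

import "fmt"

// Entity keeps only the fields this sketch matches on.
type Entity struct {
	ID   string
	Name string
	MID  string
}

// EntityQuery is hypothetical; the real datastore.EntityQuery may differ.
type EntityQuery struct {
	Name string
	MID  string
}

// findEntity prefers an exact MID match when q.MID is set and otherwise
// matches on Name. All matches are returned so callers can see ambiguity.
func findEntity(entities []*Entity, q EntityQuery) []*Entity {
	var matches []*Entity
	for _, e := range entities {
		if q.MID != "" {
			if e.MID == q.MID {
				matches = append(matches, e)
			}
			continue
		}
		if q.Name != "" && e.Name == q.Name {
			matches = append(matches, e)
		}
	}
	return matches
}

func main() {
	entities := []*Entity{
		{ID: "e1", Name: "Mercury", MID: "/m/placeholder-planet"},
		{ID: "e2", Name: "Mercury", MID: "/m/placeholder-element"},
	}
	fmt.Println(len(findEntity(entities, EntityQuery{Name: "Mercury"})))                // 2
	fmt.Println(findEntity(entities, EntityQuery{MID: "/m/placeholder-element"})[0].ID) // e2
}
```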
func (*Datastore) ListDocLinks
func (d *Datastore) ListDocLinks(destId string) ([]*DocLink, error)
ListDocLinks lists all the doc links. destId is optional; if it is supplied, only the links pointing at that destination id are listed.
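The optional destId filter can be sketched in memory as follows. Treating the empty string as "not supplied" is an assumption about how the real method interprets an omitted destId.

```go
package main

import "fmt"

// DocLink keeps only the fields this sketch needs.
type DocLink struct {
	ID     string
	DestID string
}

// listDocLinks mimics the documented optional filter: an empty destId
// (assumed to mean "not supplied") returns every link.
func listDocLinks(all []*DocLink, destId string) []*DocLink {
	if destId == "" {
		return all
	}
	var out []*DocLink
	for _, l := range all {
		if l.DestID == destId {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	links := []*DocLink{
		{ID: "l1", DestID: "doc-b"},
		{ID: "l2", DestID: "doc-c"},
		{ID: "l3", DestID: "doc-b"},
		{ID: "l4"}, // not all links have a destination id
	}
	fmt.Println(len(listDocLinks(links, "")))      // 4
	fmt.Println(len(listDocLinks(links, "doc-b"))) // 2
}
```

ListEntityMentions documents the same optional-filter behavior for its docId argument.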
func (*Datastore) ListDocReferences
func (d *Datastore) ListDocReferences() ([]*DocReference, error)
ListDocReferences lists all the DocReferences.
func (*Datastore) ListEntities
func (d *Datastore) ListEntities() ([]*Entity, error)
ListEntities lists all the entities.
func (*Datastore) ListEntityMentions
func (d *Datastore) ListEntityMentions(docId string) ([]*EntityMention, error)
ListEntityMentions lists all the entity mentions. docId is optional; if it is supplied, only the mentions in that doc are listed.
func (*Datastore) ToBeIndexed
func (d *Datastore) ToBeIndexed() ([]*DocReference, error)
ToBeIndexed returns a list of DocReferences that need to be indexed.
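DocReference stores both the current Md5Checksum and the LastIndexedMd5Checksum at which the doc was last indexed, which suggests a staleness check like the one below. This is a sketch under that assumption, not the package's actual query; a doc that was never indexed has an empty LastIndexedMd5Checksum and therefore also counts as stale.

```go
package main

import "fmt"

// DocReference keeps only the fields this sketch needs.
type DocReference struct {
	ID                     string
	Md5Checksum            string
	LastIndexedMd5Checksum string
}

// needsIndexing assumes a doc is stale when its current checksum differs from
// the checksum recorded when it was last indexed.
func needsIndexing(r *DocReference) bool {
	return r.Md5Checksum != r.LastIndexedMd5Checksum
}

// toBeIndexed filters refs down to those that need (re)indexing.
func toBeIndexed(refs []*DocReference) []*DocReference {
	var out []*DocReference
	for _, r := range refs {
		if needsIndexing(r) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	refs := []*DocReference{
		{ID: "doc-a", Md5Checksum: "111"},                                // never indexed
		{ID: "doc-b", Md5Checksum: "222", LastIndexedMd5Checksum: "222"}, // up to date
		{ID: "doc-c", Md5Checksum: "333", LastIndexedMd5Checksum: "300"}, // changed since last index
	}
	for _, r := range toBeIndexed(refs) {
		fmt.Println(r.ID) // doc-a, then doc-c
	}
}
```

Under this scheme an indexer would, after indexing a doc, copy Md5Checksum into LastIndexedMd5Checksum and save the reference with UpdateDocReference so it drops out of the next ToBeIndexed result.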
func (*Datastore) UpdateDocLink
func (d *Datastore) UpdateDocLink(l *DocLink) error
UpdateDocLink updates or creates the DocLink.
TODO(jeremy): This function needs to be updated to allow a given link to appear multiple times in a doc. In that case we want to have multiple entries, one per occurrence.
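"Updates or creates" is upsert-by-primary-key behavior. The sketch below shows those semantics with a hypothetical in-memory store keyed by DocLink.ID; it is not how the GORM-backed Datastore is implemented, and requiring a non-empty ID is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// DocLink keeps only the fields this sketch needs.
type DocLink struct {
	ID   string
	Text string
}

// memStore is a hypothetical in-memory stand-in for the Datastore.
type memStore struct {
	links map[string]*DocLink
}

func newMemStore() *memStore {
	return &memStore{links: map[string]*DocLink{}}
}

// updateDocLink creates l if no link with l.ID exists and otherwise replaces
// the stored copy. It reports whether a new entry was created.
func (s *memStore) updateDocLink(l *DocLink) (bool, error) {
	if l.ID == "" {
		return false, errors.New("DocLink.ID must be set, e.g. from DocLinkKey")
	}
	_, exists := s.links[l.ID]
	cp := *l // store a copy so later changes by the caller don't leak in
	s.links[l.ID] = &cp
	return !exists, nil
}

func main() {
	s := newMemStore()
	created, _ := s.updateDocLink(&DocLink{ID: "src-dst-0-5", Text: "hello"})
	fmt.Println(created) // true
	created, _ = s.updateDocLink(&DocLink{ID: "src-dst-0-5", Text: "hello, world"})
	fmt.Println(created, len(s.links), s.links["src-dst-0-5"].Text) // false 1 hello, world
}
```

Because the key embeds the text range, each occurrence of a link gets its own entry only if callers compute distinct ids for distinct occurrences.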
func (*Datastore) UpdateDocReference
func (d *Datastore) UpdateDocReference(r *DocReference) error
UpdateDocReference updates or creates the DocReference.
func (*Datastore) UpdateEntity
func (d *Datastore) UpdateEntity(m *Entity) error
UpdateEntity updates or creates the Entity.
TODO(jeremy): The semantics for dealing with multiple entities with the same name are ill-defined. Right now it is the caller's job to do entity linking before calling UpdateEntity. If an entity with a given name already exists in the database but m represents a different entity with the same name, then the caller should assign a unique id to it.
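The TODO leaves id assignment to the caller. One possible caller-side rule is sketched below: reuse the id of an existing entity with the same name and MID, and otherwise derive a fresh id from the name with a numeric suffix. Both the use of MID to tell entities apart and the name-based id scheme are hypothetical choices for illustration.

```go
package main

import "fmt"

// Entity keeps only the fields this sketch needs.
type Entity struct {
	ID   string
	Name string
	MID  string
}

// assignEntityID is a hypothetical helper a caller could run before
// UpdateEntity. existing maps entity ids to the stored entities.
func assignEntityID(existing map[string]*Entity, e *Entity) string {
	// Same name and same MID: treat it as the same entity and reuse its id.
	for id, x := range existing {
		if x.Name == e.Name && x.MID == e.MID {
			return id
		}
	}
	// Otherwise pick an unused id derived from the name: Name, Name-2, Name-3, ...
	id := e.Name
	for n := 2; existing[id] != nil; n++ {
		id = fmt.Sprintf("%s-%d", e.Name, n)
	}
	return id
}

func main() {
	existing := map[string]*Entity{
		"Mercury": {ID: "Mercury", Name: "Mercury", MID: "/m/placeholder-a"},
	}
	fmt.Println(assignEntityID(existing, &Entity{Name: "Mercury", MID: "/m/placeholder-a"})) // Mercury
	fmt.Println(assignEntityID(existing, &Entity{Name: "Mercury", MID: "/m/placeholder-b"})) // Mercury-2
}
```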
func (*Datastore) UpdateEntityMention
func (d *Datastore) UpdateEntityMention(m *EntityMention) error
UpdateEntityMention updates or creates the EntityMention.
type DocLink
type DocLink struct {
	// The unique id follows the convention sourceId-destId-startIndex-endIndex.
	// This is arguably not space-efficient but we can optimize later.
	ID        string         `gorm:"primarykey"`
	CreatedAt time.Time
	UpdatedAt time.Time
	DeletedAt gorm.DeletedAt `gorm:"index"`
	// SourceID is the id of the source doc.
	SourceID string `gorm:"index"`
	// DestID is the id of the destination doc.
	DestID string `gorm:"index"`
	// URI is the URI the link is pointing to.
	URI string
	// Text is the text associated with the link.
	Text string
	// StartIndex of the text for the link.
	StartIndex int64
	// EndIndex of the text for the link.
	EndIndex int64
}
DocLink is a directional link between two docs. We do not rely on GORM associations for a couple of reasons:
- We want to stick with a CRUD API to allow for more flexible backends.
- Using associations adds complexity in how they are used; I believe populating associations requires joins. There is also confusion about which fields a user should set when updating through a BelongsTo association, since there are separate fields for the foreign key and the reference.
- GraphQL might be a better API for joins.
A link can appear more than once between two documents; i.e. a given doc can have multiple hyperlinks to another document. Not all links will have destId set.
type DocReference
type DocReference struct {
	// The unique id follows the convention $namespace.id where namespace identifies a namespace with respect
	// to which id is unique. Typically namespace is the source of the file, e.g. Google Drive.
	ID        string         `gorm:"primarykey"`
	CreatedAt time.Time
	UpdatedAt time.Time
	DeletedAt gorm.DeletedAt `gorm:"index"`
	// The ID of the file in Google Drive. We create a unique index named uid to ensure there is
	// one row for each doc. This could eventually become a composite index because we want to allow for
	// documents in different systems (e.g. Drive and GitHub), in which case the uid index would be a composite
	// key on DriveId and GitHub and only one of them will be set.
	DriveId  string `gorm:"index:uid,unique"`
	Name     string
	MimeType string
	// TODO(jeremy): We should rename the checksum fields to be opaque version numbers. They won't always be
	// checksums.
	// Md5Checksum is the current checksum.
	Md5Checksum string
	// LastIndexedMd5Checksum is the checksum at which the doc was last indexed.
	LastIndexedMd5Checksum string
}
DocReference is a reference to a document stored in some system such as Google Drive.
type DocReferenceIter
type DocReferenceIter func(r *DocReference) error
DocReferenceIter is an iterator over DocReferences.
type Entity
type Entity struct {
	ID        string         `gorm:"primarykey"`
	CreatedAt time.Time
	UpdatedAt time.Time
	DeletedAt gorm.DeletedAt `gorm:"index"`
	// Name is the canonical name of the entity.
	Name string
	// Type is the type of the entity.
	Type string
	// WikipediaUrl is the Wikipedia URL associated with this entity, if there is one.
	WikipediaUrl string
	// MID is the Google Knowledge Graph MID, if there is one.
	MID string `gorm:"column:mid"`
}
Entity is a unique entity.
type EntityMention
type EntityMention struct {
	// The unique id follows the convention docId-startIndex-endIndex.
	// The assumption is that a given range can only refer to a single entity.
	ID        string         `gorm:"primarykey"`
	CreatedAt time.Time
	UpdatedAt time.Time
	DeletedAt gorm.DeletedAt `gorm:"index"`
	// DocID is the id of the doc.
	DocID    string `gorm:"index"`
	EntityID string
	// Text associated with the entity.
	Text string
	// StartIndex of the text for the mention.
	StartIndex int64
	// EndIndex of the text for the mention.
	EndIndex int64
}
EntityMention is the mention of some entity in a doc.
We do not rely on GORM associations for a couple of reasons:
- We want to stick with a CRUD API to allow for more flexible backends.
- Using associations adds complexity in how they are used; I believe populating associations requires joins. There is also confusion about which fields a user should set when updating through a BelongsTo association, since there are separate fields for the foreign key and the reference.
- GraphQL might be a better API for joins.
A specific entity can appear more than once in a given doc.
TODO(jeremy): We also need an Entity table and should attempt to do some entity linking.