Documentation ¶
Overview ¶
Package docs implements a corpus of text documents identified by document IDs. It allows retrieving the documents by ID as well as retrieving documents that are new since a previous scan.
Index ¶
- func Latest[T Entry](src Source[T]) timed.DBTime
- func LatestFunc[T Entry](src Source[T]) func() timed.DBTime
- func Restart[T Entry](src Source[T])
- func Sync[T Entry, S Source[T]](dc *Corpus, src S)
- type Corpus
- func (c *Corpus) Add(id, title, text string)
- func (c *Corpus) Delete(id string)
- func (c *Corpus) DocWatcher(name string) *timed.Watcher[*Doc]
- func (c *Corpus) Docs(prefix string) iter.Seq[*Doc]
- func (c *Corpus) DocsAfter(dbtime timed.DBTime, prefix string) iter.Seq[*Doc]
- func (c *Corpus) Get(id string) (doc *Doc, ok bool)
- type Doc
- type Entry
- type Source
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func LatestFunc ¶
LatestFunc returns a function that returns the latest known DBTime marked old by the source's DocWatcher.
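The difference between a value and a function that returns the value matters when the watcher keeps advancing. The following is only a sketch of that idea: `watcher`, `markOld`, and `latestFunc` are hypothetical stand-ins written for this example, and `DBTime` is assumed to be an integer timestamp. None of it is the package's actual code.

```go
package main

import "fmt"

// DBTime is a stand-in for timed.DBTime (assumed here to be an integer timestamp).
type DBTime int64

// watcher is a hypothetical stand-in for the source's DocWatcher.
// It only tracks the latest time marked old.
type watcher struct{ latest DBTime }

// markOld records that everything up to t has been processed.
func (w *watcher) markOld(t DBTime) {
	if t > w.latest {
		w.latest = t
	}
}

// latestFunc returns a closure that re-reads the watcher on every call,
// so callers always see the current value rather than a snapshot.
func latestFunc(w *watcher) func() DBTime {
	return func() DBTime { return w.latest }
}

func main() {
	w := &watcher{}
	latest := latestFunc(w)
	fmt.Println(latest()) // value before anything is marked old
	w.markOld(42)
	fmt.Println(latest()) // same function, now reflects the update
}
```

A caller that stored the result of a single lookup would keep seeing the old time; holding the function instead lets it observe later progress.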
func Restart ¶
Restart causes the next call to Sync to behave as if it has never synced any data from src before. As a result, all data will be reconverted to doc form and re-added. Docs that have not changed since they were last added to the corpus will appear unmodified; others will be marked new in the corpus.
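The interaction between a restart and unchanged documents can be illustrated with a toy model. This is a sketch under stated assumptions, not the package's implementation: `entry`, `source`, `corpus`, `syncAll`, and `restart` are all hypothetical names, the watcher position is modeled as a single integer, and entries are assumed to be ordered by write time.

```go
package main

import "fmt"

// entry is a hypothetical source record; written stands in for LastWritten.
type entry struct {
	written  int64
	id, text string
}

// source holds entries ordered by written time plus a watcher position.
type source struct {
	entries []entry
	latest  int64 // entries with written <= latest count as already synced
}

// corpus is a toy stand-in for Corpus that stamps each real write.
type corpus struct {
	clock int64
	text  map[string]string
	time  map[string]int64
}

func newCorpus() *corpus {
	return &corpus{text: map[string]string{}, time: map[string]int64{}}
}

// add skips unchanged documents, so re-adding them leaves their time alone.
func (c *corpus) add(id, text string) {
	if old, ok := c.text[id]; ok && old == text {
		return
	}
	c.clock++
	c.text[id] = text
	c.time[id] = c.clock
}

// syncAll adds every entry newer than the watcher position and returns
// how many entries it converted.
func syncAll(c *corpus, s *source) int {
	n := 0
	for _, e := range s.entries {
		if e.written <= s.latest {
			continue
		}
		c.add(e.id, e.text)
		s.latest = e.written
		n++
	}
	return n
}

// restart forgets the watcher position so the next sync revisits everything.
func restart(s *source) { s.latest = 0 }

func main() {
	c := newCorpus()
	s := &source{entries: []entry{{1, "a", "alpha"}, {2, "b", "beta"}}}
	fmt.Println(syncAll(c, s), c.clock) // first sync converts both entries
	restart(s)
	fmt.Println(syncAll(c, s), c.clock) // both reconverted, no new writes
}
```

In this model the second sync converts every entry again, but the corpus clock does not move because nothing changed, which is the "appear unmodified" behavior described above.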
Types ¶
type Corpus ¶
type Corpus struct {
// contains filtered or unexported fields
}
A Corpus is the collection of documents stored in a database.
func (*Corpus) Add ¶
Add adds a document with the given id, title, and text. If the document already exists in the corpus with the same title and text, Add is a no-op. Otherwise, if the document already exists in the corpus, it is replaced.
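The three cases in the Add description (new document, identical document, changed document) can be sketched with an in-memory map. This is an illustration only: `memCorpus`, `doc`, and the integer `dbTime` counter are hypothetical stand-ins, whereas the real Corpus is backed by a database.

```go
package main

import "fmt"

// doc mirrors the fields of Doc; dbTime is a plain counter standing in
// for timed.DBTime (an assumption made for this sketch).
type doc struct {
	dbTime          int64
	id, title, text string
}

// memCorpus is a hypothetical in-memory stand-in for Corpus.
type memCorpus struct {
	clock int64
	docs  map[string]*doc
}

func newMemCorpus() *memCorpus {
	return &memCorpus{docs: make(map[string]*doc)}
}

// add follows the documented rule: an identical title and text is a no-op,
// anything else writes (or replaces) the document with a fresh dbTime.
func (c *memCorpus) add(id, title, text string) {
	if old, ok := c.docs[id]; ok && old.title == title && old.text == text {
		return
	}
	c.clock++
	c.docs[id] = &doc{dbTime: c.clock, id: id, title: title, text: text}
}

func main() {
	c := newMemCorpus()
	const id = "https://example.com/a"
	c.add(id, "A", "first")
	c.add(id, "A", "first") // identical: no-op, dbTime unchanged
	fmt.Println(c.docs[id].dbTime, c.docs[id].text)
	c.add(id, "A", "second") // changed: replaced with a newer dbTime
	fmt.Println(c.docs[id].dbTime, c.docs[id].text)
}
```

Keeping the write time stable for identical content is what lets a time-based scan skip documents that were re-added without changes.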
func (*Corpus) Delete ¶
Delete deletes a document with the given id. If the document does not exist in the corpus, Delete is a no-op.
func (*Corpus) DocWatcher ¶
DocWatcher returns a new timed.Watcher with the given name. It picks up where any previous Watcher of the same name left off.
func (*Corpus) Docs ¶
Docs returns an iterator over all documents in the corpus with IDs starting with a given prefix. The documents are ordered by ID.
type Doc ¶
type Doc struct {
	DBTime timed.DBTime // DBTime when Doc was written
	ID     string       // document identifier (such as a URL)
	Title  string       // title of document
	Text   string       // text of document
}
A Doc is a single document in the Corpus.
type Entry ¶
type Entry interface {
	// LastWritten returns the DBTime this piece of data was last written
	// to its data source.
	LastWritten() timed.DBTime
}
Entry is a timed entry in a Source.
type Source ¶
type Source[T Entry] interface {
	// DocWatcher returns the watcher to use to keep track
	// of last [Sync] for this data source.
	DocWatcher() *timed.Watcher[T]

	// ToDocs converts the data to an iterator of [*Doc] values
	// that can be stored in a [Corpus].
	// It returns (nil, false) if the data should not be stored
	// in the [Corpus].
	ToDocs(T) (iter.Seq[*Doc], bool)
}
Source is a data source to pull into a Corpus.