package
Version: v0.0.0-...-804d1b8
Published: Nov 19, 2023
License: Apache-2.0
Imports: 13
Imported by: 0
Documentation
Package tfidf provides I/O functions that read text files and keep track of
the source of each text.
Reads a PCollection of CorpusEntry and returns a PCollection of DocEntry, with
the contents of each file stored in the Text field.
type CorpusEntry struct {
	RawFile    string `beam:"rawFile"`
	DocumentId string `beam:"glossFile"`
	ColFile    string `beam:"colFile"`
}
CorpusEntry contains metadata for a document from which text will be read.
type DocEntry struct {
	Text       string `beam:"text"`
	DocumentId string `beam:"glossFile"`
	ColFile    string `beam:"colFile"`
	CorpusLen  int    `beam:"corpusLen"`
}
DocEntry contains the full text of a document together with the metadata of the document it was extracted from.