Documentation ¶
Constants ¶
const MIMEType = "application/vnd.prima.page+xml"
MIMEType defines the MIME type for page xml documents.
Variables ¶
This section is empty.
Functions ¶
func FindUnicodesInRegionSorted ¶
FindUnicodesInRegionSorted searches for the TextEquiv / Unicode nodes beneath a text region (TextRegion, Line, Word, Glyph). The returned node list is ordered by the TextEquivs' index entries (interpreted as integers).
func SetMetadata ¶ added in v0.0.6
SetMetadata creates a new metadata node with the given content. If a previous metadata node exists, it is deleted.
func Tokenize ¶
func Tokenize(metsName string, fgs ...string) apoco.StreamFunc
Tokenize returns a function that reads tokens from the page xml files of the given file groups. An empty token is inserted as a sentinel between the tokens of different file groups. The returned function ignores the input stream; it only writes tokens to the output stream.
func TokenizeDirs ¶
func TokenizeDirs(ext string, dirs ...string) apoco.StreamFunc
TokenizeDirs returns a function that reads page xml files with a matching file extension from the given directories. The returned function ignores the input stream. It only writes tokens to the output stream.
Types ¶
This section is empty.