pagexml

package
v0.0.54
Warning

This package is not in the latest version of its module.

Published: Sep 13, 2021 License: MIT Imports: 12 Imported by: 0

Documentation


Constants

const MIMEType = "application/vnd.prima.page+xml"

MIMEType defines the MIME type for page xml documents.
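A minimal sketch of how the constant could be used to recognise page xml payloads by their Content-Type header. The constant is redeclared locally so the sketch compiles without importing pagexml, and the helper `isPageXML` is hypothetical, not part of the package.

```go
package main

import (
	"fmt"
	"mime"
)

// MIMEType mirrors pagexml.MIMEType; redeclared so the sketch is self-contained.
const MIMEType = "application/vnd.prima.page+xml"

// isPageXML (hypothetical helper) reports whether a Content-Type value names
// a page xml document. Parameters such as charset are ignored.
func isPageXML(contentType string) bool {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false // empty or malformed header
	}
	return mediaType == MIMEType
}

func main() {
	fmt.Println(isPageXML("application/vnd.prima.page+xml; charset=utf-8")) // true
	fmt.Println(isPageXML("text/xml"))                                      // false
}
```

Parsing the header with `mime.ParseMediaType` rather than comparing strings directly avoids false negatives when a server appends parameters.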

Variables

This section is empty.

Functions

func FindUnicodesInRegionSorted

func FindUnicodesInRegionSorted(region *xmlquery.Node) []*xmlquery.Node

FindUnicodesInRegionSorted searches for the TextEquiv/Unicode nodes beneath a text region (TextRegion, Line, Word, Glyph). The returned node list is ordered by the index attributes of the TextEquiv elements, interpreted as integers.

func SetMetadata added in v0.0.6

func SetMetadata(doc *xmlquery.Node, creator string, created, lastChange time.Time)

SetMetadata creates a new metadata node with the given creator, creation time, and last-change time. If a metadata node already exists, it is deleted.

func Tokenize

func Tokenize(metsName string, fgs ...string) apoco.StreamFunc

Tokenize returns a function that reads tokens from the page xml files of the given file groups. An empty token is inserted as a sentinel between the tokens of different file groups. The returned function ignores the input stream; it only writes tokens to the output stream.

func TokenizeDirs

func TokenizeDirs(ext string, dirs ...string) apoco.StreamFunc

TokenizeDirs returns a function that reads the page xml files whose file extension matches ext from the given directories and writes their tokens to the output stream. The returned function ignores the input stream.

Types

This section is empty.
