page

package
v0.5.0
Warning

This package is not in the latest version of its module.

Published: Aug 29, 2018 License: MIT Imports: 8 Imported by: 1

Documentation

Index

Constants

const (
	// MIMEType defines the mime-type of page XML files.
	// See: https://github.com/PRImA-Research-Lab/PAGE-XML
	MIMEType = "application/alto+xml"
)

Variables

This section is empty.

Functions

This section is empty.

Types

type Line added in v0.4.0

type Line struct {
	// contains filtered or unexported fields
}

Line represents a line of text in the page XML file.

func (Line) FindWordByID added in v0.4.0

func (l Line) FindWordByID(id string) (Word, bool)

FindWordByID searches for a word with the given ID.

func (Line) ID added in v0.4.0

func (w Line) ID() string

ID returns the line's ID.

func (Line) Polygon added in v0.5.0

func (l Line) Polygon() (Polygon, error)

Polygon returns the line's polygon of coordinates.

func (Line) TextEquivUnicodeAt added in v0.4.0

func (l Line) TextEquivUnicodeAt(pos int) (string, bool)

TextEquivUnicodeAt returns the TextEquiv/Unicode entry at index pos (indexing is zero-based).

func (Line) Words added in v0.4.0

func (l Line) Words() []Word

Words returns all words in a line.

type Match added in v0.5.0

type Match struct {
	RegionID, LineID, WordID string
}

Match is used to match text regions. If any of the IDs is the empty string, the corresponding region is ignored.

func (Match) String added in v0.5.0

func (m Match) String() string

type Page

type Page struct {
	// contains filtered or unexported fields
}

Page represents an open page XML file.

func Open

func Open(path string) (Page, error)

Open opens a page XML file.

func (Page) Find added in v0.5.0

func (p Page) Find(m Match) (TextRegion, bool)

Find searches for a given {region,line,word}-ID in the PAGE-XML (IDs are assumed to be unique).

func (Page) FindRegionByID added in v0.4.0

func (p Page) FindRegionByID(id string) (Region, bool)

FindRegionByID returns the region with the given ID.

func (Page) Regions

func (p Page) Regions() []Region

Regions returns a slice with all RegionRefIndexed elements.

type Polygon added in v0.5.0

type Polygon []image.Point

Polygon is used to represent the polygons of <Coords points='...'/> points in the PAGE-XML.
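To illustrate the data a Polygon holds, here is a minimal sketch that parses a PAGE-XML points attribute (space-separated `x,y` pairs) into a slice of image.Point. The helper name parsePoints is hypothetical and not part of this package; the package's own parsing may differ.

```go
package main

import (
	"fmt"
	"image"
	"strconv"
	"strings"
)

// parsePoints is a hypothetical helper, not part of the documented API.
// It converts a points attribute such as "10,20 30,40" into image points.
func parsePoints(points string) ([]image.Point, error) {
	fields := strings.Fields(points)
	result := make([]image.Point, 0, len(fields))
	for _, field := range fields {
		// Each field is expected to have the form "x,y".
		xy := strings.SplitN(field, ",", 2)
		if len(xy) != 2 {
			return nil, fmt.Errorf("invalid point %q: missing comma", field)
		}
		x, err := strconv.Atoi(xy[0])
		if err != nil {
			return nil, fmt.Errorf("invalid x coordinate in %q: %v", field, err)
		}
		y, err := strconv.Atoi(xy[1])
		if err != nil {
			return nil, fmt.Errorf("invalid y coordinate in %q: %v", field, err)
		}
		result = append(result, image.Pt(x, y))
	}
	return result, nil
}

func main() {
	pts, err := parsePoints("10,20 30,40 15,5")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(pts), "points:", pts)
}
```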

func (Polygon) Rectangle added in v0.5.0

func (p Polygon) Rectangle() image.Rectangle

Rectangle returns the bounding rectangle of the polygon.
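One way a bounding rectangle can be computed is by taking the minimum and maximum coordinates over all points. The sketch below uses a local stand-in type (polygon) and a hypothetical method name (bounds); it is not the package's implementation, and whether the package adjusts Max (which image.Rectangle treats as exclusive) is not stated in the documentation.

```go
package main

import (
	"fmt"
	"image"
)

// polygon is a local stand-in that mirrors the documented Polygon type.
type polygon []image.Point

// bounds returns the rectangle spanned by the smallest and largest
// coordinates of the polygon. An empty polygon yields the zero rectangle.
func (p polygon) bounds() image.Rectangle {
	if len(p) == 0 {
		return image.Rectangle{}
	}
	r := image.Rectangle{Min: p[0], Max: p[0]}
	for _, pt := range p[1:] {
		if pt.X < r.Min.X {
			r.Min.X = pt.X
		}
		if pt.Y < r.Min.Y {
			r.Min.Y = pt.Y
		}
		if pt.X > r.Max.X {
			r.Max.X = pt.X
		}
		if pt.Y > r.Max.Y {
			r.Max.Y = pt.Y
		}
	}
	return r
}

func main() {
	p := polygon{{X: 10, Y: 20}, {X: 30, Y: 5}, {X: 15, Y: 40}}
	fmt.Println(p.bounds())
}
```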

type Region

type Region struct {
	// contains filtered or unexported fields
}

Region defines a text region in the page XML file.

func (Region) FindLineByID

func (r Region) FindLineByID(id string) (Line, bool)

FindLineByID searches for a line with the given ID.

func (Region) ID added in v0.4.0

func (w Region) ID() string

ID returns the region's ID.

func (Region) Lines

func (r Region) Lines() []Line

Lines returns all lines in a region.

func (Region) Polygon added in v0.5.0

func (r Region) Polygon() (Polygon, error)

Polygon returns the region's polygon of coordinates.

func (Region) TextEquivUnicodeAt

func (r Region) TextEquivUnicodeAt(pos int) (string, bool)

TextEquivUnicodeAt returns the TextEquiv/Unicode entry at index pos (indexing is zero-based).

type TextRegion added in v0.5.0

type TextRegion interface {
	ID() string
	TextEquivUnicodeAt(int) (string, bool)
	Polygon() (Polygon, error)
}

TextRegion defines an interface for abstract text regions in a PAGE-XML document.
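Because Region, Line, and Word all expose ID, TextEquivUnicodeAt, and Polygon, code written against TextRegion can handle any of them uniformly. The sketch below copies the two declarations locally so that it compiles on its own. The stubRegion type and the describe function are hypothetical, and the bounds check in the stub is an assumption about how the boolean result behaves.

```go
package main

import (
	"fmt"
	"image"
)

// Polygon and TextRegion are copied from the documented declarations
// so that this sketch is self-contained.
type Polygon []image.Point

type TextRegion interface {
	ID() string
	TextEquivUnicodeAt(int) (string, bool)
	Polygon() (Polygon, error)
}

// stubRegion is a hypothetical in-memory implementation for illustration.
type stubRegion struct {
	id    string
	texts []string
	poly  Polygon
}

func (s stubRegion) ID() string { return s.id }

// TextEquivUnicodeAt reports false for out-of-range positions (assumed behavior).
func (s stubRegion) TextEquivUnicodeAt(pos int) (string, bool) {
	if pos < 0 || pos >= len(s.texts) {
		return "", false
	}
	return s.texts[pos], true
}

func (s stubRegion) Polygon() (Polygon, error) { return s.poly, nil }

// describe accepts any TextRegion and formats its ID with its first text entry.
func describe(r TextRegion) string {
	text, ok := r.TextEquivUnicodeAt(0)
	if !ok {
		text = "(no text)"
	}
	return fmt.Sprintf("%s: %s", r.ID(), text)
}

func main() {
	word := stubRegion{id: "w1", texts: []string{"hello"}}
	fmt.Println(describe(word))
}
```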

type Word

type Word struct {
	// contains filtered or unexported fields
}

Word represents a word on a line.

func (Word) ID

func (w Word) ID() string

ID returns the word's ID.

func (Word) Polygon added in v0.5.0

func (w Word) Polygon() (Polygon, error)

Polygon returns the word's polygon of coordinates.

func (Word) TextEquivUnicodeAt

func (w Word) TextEquivUnicodeAt(pos int) (string, bool)

TextEquivUnicodeAt returns the TextEquiv/Unicode entry at index pos (indexing is zero-based).
