extractor

package
v0.0.0-...-692e68a Latest
Warning

This package is not in the latest version of its module.

Published: Aug 20, 2021 License: MIT Imports: 8 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Extractor

type Extractor struct {
	// contains filtered or unexported fields
}

Extractor stores and offers functionality for extracting content from PDF pages.

func New

func New(contents string, f model.FontsByNames) *Extractor

New returns an Extractor instance for extracting content from the input PDF page.

func (*Extractor) ExtractText

func (e *Extractor) ExtractText() (string, error)

ExtractText processes all text data in the content streams and returns it as a single string. It takes character encoding into account via the CMaps in the PDF file. The text is processed linearly, i.e. in the order in which it appears. A best effort is made to add spaces and newlines.
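A minimal sketch of how a caller might consume ExtractText, assuming only the documented signature `ExtractText() (string, error)`. This page does not show the package's import path or how a `model.FontsByNames` value is built, so the sketch does not import the package: `textExtractor`, `stubExtractor`, `pageText`, and `errBadStream` are hypothetical names introduced for illustration, and the stub merely stands in for a real `*Extractor`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// textExtractor is a hypothetical local interface that mirrors the documented
// method signature. A *Extractor returned by New would satisfy it.
type textExtractor interface {
	ExtractText() (string, error)
}

// stubExtractor is a stand-in (not part of the package) so that the sketch
// runs without a PDF file or font table.
type stubExtractor struct {
	text string
	err  error
}

func (s stubExtractor) ExtractText() (string, error) { return s.text, s.err }

// errBadStream is a hypothetical error used only to exercise the failure path.
var errBadStream = errors.New("bad content stream")

// pageText calls ExtractText, wraps any error with context, and trims the
// leading and trailing whitespace from the extracted text.
func pageText(e textExtractor) (string, error) {
	text, err := e.ExtractText()
	if err != nil {
		return "", fmt.Errorf("extracting page text: %w", err)
	}
	return strings.TrimSpace(text), nil
}

func main() {
	// Success path: the stub returns text with a trailing newline.
	text, err := pageText(stubExtractor{text: "Hello world\n"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(text)

	// Failure path: the error is wrapped and returned to the caller.
	if _, err := pageText(stubExtractor{err: errBadStream}); err != nil {
		fmt.Println("error:", err)
	}
}
```

With the real package, the stub would be replaced by the value returned from `New(contents, fonts)`. Because spaces and newlines are added on a best-effort basis, callers may want to normalize whitespace further before comparing or indexing the result.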
