package extractor

Version: v0.0.0-...-692e68a | Published: Aug 20, 2021 | License: MIT | Imports: 8 | Imported by: 0

Repository

github.com/ErrCode/PDFTextExtract

Documentation

Index

type Extractor
- func New(contents string, f model.FontsByNames) *Extractor
- func (e *Extractor) ExtractText() (string, error)

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Extractor

type Extractor struct {
	// contains filtered or unexported fields
}

Extractor stores and offers functionality for extracting content from PDF pages.

func New

func New(contents string, f model.FontsByNames) *Extractor

New returns an Extractor instance for extracting content from the input PDF page.

func (*Extractor) ExtractText

func (e *Extractor) ExtractText() (string, error)

ExtractText processes all text data in the content streams and returns it as a string. It takes character encoding into account via the CMaps in the PDF file. The text is processed linearly, i.e. in the order in which it appears. A best effort is made to add spaces and newlines.
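
To make the description above concrete, here is a simplified sketch of linear text extraction. It is not this package's implementation. It assumes a toy content stream in which text is shown only by the PDF operators Tj (show string) and T* (move to next line), string operands are plain literals with no escapes or nested parentheses, and character codes are single bytes. The names toyCMap and extractLinear are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// toyCMap maps single-byte character codes to Unicode code points.
// Real CMaps also handle multi-byte codes and ranges; this is a
// deliberate simplification.
type toyCMap map[byte]rune

// decode translates raw string bytes through the map. Bytes without
// a mapping are passed through as the code point of the same value.
func (m toyCMap) decode(raw string) string {
	var b strings.Builder
	for i := 0; i < len(raw); i++ {
		if r, ok := m[raw[i]]; ok {
			b.WriteRune(r)
		} else {
			b.WriteRune(rune(raw[i]))
		}
	}
	return b.String()
}

// extractLinear walks the stream once, left to right, and emits text
// in the order it appears: Tj shows the most recent string operand,
// T* contributes a newline, and every other operator is ignored.
func extractLinear(stream string, cm toyCMap) string {
	var out strings.Builder
	pending := "" // most recent string operand
	i := 0
	for i < len(stream) {
		c := stream[i]
		switch {
		case c == '(':
			end := strings.IndexByte(stream[i:], ')')
			if end < 0 {
				return out.String() // unterminated string: keep what we have
			}
			pending = stream[i+1 : i+end]
			i += end + 1
		case c == ' ' || c == '\n' || c == '\t' || c == '\r':
			i++
		default:
			// Read one token up to the next whitespace or '('.
			j := i
			for j < len(stream) && !strings.ContainsRune(" \n\t\r(", rune(stream[j])) {
				j++
			}
			switch stream[i:j] {
			case "Tj":
				out.WriteString(cm.decode(pending))
			case "T*":
				out.WriteByte('\n')
			}
			i = j
		}
	}
	return out.String()
}

func main() {
	cm := toyCMap{1: 'H', 2: 'i'}
	stream := "BT /F1 12 Tf (\x01\x02) Tj T* (ok) Tj ET"
	fmt.Printf("%q\n", extractLinear(stream, cm)) // prints "Hi\nok"
}
```

Because the walk is strictly in stream order, the output order matches the order in which the text operators appear, which may differ from the visual reading order of the page.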
