reader

package

v0.0.12 (latest) | Published: May 16, 2024 | License: MIT | Imports: 13 | Imported by: 1

Repository

github.com/benoitkugler/pdf

Documentation ¶

Overview ¶

Package reader leverages a PDF file reader to read a file, analyze its structure and build a high-level, in-memory representation as a `model.Document`.

Index ¶

func DateTime(s string) (time.Time, bool)
func DecodeTextString(s string) string
func ParsePDFFile(filename string, options Options) (model.Document, *model.Encrypt, error)
func ParsePDFReader(source io.ReadSeeker, options Options) (model.Document, *model.Encrypt, error)
func ProcessContext(ctx file.PDFFile) (model.Document, *model.Encrypt, error)
type CustomObjectResolver
type Fl
type Options

Functions ¶

func DateTime ¶

func DateTime(s string) (time.Time, bool)

DateTime decodes s into a time.Time. It returns false if the string is not a valid date. See section 7.9.4 (Dates) of the PDF specification.
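
To make the expected input concrete, here is a minimal sketch of how a PDF date string of the form `D:YYYYMMDDHHmmSSOHH'mm'` could be decoded with the standard library only. It is not this package's implementation: `parsePDFDate` is a hypothetical helper, and the sketch assumes that fields after the year come in two-digit pairs and does not range-check individual fields.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parsePDFDate is a hypothetical, simplified decoder for PDF date strings.
// It mirrors the (time.Time, bool) shape of DateTime, but it is NOT the
// package's own code. Out-of-range fields are not rejected: time.Date
// normalizes them.
func parsePDFDate(s string) (time.Time, bool) {
	s = strings.TrimPrefix(s, "D:")

	// Split the leading digits from the optional timezone suffix.
	end := 0
	for end < len(s) && s[end] >= '0' && s[end] <= '9' {
		end++
	}
	digits, tz := s[:end], s[end:]
	if len(digits) < 4 || len(digits) > 14 || len(digits)%2 != 0 {
		return time.Time{}, false
	}

	// Omitted fields default to 1 for month and day, 0 otherwise.
	fields := [5]int{1, 1, 0, 0, 0} // month, day, hour, minute, second
	year, _ := strconv.Atoi(digits[:4])
	for i := 0; 4+2*i < len(digits); i++ {
		fields[i], _ = strconv.Atoi(digits[4+2*i : 6+2*i])
	}

	// An absent suffix or "Z" is treated here as UTC.
	loc := time.UTC
	if tz != "" && tz != "Z" {
		if tz[0] != '+' && tz[0] != '-' {
			return time.Time{}, false
		}
		off := strings.ReplaceAll(tz[1:], "'", "")
		if len(off) != 2 && len(off) != 4 {
			return time.Time{}, false
		}
		hh, err := strconv.Atoi(off[:2])
		if err != nil {
			return time.Time{}, false
		}
		mm := 0
		if len(off) == 4 {
			if mm, err = strconv.Atoi(off[2:]); err != nil {
				return time.Time{}, false
			}
		}
		secs := hh*3600 + mm*60
		if tz[0] == '-' {
			secs = -secs
		}
		loc = time.FixedZone("", secs)
	}

	return time.Date(year, time.Month(fields[0]), fields[1],
		fields[2], fields[3], fields[4], 0, loc), true
}

func main() {
	t, ok := parsePDFDate("D:20240516103000+02'00'")
	fmt.Println(t.UTC().Format(time.RFC3339), ok)

	_, ok = parsePDFDate("not a date")
	fmt.Println(ok)
}
```

As with DateTime, callers should check the boolean before using the returned time.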

func DecodeTextString ¶

func DecodeTextString(s string) string

DecodeTextString expects a "text string" as defined in the PDF specification, that is, either a PDFDocEncoded string or a UTF-16BE string, and returns the corresponding UTF-8 string. Note that encryption, escaping and hex-encoding must already have been handled by the caller.
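
The distinction between the two encodings can be illustrated with a small standard-library sketch. It is not the package's implementation: `decodeTextStringSketch` is a hypothetical name, and the fallback branch treats each byte as Latin-1, which is only an approximation because PDFDocEncoding differs from Latin-1 for a number of code points (notably below 0x20 and between 0x80 and 0xA0).

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// decodeTextStringSketch is a hypothetical, simplified decoder for PDF
// "text strings". A leading byte order mark (0xFE 0xFF) signals UTF-16BE;
// anything else is approximated as Latin-1 (an assumption of this sketch,
// see the note above about PDFDocEncoding).
func decodeTextStringSketch(s string) string {
	b := []byte(s)
	if len(b) >= 2 && b[0] == 0xFE && b[1] == 0xFF {
		b = b[2:] // skip the byte order mark
		units := make([]uint16, 0, len(b)/2)
		for i := 0; i+1 < len(b); i += 2 {
			units = append(units, binary.BigEndian.Uint16(b[i:]))
		}
		// utf16.Decode also combines surrogate pairs.
		return string(utf16.Decode(units))
	}

	// Single-byte fallback: map every byte to the code point of equal value.
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	return string(runes)
}

func main() {
	fmt.Println(decodeTextStringSketch("\xfe\xff\x00H\x00i")) // UTF-16BE input
	fmt.Println(decodeTextStringSketch("caf\xe9"))             // single-byte input
}
```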

func ParsePDFFile ¶

func ParsePDFFile(filename string, options Options) (model.Document, *model.Encrypt, error)

ParsePDFFile opens a file and calls `ParsePDFReader`, see the latter for details.

func ParsePDFReader ¶

func ParsePDFReader(source io.ReadSeeker, options Options) (model.Document, *model.Encrypt, error)

ParsePDFReader reads a PDF file and builds a model. This is done in two steps:

  - a first parsing step (involving lexing and parsing) builds a tree object
  - this tree is then interpreted according to the PDF specification, resolving indirect objects and transforming dynamic, opaque types into statically typed objects, building the returned `Document`.

Information about encryption is returned separately, and will be needed if you want to encrypt the document again when writing it back.
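
The second step described above (resolving indirect objects and converting dynamic values into static types) can be sketched on a toy object tree. Everything in this example is hypothetical and unrelated to the package's real types: `ref`, `dict`, `xref`, `summary` and `buildSummary` are illustrative names only.

```go
package main

import (
	"errors"
	"fmt"
)

// Toy stand-ins for dynamically typed PDF objects (hypothetical types).
type ref int                     // an indirect reference to another object
type dict map[string]interface{} // a dictionary with opaque values

// xref maps object numbers to their (still dynamic) values.
type xref map[ref]interface{}

// resolve follows indirect references until a direct object is reached.
// The depth limit guards against reference cycles.
func (x xref) resolve(o interface{}) (interface{}, error) {
	for depth := 0; depth < 32; depth++ {
		r, ok := o.(ref)
		if !ok {
			return o, nil
		}
		next, found := x[r]
		if !found {
			return nil, fmt.Errorf("missing object %d", r)
		}
		o = next
	}
	return nil, errors.New("reference chain too deep")
}

// summary is the statically typed result of the interpretation step.
type summary struct {
	Title string
	Count int
}

// buildSummary interprets a dynamic dictionary as a typed summary.
func buildSummary(x xref, root interface{}) (summary, error) {
	o, err := x.resolve(root)
	if err != nil {
		return summary{}, err
	}
	d, ok := o.(dict)
	if !ok {
		return summary{}, errors.New("root is not a dictionary")
	}
	var out summary
	title, err := x.resolve(d["Title"])
	if err != nil {
		return summary{}, err
	}
	out.Title, _ = title.(string) // missing or mistyped: keep the zero value
	count, err := x.resolve(d["Count"])
	if err != nil {
		return summary{}, err
	}
	out.Count, _ = count.(int)
	return out, nil
}

func main() {
	objects := xref{
		1: dict{"Title": ref(2), "Count": 3},
		2: "Hello",
	}
	s, err := buildSummary(objects, ref(1))
	fmt.Println(s.Title, s.Count, err)
}
```

In this toy version a mistyped field silently falls back to its zero value; whether a real reader should be that lenient or report an error is a design choice this sketch does not settle.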

func ProcessContext ¶

func ProcessContext(ctx file.PDFFile) (model.Document, *model.Encrypt, error)

ProcessContext walks through an already parsed PDF to build a model. This function is exposed for debugging purposes; you should probably use `ParsePDFFile` or `ParsePDFReader` instead.

Types ¶

type CustomObjectResolver ¶

type CustomObjectResolver interface {
	Resolve(f *file.PDFFile, obj model.Object) (model.Object, error)
}

CustomObjectResolver provides a way to override the default reading behaviour for custom objects.

type Fl ¶

type Fl = model.Fl

type Options ¶

type Options struct {
	CustomObjectResolver CustomObjectResolver
	UserPassword         string
}

Options enables greater control over the processing. The zero value is a valid default configuration.

Directories ¶

Path	Synopsis
file	Package file builds upon a parser to read an existing PDF file, producing a tree of PDF objects.
parser	Implements a PDF object parser, mapping a list of tokens (see the tokenizer package) into a tree-like structure.
filters	Package filters provides logic to handle binary data encoded with PDF filters, such as inline data images.
filters/ccitt
