reader

package
v0.0.4 Latest
Warning

This package is not in the latest version of its module.

Published: Jul 28, 2023 License: MIT Imports: 13 Imported by: 1

Documentation

Overview

Package reader leverages a PDF file reader to read a file, analyze its structure and build a high-level, in-memory representation as a `model.Document`.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func DateTime

func DateTime(s string) (time.Time, bool)

DateTime decodes s into a time.Time. It returns false if the string is not a valid date. See section 7.9.4 (Dates) of the PDF specification.
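To illustrate the date syntax that section 7.9.4 describes, here is a minimal, standard-library-only sketch. It is not the package's implementation: `parsePDFDate` is a hypothetical helper, and it assumes the usual `D:YYYYMMDDHHmmSSOHH'mm'` layout in which the `D:` prefix and every field after the year are optional.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parsePDFDate is a hypothetical helper written for this example; it is not
// the package's DateTime implementation. It accepts D:YYYYMMDDHHmmSSOHH'mm',
// where the "D:" prefix and every field after the year are optional.
func parsePDFDate(s string) (time.Time, bool) {
	s = strings.TrimPrefix(s, "D:")

	// Split off the time zone, introduced by 'Z', '+' or '-'.
	loc := time.UTC
	if i := strings.IndexAny(s, "Z+-"); i >= 0 {
		tz := strings.ReplaceAll(s[i:], "'", "")
		s = s[:i]
		if tz[0] != 'Z' {
			// tz now looks like "+02" or "+0200".
			if len(tz) != 3 && len(tz) != 5 {
				return time.Time{}, false
			}
			hours, err := strconv.Atoi(tz[1:3])
			if err != nil {
				return time.Time{}, false
			}
			minutes := 0
			if len(tz) == 5 {
				if minutes, err = strconv.Atoi(tz[3:5]); err != nil {
					return time.Time{}, false
				}
			}
			offset := hours*3600 + minutes*60
			if tz[0] == '-' {
				offset = -offset
			}
			loc = time.FixedZone("", offset)
		}
	}

	if len(s) < 4 {
		return time.Time{}, false // the year is mandatory
	}
	// Year, month, day, hour, minute, second; omitted fields keep these defaults.
	fields := [6]int{0, 1, 1, 0, 0, 0}
	widths := [6]int{4, 2, 2, 2, 2, 2}
	for i, w := range widths {
		if len(s) == 0 {
			break
		}
		if len(s) < w {
			return time.Time{}, false
		}
		v, err := strconv.Atoi(s[:w])
		if err != nil {
			return time.Time{}, false
		}
		fields[i], s = v, s[w:]
	}
	if len(s) != 0 {
		return time.Time{}, false // unexpected trailing characters
	}

	t := time.Date(fields[0], time.Month(fields[1]), fields[2],
		fields[3], fields[4], fields[5], 0, loc)
	// time.Date normalizes out-of-range values (month 13 becomes January of
	// the next year), so reject any input that did not survive unchanged.
	if int(t.Month()) != fields[1] || t.Day() != fields[2] ||
		t.Hour() != fields[3] || t.Minute() != fields[4] || t.Second() != fields[5] {
		return time.Time{}, false
	}
	return t, true
}

func main() {
	for _, s := range []string{"D:20230728153000+02'00'", "D:1999", "not a date"} {
		t, ok := parsePDFDate(s)
		fmt.Println(s, "->", t.Format(time.RFC3339), ok)
	}
}
```

The sketch is deliberately lenient about the time zone part; a production decoder would likely validate it more strictly.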

func DecodeTextString

func DecodeTextString(s string) string

DecodeTextString expects a "text string" as defined in the PDF specification, that is, either a PDFDocEncoded string or a UTF-16BE string, and returns the corresponding UTF-8 string. Note that encryption, escaping and hex-encoding must already have been taken care of.
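The distinction between the two encodings can be sketched with the standard library alone. This is an illustration, not the package's code: `decodeTextString` is a hypothetical name, and the fallback branch is a simplification that maps each byte to the code point of the same value, which matches PDFDocEncoding only for printable ASCII and most of the Latin-1 range. A complete decoder needs the full PDFDocEncoding table from the specification.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// decodeTextString is a hypothetical sketch, not the package's
// DecodeTextString implementation.
func decodeTextString(s string) string {
	b := []byte(s)

	// A UTF-16BE text string starts with the byte order mark 0xFE 0xFF.
	if len(b) >= 2 && b[0] == 0xFE && b[1] == 0xFF {
		b = b[2:]
		units := make([]uint16, 0, len(b)/2)
		for i := 0; i+1 < len(b); i += 2 {
			// Big-endian: the high byte comes first.
			units = append(units, uint16(b[i])<<8|uint16(b[i+1]))
		}
		// utf16.Decode also combines surrogate pairs into single runes.
		return string(utf16.Decode(units))
	}

	// Simplified fallback (assumption): treat each byte as the Unicode code
	// point of the same value. PDFDocEncoding differs from this in some
	// ranges, so a real decoder would use a lookup table here.
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	return string(runes)
}

func main() {
	fmt.Println(decodeTextString("\xfe\xff\x00H\x00i")) // UTF-16BE input
	fmt.Println(decodeTextString("caf\xe9"))            // single-byte input
}
```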

func ParsePDFFile

func ParsePDFFile(filename string, options Options) (model.Document, *model.Encrypt, error)

ParsePDFFile opens a file and calls `ParsePDFReader`; see the latter for details.

func ParsePDFReader

func ParsePDFReader(source io.ReadSeeker, options Options) (model.Document, *model.Encrypt, error)

ParsePDFReader reads a PDF file and builds a model. This is done in two steps:

  • a first parsing step (involving lexing and parsing) builds a tree object
  • this tree is then interpreted according to the PDF specification, resolving indirect objects and transforming dynamic, opaque types into statically typed objects, building the returned `Document`.

Information about encryption is returned separately, and will be needed if you want to encrypt the document again.
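The two steps can be pictured with a toy model. The sketch below is not the package's code: every type and function in it (`object`, `ref`, `dict`, `file`, `catalog`, `resolve`, `interpret`) is a hypothetical stand-in, and the dictionary keys are chosen only for illustration. It assumes step one has already produced a table of dynamically typed objects, and shows step two resolving indirect references and building a statically typed value.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the dynamic objects produced by a first
// parsing step.
type (
	object interface{}
	ref    int               // indirect reference to an object number
	dict   map[string]object // dictionary with dynamically typed values
)

// file mimics the output of step one: a table of objects and a root reference.
type file struct {
	objects map[ref]object
	root    ref
}

// catalog is the kind of statically typed value step two would produce.
type catalog struct {
	Title     string
	PageCount int
}

// resolve follows indirect references until it reaches a direct object.
// It returns nil if the chain is too long, which guards against cycles.
func (f file) resolve(o object) object {
	for depth := 0; depth < 32; depth++ {
		r, ok := o.(ref)
		if !ok {
			return o
		}
		o = f.objects[r]
	}
	return nil
}

// interpret converts the dynamic tree into a typed catalog, checking the
// type of every entry along the way.
func interpret(f file) (catalog, error) {
	root, ok := f.resolve(f.root).(dict)
	if !ok {
		return catalog{}, errors.New("root is not a dictionary")
	}
	title, ok := f.resolve(root["Title"]).(string)
	if !ok {
		return catalog{}, errors.New("Title is not a string")
	}
	count, ok := f.resolve(root["Count"]).(int)
	if !ok {
		return catalog{}, errors.New("Count is not an integer")
	}
	return catalog{Title: title, PageCount: count}, nil
}

func main() {
	f := file{
		objects: map[ref]object{
			1: dict{"Title": ref(2), "Count": 3},
			2: "Hello",
		},
		root: 1,
	}
	fmt.Println(interpret(f))
}
```

Separating the steps this way keeps the lexing and parsing code independent of the PDF object semantics, at the cost of holding the intermediate tree in memory.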

func ProcessContext

func ProcessContext(ctx file.PDFFile) (model.Document, *model.Encrypt, error)

ProcessContext walks through an already parsed PDF to build a model. This function is exposed for debugging purposes; you should probably use `ParsePDFFile` or `ParsePDFReader` instead.

Types

type CustomObjectResolver

type CustomObjectResolver interface {
	Resolve(f *file.PDFFile, obj model.Object) (model.Object, error)
}

CustomObjectResolver provides a way to override the default reading behaviour for custom objects.

type Fl

type Fl = model.Fl

type Options

type Options struct {
	CustomObjectResolver CustomObjectResolver
	UserPassword         string
}

Options enables greater control over the processing. The zero value is a valid default configuration.

Directories

Path Synopsis
Package file builds upon a parser to read an existing PDF file, producing a tree of PDF objects.
Implements a PDF object parser, mapping a list of tokens (see the tokenizer package) into a tree-like structure.
filters
Package filters provides logic to handle binary data encoded with PDF filters, such as inline data images.
