parser

package

Version: v0.0.4 (latest) Published: Jul 28, 2023 License: MIT Imports: 8 Imported by: 0

Repository

github.com/benoitkugler/pdf

Documentation ¶

Overview ¶

Implements a PDF object parser, mapping a list of tokens (see the tokenizer package) into a tree-like structure. A higher-level reader is needed to decrypt a full PDF file.

Index ¶

func ParseContent(content []byte, res model.ResourcesColorSpace) ([]cs.Operation, error)
func ParseContentResources(content []byte, res model.ResourcesColorSpace) (model.ResourcesDict, error)
func ParseDirectFilters(filters, decodeParams Object) (model.Filters, error)
func ParseFilters(filters, decodeParams Object, resolver func(Object) (Object, error)) (model.Filters, error)
type Array
type Bool
type Command
type Dict
type Fl
type Float
type HexLiteral
type IndirectRef
type Integer
type Name
type Object
- func ParseObject(data []byte) (Object, error)
- func ParseObjectDefinition(line []byte, headerOnly bool) (objectNumber int, generationNumber int, o Object, err error)
type Parser
- func NewParser(data []byte) *Parser
- func NewParserFromTokenizer(tokens *tkn.Tokenizer) *Parser
- func (pr *Parser) ParseContentElement(res model.ResourcesColorSpace) (cs.Operation, error)
- func (p *Parser) ParseObject() (Object, error)
type StringLiteral

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func ParseContent ¶

func ParseContent(content []byte, res model.ResourcesColorSpace) ([]cs.Operation, error)

ParseContent parses a decrypted Content Stream. A resource dictionary is needed to handle inline image data, which can refer to a color space.
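
To illustrate what a list of operations looks like: content streams use postfix notation, so operands come before the operator that consumes them. The following is a minimal, standard-library-only sketch of that grouping and is not this package's implementation. The `operation` type and the `groupOperations` and `isOperand` helpers are hypothetical, and the sketch deliberately ignores strings, arrays, dictionaries, and inline images, which a real parser has to handle.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// operation is a hypothetical stand-in for cs.Operation:
// one operator together with the operands that preceded it.
type operation struct {
	Operator string
	Operands []string
}

// isOperand reports whether tok looks like a name (/F1) or a number.
// Assumption: every other whitespace-separated token is an operator.
func isOperand(tok string) bool {
	if strings.HasPrefix(tok, "/") {
		return true
	}
	_, err := strconv.ParseFloat(tok, 64)
	return err == nil
}

// groupOperations splits content on whitespace and attaches the pending
// operands to each operator as it is encountered.
func groupOperations(content string) ([]operation, error) {
	var ops []operation
	var pending []string
	for _, tok := range strings.Fields(content) {
		if isOperand(tok) {
			pending = append(pending, tok)
			continue
		}
		ops = append(ops, operation{Operator: tok, Operands: pending})
		pending = nil
	}
	if len(pending) != 0 {
		return nil, fmt.Errorf("%d trailing operands without an operator", len(pending))
	}
	return ops, nil
}

func main() {
	ops, err := groupOperations("BT /F1 12 Tf 72 720 Td ET")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, op := range ops {
		fmt.Println(op.Operator, op.Operands)
	}
}
```

In this toy input, `Tf` receives the font name and size, and `Td` receives the two offsets; `BT` and `ET` take no operands.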

func ParseContentResources ¶

func ParseContentResources(content []byte, res model.ResourcesColorSpace) (model.ResourcesDict, error)

ParseContentResources returns the resources needed by content. Note that only the names in the returned dicts are valid; all the values will be nil.

func ParseDirectFilters ¶

func ParseDirectFilters(filters, decodeParams Object) (model.Filters, error)

ParseDirectFilters is the same as ParseFilters, but for direct objects. This is the case for inline image parameters and xref stream dictionaries.

func ParseFilters ¶

func ParseFilters(filters, decodeParams Object, resolver func(Object) (Object, error)) (model.Filters, error)

ParseFilters processes the given filters and their (optional) parameters. `resolver` is called to resolve the potential indirect objects. An empty list may be returned if the filters are nil.
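
The PDF format allows the Filter entry of a stream to be either a single name or an array of names, with the decode parameters following the same shape. The sketch below, standard library only, shows one way such input could be normalized into a flat list while resolving indirect references through a callback. It is an illustration under stated assumptions rather than this package's code: `object`, `name`, `dict`, `array`, `ref`, `filter`, `tableResolver`, and `normalizeFilters` are all hypothetical stand-ins for the `model` types and for `ParseFilters`.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for PDF objects (the real package uses model.Object).
type (
	object interface{}
	name   string
	dict   map[name]object
	array  []object
	ref    int // stand-in for an indirect reference
)

// filter pairs a filter name with its parameters (nil when absent).
type filter struct {
	Name   name
	Params dict
}

// tableResolver returns a resolver that looks references up in table
// and passes direct objects through unchanged.
func tableResolver(table map[ref]object) func(object) (object, error) {
	return func(o object) (object, error) {
		r, ok := o.(ref)
		if !ok {
			return o, nil // direct object: nothing to resolve
		}
		v, found := table[r]
		if !found {
			return nil, fmt.Errorf("unknown reference %d", r)
		}
		return v, nil
	}
}

// normalizeFilters accepts a name or an array of names, plus optional
// parameters of the matching shape, and returns a flat list.
func normalizeFilters(filters, decodeParams object, resolve func(object) (object, error)) ([]filter, error) {
	filters, err := resolve(filters)
	if err != nil {
		return nil, err
	}
	if filters == nil {
		return nil, nil // no filters: empty list
	}
	decodeParams, err = resolve(decodeParams)
	if err != nil {
		return nil, err
	}
	// A single name is treated as a one-element array.
	if n, ok := filters.(name); ok {
		filters = array{n}
		if decodeParams != nil {
			decodeParams = array{decodeParams}
		}
	}
	names, ok := filters.(array)
	if !ok {
		return nil, errors.New("filter entry must be a name or an array")
	}
	params, _ := decodeParams.(array) // nil when no parameters are given
	out := make([]filter, 0, len(names))
	for i, item := range names {
		item, err := resolve(item)
		if err != nil {
			return nil, err
		}
		n, ok := item.(name)
		if !ok {
			return nil, fmt.Errorf("filter %d is not a name", i)
		}
		f := filter{Name: n}
		if i < len(params) {
			p, err := resolve(params[i])
			if err != nil {
				return nil, err
			}
			f.Params, _ = p.(dict) // stays nil for a null entry
		}
		out = append(out, f)
	}
	return out, nil
}

func main() {
	resolve := tableResolver(map[ref]object{1: name("FlateDecode")})
	filters, err := normalizeFilters(
		array{name("ASCII85Decode"), ref(1)},
		array{nil, dict{"Predictor": 12}},
		resolve,
	)
	fmt.Println(filters, err)
}
```

Passing the resolver as a function keeps the normalization independent of how references are stored. For direct objects, a resolver that returns its argument unchanged is enough.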

Types ¶

type Array ¶

type Array = model.ObjArray

type Bool ¶

type Bool = model.ObjBool

type Command ¶

type Command = model.ObjCommand

type Dict ¶

type Dict = model.ObjDict

type Fl ¶

type Fl = model.Fl

type Float ¶

type Float = model.ObjFloat

type HexLiteral ¶

type HexLiteral = model.ObjHexLiteral

type IndirectRef ¶

type IndirectRef = model.ObjIndirectRef

type Integer ¶

type Integer = model.ObjInt

type Name ¶

type Name = model.Name

type Object ¶

type Object = model.Object

func ParseObject ¶

func ParseObject(data []byte) (Object, error)

ParseObject tokenizes and parses the input, expecting a valid PDF object.

func ParseObjectDefinition ¶

func ParseObjectDefinition(line []byte, headerOnly bool) (objectNumber int, generationNumber int, o Object, err error)

ParseObjectDefinition parses an object definition. If `headerOnly` is true, parsing stops after the `X X obj` header and a nil object is returned.
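
As a rough illustration of the `X X obj` header, the sketch below reads an object number and a generation number from the start of a definition using only the standard library. It is not this package's implementation: `parseObjHeader` is a hypothetical helper, and it assumes the three header parts are separated by whitespace that `bytes.Fields` recognizes.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"strconv"
)

// parseObjHeader is a hypothetical helper (not part of the parser package).
// It expects line to start with "<object number> <generation number> obj"
// and ignores whatever follows the header.
func parseObjHeader(line []byte) (objectNumber, generationNumber int, err error) {
	fields := bytes.Fields(line)
	if len(fields) < 3 || !bytes.HasPrefix(fields[2], []byte("obj")) {
		return 0, 0, errors.New("expected \"<number> <generation> obj\"")
	}
	objectNumber, err = strconv.Atoi(string(fields[0]))
	if err != nil || objectNumber < 0 {
		return 0, 0, fmt.Errorf("invalid object number %q", fields[0])
	}
	generationNumber, err = strconv.Atoi(string(fields[1]))
	if err != nil || generationNumber < 0 {
		return 0, 0, fmt.Errorf("invalid generation number %q", fields[1])
	}
	return objectNumber, generationNumber, nil
}

func main() {
	n, g, err := parseObjHeader([]byte("12 0 obj\n<< /Type /Catalog >>\nendobj"))
	fmt.Println(n, g, err)
}
```

Reading only the header in this way is cheap, which is useful when a caller needs the object and generation numbers without building the object itself.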

type Parser ¶

type Parser struct {

	// If true, disallow Indirect Reference,
	// but allow Commands
	ContentStreamMode bool
	// contains filtered or unexported fields
}

Standalone implementation of a PDF parser. The parser only handles chunks of PDF files (corresponding, for example, to object definitions), but cannot handle a full file with streams. A higher-level reader is needed to decode Streams and Inline Data, which require knowledge of the filters used.

func NewParser ¶

func NewParser(data []byte) *Parser

NewParser uses a byte slice as input.

func NewParserFromTokenizer ¶

func NewParserFromTokenizer(tokens *tkn.Tokenizer) *Parser

NewParserFromTokenizer uses a tokenizer as input.

func (*Parser) ParseContentElement ¶

func (pr *Parser) ParseContentElement(res model.ResourcesColorSpace) (cs.Operation, error)

ParseContentElement parses one operation and advances. `ContentStreamMode` must have been set to true, and EOF should be checked before calling this method. See `ParseContent` for a convenient way of parsing a whole content stream.

func (*Parser) ParseObject ¶

func (p *Parser) ParseObject() (Object, error)

ParseObject reads one of the (potentially) many objects in the input data (see NewParser).

type StringLiteral ¶

type StringLiteral = model.ObjStringLiteral

Directories ¶

Path	Synopsis
filters	Package filters provides logic to handle binary data encoded with PDF filters, such as inline data images.
ccitt
