pdf

package module
v0.1.0 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 1, 2023 License: GPL-3.0 Imports: 24 Imported by: 1

Documentation

Overview

Package pdf provides support for reading and writing PDF files.

The package treats PDF files as containers holding a sequence of objects (typically Dictionaries and Streams). Objects are written sequentially, but can be read in any order. These objects represent pages of text, fonts, images and so on. Subpackages implement support for producing PDF files representing pages of text and images.

A Reader can be used to read objects from an existing PDF file:

r, err := pdf.Open("in.pdf")
if err != nil {
    log.Fatal(err)
}
defer r.Close()
// ... use r.Catalog to locate objects in the file ...

A Writer can be used to write objects to a new PDF file:

w, err := pdf.Create("out.pdf")
if err != nil {
    log.Fatal(err)
}

// ... add objects to the document using w.Write() and w.OpenStream() ...

w.Catalog.Pages = ... // set the page tree

err = w.Close()
if err != nil {
    log.Fatal(err)
}

The following types represent the native PDF object types. All of these implement the Object interface: Array, Bool, Dict, Integer, Name, Real, Reference, Stream, String.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Array

type Array []Object

Array represents an array of objects in a PDF file.

func (Array) AsRectangle

func (x Array) AsRectangle() (*Rectangle, error)

AsRectangle converts an array of 4 numbers to a Rectangle object. If the array does not have the correct format, an error is returned.

func (Array) PDF

func (x Array) PDF(w io.Writer) error

PDF implements the Object interface.

func (Array) String

func (x Array) String() string

type AuthenticationError

type AuthenticationError struct {
	ID []byte
}

AuthenticationError indicates that authentication failed because the correct password has not been supplied.

func (*AuthenticationError) Error

func (err *AuthenticationError) Error() string

type Bool

type Bool bool

Bool represents a boolean value in a PDF file.

func (Bool) PDF

func (x Bool) PDF(w io.Writer) error

PDF implements the Object interface.

type Catalog

type Catalog struct {
	Version           Version `pdf:"optional"`
	Extensions        Object  `pdf:"optional"`
	Pages             *Reference
	PageLabels        Object       `pdf:"optional"`
	Names             Object       `pdf:"optional"`
	Dests             Object       `pdf:"optional"`
	ViewerPreferences Object       `pdf:"optional"`
	PageLayout        Name         `pdf:"optional"`
	PageMode          Name         `pdf:"optional"`
	Outlines          *Reference   `pdf:"optional"`
	Threads           *Reference   `pdf:"optional"`
	OpenAction        Object       `pdf:"optional"`
	AA                Object       `pdf:"optional"`
	URI               Object       `pdf:"optional"`
	AcroForm          Object       `pdf:"optional"`
	MetaData          *Reference   `pdf:"optional"`
	StructTreeRoot    Object       `pdf:"optional"`
	MarkInfo          Object       `pdf:"optional"`
	Lang              language.Tag `pdf:"optional"`
	SpiderInfo        Object       `pdf:"optional"`
	OutputIntents     Object       `pdf:"optional"`
	PieceInfo         Object       `pdf:"optional"`
	OCProperties      Object       `pdf:"optional"`
	Perms             Object       `pdf:"optional"`
	Legal             Object       `pdf:"optional"`
	Requirements      Object       `pdf:"optional"`
	Collection        Object       `pdf:"optional"`
	NeedsRendering    bool         `pdf:"optional"`
	// contains filtered or unexported fields
}

Catalog represents a PDF Document Catalog. The only required field in this structure is Pages, which specifies the root of the page tree.

The Document Catalog is documented in section 7.7.2 of PDF 32000-1:2008.

type Dict

type Dict map[Name]Object

Dict represents a Dictionary object in a PDF file.

func AsDict

func AsDict(s interface{}) Dict

AsDict creates a PDF Dict object, encoding the fields of a Go struct.

func (Dict) Decode

func (d Dict) Decode(s interface{}, get func(Object) (Object, error)) error

Decode initialises a tagged struct using the data from a PDF dictionary. The argument s must be a pointer to a struct, or the function will panic. The function get, if non-nil, is used to resolve references to indirect objects, where needed; the Reader.Resolve method can be used for this argument.
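
The struct tags seen on Catalog and Info (for example `pdf:"optional"`) can be inspected with the standard reflect package. The following is a minimal sketch of that tag convention only, not the library's own Decode logic; catalogLike and requiredFields are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// catalogLike is a hypothetical, cut-down stand-in for the Catalog struct.
type catalogLike struct {
	Pages      string
	PageLayout string `pdf:"optional"`
	Lang       string `pdf:"optional"`
}

// requiredFields lists the fields of s whose `pdf` tag does not contain the
// "optional" flag.  The argument must be a struct or a pointer to a struct.
func requiredFields(s interface{}) []string {
	t := reflect.TypeOf(s)
	if t.Kind() == reflect.Ptr {
		t = t.Elem()
	}
	var names []string
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		optional := false
		// Tags may hold several comma-separated flags.
		for _, flag := range strings.Split(f.Tag.Get("pdf"), ",") {
			if strings.TrimSpace(flag) == "optional" {
				optional = true
			}
		}
		if !optional {
			names = append(names, f.Name)
		}
	}
	return names
}

func main() {
	fmt.Println(requiredFields(&catalogLike{}))
}
```

For the stand-in struct, only Pages is reported as required, which matches the statement that Pages is the only required Catalog field.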

func (Dict) PDF

func (x Dict) PDF(w io.Writer) error

PDF implements the Object interface.

func (Dict) String

func (x Dict) String() string

type FilterInfo

type FilterInfo struct {
	Name  Name
	Parms Dict
}

FilterInfo describes one PDF stream filter.

type Info

type Info struct {
	Title    string `pdf:"text string,optional"`
	Author   string `pdf:"text string,optional"`
	Subject  string `pdf:"text string,optional"`
	Keywords string `pdf:"text string,optional"`

	// Creator gives the name of the application that created the original
	// document, if the document was converted to PDF from another format.
	Creator string `pdf:"text string,optional"`

	// Producer gives the name of the application that converted the document,
	// if the document was converted to PDF from another format.
	Producer string `pdf:"text string,optional"`

	// CreationDate gives the date and time the document was created.
	CreationDate time.Time `pdf:"optional"`

	// ModDate gives the date and time the document was most recently modified.
	ModDate time.Time `pdf:"optional"`

	// Trapped indicates whether the document has been modified to include
	// trapping information.  (A trap is an overlap between adjacent areas
	// of different colours, used to avoid visual problems caused by imprecise
	// alignment of different layers of ink.) Possible values are:
	//   * "True": The document has been fully trapped.  No further trapping is
	//     necessary.
	//   * "False": The document has not been trapped.
	//   * "Unknown" (default): Either it is unknown whether the document has
	//     been trapped, or the document has been partially trapped.  Further
	//     trapping may be necessary.
	Trapped Name `pdf:"optional,allowstring"`

	// Custom contains all non-standard fields in the Info dictionary.
	Custom map[string]string `pdf:"extra"`
}

Info represents a PDF Document Information Dictionary. All fields in this structure are optional.

The Document Information Dictionary is documented in section 14.3.3 of PDF 32000-1:2008.

type Integer

type Integer int64

Integer represents an integer constant in a PDF file.

func (Integer) PDF

func (x Integer) PDF(w io.Writer) error

PDF implements the Object interface.

type MalformedFileError

type MalformedFileError struct {
	Err error
	Pos int64
}

MalformedFileError indicates that the PDF file could not be parsed.

func (*MalformedFileError) Error

func (err *MalformedFileError) Error() string

func (*MalformedFileError) Unwrap

func (err *MalformedFileError) Unwrap() error
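
Because MalformedFileError has an Unwrap method, callers can probably use the standard errors package to inspect it. The sketch below uses a local copy of the documented struct; the Error message text, the errTruncated sentinel and the describe helper are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// MalformedFileError mirrors the documented struct.  The message format
// below is an assumption, not the library's actual text.
type MalformedFileError struct {
	Err error
	Pos int64
}

func (err *MalformedFileError) Error() string {
	return fmt.Sprintf("malformed PDF file at byte %d: %v", err.Pos, err.Err)
}

func (err *MalformedFileError) Unwrap() error { return err.Err }

// errTruncated is a hypothetical underlying cause.
var errTruncated = errors.New("unexpected end of file")

// describe reports the file position if err is, or wraps, a
// *MalformedFileError.
func describe(err error) string {
	var mfe *MalformedFileError
	if errors.As(err, &mfe) {
		return fmt.Sprintf("parse error at offset %d", mfe.Pos)
	}
	return "other error"
}

func main() {
	err := fmt.Errorf("reading xref: %w", &MalformedFileError{Err: errTruncated, Pos: 1234})
	fmt.Println(describe(err))
	// Unwrap lets errors.Is see through to the underlying cause.
	fmt.Println(errors.Is(err, errTruncated))
}
```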

type Name

type Name string

Name represents a name object in a PDF file.

func ParseName

func ParseName(buf []byte) (Name, error)

ParseName parses a PDF name from the given buffer. The buffer must include the leading slash.
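
To illustrate the input format, here is a simplified sketch of name parsing, assuming the usual PDF syntax of a leading slash followed by characters, with `#xx` hexadecimal escapes. The function parseName is hypothetical and is not the library's implementation; in particular it does not stop at delimiters or white space.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// parseName decodes a PDF name such as "/A#20B" into "A B".
func parseName(buf []byte) (string, error) {
	if len(buf) == 0 || buf[0] != '/' {
		return "", errors.New("name must start with a slash")
	}
	var out []byte
	for i := 1; i < len(buf); i++ {
		c := buf[i]
		if c != '#' {
			out = append(out, c)
			continue
		}
		// A '#' must be followed by exactly two hexadecimal digits.
		if i+2 >= len(buf) {
			return "", errors.New("truncated #xx escape")
		}
		v, err := strconv.ParseUint(string(buf[i+1:i+3]), 16, 8)
		if err != nil {
			return "", fmt.Errorf("invalid #xx escape: %w", err)
		}
		out = append(out, byte(v))
		i += 2
	}
	return string(out), nil
}

func main() {
	name, err := parseName([]byte("/A#20B"))
	fmt.Printf("%q %v\n", name, err)
}
```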

func (Name) PDF

func (x Name) PDF(w io.Writer) error

PDF implements the Object interface.

type Number

type Number float64

A Number is either an Integer or a Real.

func (Number) PDF

func (x Number) PDF(w io.Writer) error

PDF implements the Object interface.

type Object

type Object interface {
	// PDF writes the PDF file representation of the object to w.
	PDF(w io.Writer) error
}

Object represents an object in a PDF file. There are nine native types of PDF objects, which implement this interface: Array, Bool, Dict, Integer, Name, Real, Reference, Stream, and String. Custom types can be built from these basic types by implementing the Object interface.
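
A custom type only needs a PDF method to satisfy this interface. The sketch below declares a local copy of the interface so that it runs on its own; the point type and the render helper are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// Object mirrors the documented interface.
type Object interface {
	PDF(w io.Writer) error
}

// point is a hypothetical custom type, serialised as a two-element array.
type point struct{ X, Y int }

// PDF writes the point in PDF array syntax, e.g. "[10 20]".
func (p point) PDF(w io.Writer) error {
	_, err := fmt.Fprintf(w, "[%d %d]", p.X, p.Y)
	return err
}

// render returns the PDF representation of obj as a string.
func render(obj Object) (string, error) {
	var buf bytes.Buffer
	if err := obj.PDF(&buf); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	s, err := render(point{X: 10, Y: 20})
	fmt.Println(s, err) // prints "[10 20] <nil>"
}
```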

type PageRotation

type PageRotation int

PageRotation describes how a page shall be rotated when displayed or printed. The possible values are RotateInherit, Rotate0, Rotate90, Rotate180, Rotate270.

const (
	RotateInherit PageRotation = iota // use inherited value

	Rotate0   // don't rotate
	Rotate90  // rotate 90 degrees clockwise
	Rotate180 // rotate 180 degrees clockwise
	Rotate270 // rotate 270 degrees clockwise
)

Valid values for PageRotation.

We can't use the PDF integer values directly, because then we could not tell 0 degree rotations apart from unspecified rotations.

func DecodeRotation

func DecodeRotation(rot Integer) (PageRotation, error)

func (PageRotation) ToPDF

func (r PageRotation) ToPDF() Integer
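
Neither conversion is documented in detail, so the following is only a sketch of how a mapping between degrees and PageRotation values might look. The functions decodeRotation and toDegrees are hypothetical stand-ins, and normalising negative or large angles is an assumption.

```go
package main

import "fmt"

type PageRotation int

const (
	RotateInherit PageRotation = iota // use inherited value
	Rotate0
	Rotate90
	Rotate180
	Rotate270
)

// decodeRotation maps a rotation in degrees to a PageRotation.  Angles are
// reduced modulo 360 (an assumption) and must be multiples of 90.
func decodeRotation(deg int64) (PageRotation, error) {
	norm := deg % 360
	if norm < 0 {
		norm += 360
	}
	switch norm {
	case 0:
		return Rotate0, nil
	case 90:
		return Rotate90, nil
	case 180:
		return Rotate180, nil
	case 270:
		return Rotate270, nil
	}
	return RotateInherit, fmt.Errorf("invalid rotation: %d degrees", deg)
}

// toDegrees is the inverse mapping.  ok is false for RotateInherit, where
// no explicit rotation value exists.
func toDegrees(r PageRotation) (deg int64, ok bool) {
	if r < Rotate0 || r > Rotate270 {
		return 0, false
	}
	return int64(r-Rotate0) * 90, true
}

func main() {
	r, err := decodeRotation(0)
	// Rotate0 is distinct from the zero value RotateInherit.
	fmt.Println(r == Rotate0, r == RotateInherit, err)
}
```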

type Perm

type Perm int

Perm describes which operations are permitted when accessing the document with User access (but not Owner access). This library just reports the permissions as specified in the PDF file. It is up to the caller to enforce the permissions.

const (
	// PermCopy allows extracting text and graphics.
	PermCopy Perm = 1 << iota

	// PermPrintDegraded allows printing of a low-level representation of the
	// appearance, possibly of degraded quality.
	PermPrintDegraded

	// PermPrint allows printing a representation from which a faithful digital
	// copy of the PDF content could be generated.  This implies
	// PermPrintDegraded.
	PermPrint

	// PermForms allows filling in form fields, including signature fields.
	PermForms

	// PermAnnotate allows adding or modifying text annotations. This implies
	// PermForms.
	PermAnnotate

	// PermAssemble allows inserting, rotating, or deleting pages and creating
	// bookmarks or thumbnail images.
	PermAssemble

	// PermModify allows modifying the document.  This implies PermAssemble.
	PermModify

	// PermAll gives the user all permissions, making User access equivalent to
	// Owner access.
	PermAll = permNext - 1
)
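
Since enforcing permissions is left to the caller, a caller will typically test individual bits. The sketch below re-declares the constants locally so that it is self-contained; permNext is assumed to be the next unused bit, and the has and describe helpers are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

type Perm int

const (
	PermCopy Perm = 1 << iota
	PermPrintDegraded
	PermPrint
	PermForms
	PermAnnotate
	PermAssemble
	PermModify
	permNext // assumed: first unused bit

	PermAll = permNext - 1
)

// has reports whether all bits of want are set in p.
func (p Perm) has(want Perm) bool { return p&want == want }

var permNames = []struct {
	bit  Perm
	name string
}{
	{PermCopy, "copy"},
	{PermPrintDegraded, "print-degraded"},
	{PermPrint, "print"},
	{PermForms, "forms"},
	{PermAnnotate, "annotate"},
	{PermAssemble, "assemble"},
	{PermModify, "modify"},
}

// describe lists the permissions granted by p.
func describe(p Perm) string {
	var parts []string
	for _, e := range permNames {
		if p.has(e.bit) {
			parts = append(parts, e.name)
		}
	}
	return strings.Join(parts, ",")
}

func main() {
	p := PermCopy | PermForms
	fmt.Println(describe(p), p.has(PermPrint))
}
```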

type Placeholder

type Placeholder struct {
	// contains filtered or unexported fields
}

A Placeholder can be used to reserve space in a PDF file where some value can be filled in later. This is, for example, used to store the content length of a compressed stream in a PDF stream dictionary. Placeholder objects are created using Writer.NewPlaceholder.

func (*Placeholder) PDF

func (x *Placeholder) PDF(w io.Writer) error

PDF implements the Object interface.

func (*Placeholder) Set

func (x *Placeholder) Set(val Object) error

Set fills in the value of the placeholder object. This should be called as soon as possible after the value becomes known.

type ReadPwdFunc

type ReadPwdFunc func(ID []byte, try int) string

ReadPwdFunc describes a function which can be used to query the user for a password for the document with the given ID. The first call for each authentication attempt has try == 0. If the returned password was wrong, the function is called again, repeatedly, with sequentially increasing values of try. If the ReadPwdFunc returns the empty string, the authentication attempt is aborted and an AuthenticationError is reported to the caller.
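
As a sketch of this calling convention, the following function tries a fixed list of candidate passwords, one per attempt, and then returns the empty string to abort. The type is re-declared locally and passwordList is a hypothetical helper; an interactive program would prompt the user instead.

```go
package main

import "fmt"

// ReadPwdFunc mirrors the documented function type.
type ReadPwdFunc func(ID []byte, try int) string

// passwordList returns a ReadPwdFunc which offers the candidates in order
// and gives up (by returning "") once they are exhausted.
func passwordList(candidates ...string) ReadPwdFunc {
	return func(ID []byte, try int) string {
		if try < 0 || try >= len(candidates) {
			return "" // abort the authentication attempt
		}
		return candidates[try]
	}
}

func main() {
	readPwd := passwordList("secret", "hunter2")
	for try := 0; try < 3; try++ {
		fmt.Printf("try %d: %q\n", try, readPwd(nil, try))
	}
}
```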

type Reader

type Reader struct {
	// Version is the PDF version used in this file.  This is specified in
	// the initial comment at the start of the file, and may be overridden by
	// the /Version entry in the document catalog.
	Version Version

	// The ID of the file.  This is either a slice of two byte slices (the
	// original ID of the file, and the ID of the current version), or nil if
	// the file does not specify an ID.
	ID [][]byte

	Catalog *Catalog
	// contains filtered or unexported fields
}

Reader represents a PDF file opened for reading. Use the functions Open or NewReader to create a new Reader.

func NewReader

func NewReader(data io.ReaderAt, size int64, readPwd ReadPwdFunc) (*Reader, error)

NewReader creates a new Reader object.

func Open

func Open(fname string) (*Reader, error)

Open opens the named PDF file for reading. After use, Reader.Close must be called to close the file the Reader is reading from.

func (*Reader) AuthenticateOwner

func (r *Reader) AuthenticateOwner() error

AuthenticateOwner tries to authenticate the owner of a document. If a password is required, this calls the readPwd function specified in the call to NewReader. The return value is nil if the owner was authenticated (or if no authentication is required), and an error of type *AuthenticationError if the required password was not supplied.

func (*Reader) Close

func (r *Reader) Close() error

Close closes the file underlying the reader. This call only has an effect if the io.ReaderAt passed to NewReader has a Close method, or if the Reader was created using Open. Otherwise, Close has no effect and returns nil.

func (*Reader) GetArray

func (r *Reader) GetArray(obj Object) (Array, error)

GetArray resolves references to indirect objects and makes sure the resulting object is an array.

func (*Reader) GetDict

func (r *Reader) GetDict(obj Object) (Dict, error)

GetDict resolves references to indirect objects and makes sure the resulting object is a dictionary.

func (*Reader) GetInfo

func (r *Reader) GetInfo() (*Info, error)

GetInfo reads the PDF /Info dictionary for the file. If no Info dictionary is present, nil is returned.

func (*Reader) GetInt

func (r *Reader) GetInt(obj Object) (Integer, error)

GetInt resolves references to indirect objects and makes sure the resulting object is an Integer.

func (*Reader) GetRectangle

func (r *Reader) GetRectangle(obj Object) (*Rectangle, error)

GetRectangle resolves references to indirect objects and makes sure the resulting object is a PDF rectangle object. If the object is null, nil is returned.

func (*Reader) GetStream

func (r *Reader) GetStream(obj Object) (*Stream, error)

GetStream resolves references to indirect objects and makes sure the resulting object is a stream.

func (*Reader) ReadSequential

func (r *Reader) ReadSequential() (Object, *Reference, error)

ReadSequential returns the objects in a PDF file in the order they are stored in the file. When the end of file has been reached, io.EOF is returned.

The function returns the next object in the file, together with a Reference which can be used to read the object using Reader.Resolve. The read position is not affected by other methods of the Reader, so sequential access can safely be interspersed with calls to Reader.Resolve.

ReadSequential makes some effort to repair problems in corrupted or malformed PDF files. In particular, it may still work when the Reader.Resolve method fails with errors.
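
A typical caller loops until io.EOF is returned. The sketch below shows that loop against a small local interface with the same method signature; Object and Reference are simplified stand-ins, and fakeReader and countObjects are hypothetical, so the example runs without a PDF file.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// Object and Reference are simplified stand-ins for the package types.
type Object interface{}

type Reference struct {
	Number     int
	Generation uint16
}

// sequentialReader is the part of Reader used by the loop below.
type sequentialReader interface {
	ReadSequential() (Object, *Reference, error)
}

// countObjects reads objects until io.EOF and returns how many were seen.
func countObjects(r sequentialReader) (int, error) {
	n := 0
	for {
		_, _, err := r.ReadSequential()
		if errors.Is(err, io.EOF) {
			return n, nil // normal end of file
		} else if err != nil {
			return n, err
		}
		n++
	}
}

// fakeReader yields a fixed number of dummy objects.
type fakeReader struct{ left int }

func (f *fakeReader) ReadSequential() (Object, *Reference, error) {
	if f.left == 0 {
		return nil, nil, io.EOF
	}
	f.left--
	return "dummy", &Reference{Number: f.left + 1}, nil
}

func main() {
	n, err := countObjects(&fakeReader{left: 3})
	fmt.Println(n, err) // prints "3 <nil>"
}
```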

func (*Reader) Resolve

func (r *Reader) Resolve(obj Object) (Object, error)

Resolve resolves references to indirect objects.

If obj is of type *Reference, the function loads the corresponding object from the file and returns the result. Otherwise, obj is returned unchanged.

type Real

type Real float64

Real represents a real number in a PDF file.

func (Real) PDF

func (x Real) PDF(w io.Writer) error

PDF implements the Object interface.

type Rectangle

type Rectangle struct {
	LLx, LLy, URx, URy float64
}

Rectangle represents a PDF rectangle.

func (*Rectangle) Extend

func (rect *Rectangle) Extend(other *Rectangle)

Extend enlarges the rectangle to also cover `other`.
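
One plausible reading of Extend is a bounding-box union, sketched below with a local copy of the struct. The extend function is hypothetical; how the library treats a zero rectangle is not documented here, and this sketch applies no special case.

```go
package main

import (
	"fmt"
	"math"
)

// Rectangle mirrors the documented struct.
type Rectangle struct {
	LLx, LLy, URx, URy float64
}

// extend grows rect so that it also covers other.
func extend(rect, other *Rectangle) {
	rect.LLx = math.Min(rect.LLx, other.LLx)
	rect.LLy = math.Min(rect.LLy, other.LLy)
	rect.URx = math.Max(rect.URx, other.URx)
	rect.URy = math.Max(rect.URy, other.URy)
}

func main() {
	r := &Rectangle{LLx: 0, LLy: 0, URx: 10, URy: 10}
	extend(r, &Rectangle{LLx: 5, LLy: -5, URx: 20, URy: 8})
	fmt.Printf("%+v\n", *r)
}
```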

func (Rectangle) IsZero

func (rect Rectangle) IsZero() bool

IsZero is true if the rectangle is the zero rectangle object.

func (*Rectangle) NearlyEqual

func (rect *Rectangle) NearlyEqual(other *Rectangle, eps float64) bool

NearlyEqual reports whether the corner coordinates of two rectangles differ by less than `eps`.

func (*Rectangle) PDF

func (rect *Rectangle) PDF(w io.Writer) error

PDF implements the Object interface.

func (*Rectangle) String

func (rect *Rectangle) String() string

func (*Rectangle) XPos

func (rect *Rectangle) XPos(rel float64) float64

func (*Rectangle) YPos

func (rect *Rectangle) YPos(rel float64) float64

type Reference

type Reference struct {
	Number     int
	Generation uint16
}

Reference represents a reference to an indirect object in a PDF file.

TODO(voss): use the struct directly, rather than pointers to the struct?

TODO(voss): use a fixed-size type for Number?

func (*Reference) PDF

func (x *Reference) PDF(w io.Writer) error

PDF implements the Object interface.

func (*Reference) String

func (x *Reference) String() string

type Resource

type Resource interface {
	// Write writes the resource to the PDF file.  No changes can be
	// made to the resource after it has been written.
	Write(w *Writer) error

	Reference() *Reference
}

type Resources

type Resources struct {
	ExtGState  Dict  `pdf:"optional"` // maps resource names to graphics state parameter dictionaries
	ColorSpace Dict  `pdf:"optional"` // maps each resource name to either the name of a device-dependent colour space or an array describing a colour space
	Pattern    Dict  `pdf:"optional"` // maps resource names to pattern objects
	Shading    Dict  `pdf:"optional"` // maps resource names to shading dictionaries
	XObject    Dict  `pdf:"optional"` // maps resource names to external objects
	Font       Dict  `pdf:"optional"` // maps resource names to font dictionaries
	ProcSet    Array `pdf:"optional"` // predefined procedure set names
	Properties Dict  `pdf:"optional"` // maps resource names to property list dictionaries for marked content
}

Resources describes a PDF Resource Dictionary. See section 7.8.3 of PDF 32000-1:2008 for details.

TODO(voss): use []*font.Font for the .Font field?

type Stream

type Stream struct {
	Dict
	R io.Reader
	// contains filtered or unexported fields
}

Stream represents a stream object in a PDF file.

func (*Stream) Decode

func (x *Stream) Decode(resolve func(Object) (Object, error)) (io.Reader, error)

Decode returns a reader for the decoded stream data.

TODO(voss): allow to decode only the first few filters?

func (*Stream) Filters

func (x *Stream) Filters(resolve func(Object) (Object, error)) ([]*FilterInfo, error)

Filters extracts the information contained in the /Filter and /DecodeParms entries of the stream dictionary.

func (*Stream) PDF

func (x *Stream) PDF(w io.Writer) error

PDF implements the Object interface.

func (*Stream) String

func (x *Stream) String() string

type String

type String []byte

String represents a raw string in a PDF file. The character set encoding, if any, is determined by the context.

func Date

func Date(t time.Time) String

Date creates a PDF String object encoding the given date and time.

func ParseString

func ParseString(buf []byte) (String, error)

ParseString parses a string from the given buffer. The buffer must include the surrounding parentheses or angle brackets.

func TextString

func TextString(s string) String

TextString creates a String object using the "text string" encoding, i.e. using either UTF-16BE encoding (with a BOM) or PDFDocEncoding.
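
The choice between the two encodings can be sketched as follows. This simplified version keeps printable ASCII text as-is (those codes are the same in PDFDocEncoding) and uses UTF-16BE with a byte order mark for everything else. The real function may well use PDFDocEncoding for more characters; encodeTextString is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// encodeTextString encodes s as a PDF text string.
func encodeTextString(s string) []byte {
	ascii := true
	for _, r := range s {
		if r < 0x20 || r > 0x7E {
			ascii = false
			break
		}
	}
	if ascii {
		// Printable ASCII needs no conversion.
		return []byte(s)
	}
	// UTF-16BE, prefixed with the byte order mark FE FF.
	out := []byte{0xFE, 0xFF}
	for _, u := range utf16.Encode([]rune(s)) {
		out = append(out, byte(u>>8), byte(u))
	}
	return out
}

func main() {
	fmt.Printf("% X\n", encodeTextString("hi"))
	fmt.Printf("% X\n", encodeTextString("\u65e5\u672c"))
}
```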

func (String) AsDate

func (x String) AsDate() (time.Time, error)

AsDate converts a PDF date string to a time.Time object. If the string does not have the correct format, an error is returned.

func (String) AsTextString

func (x String) AsTextString() string

AsTextString interprets x as a PDF "text string" and returns the corresponding UTF-8 encoded string.

func (String) PDF

func (x String) PDF(w io.Writer) error

PDF implements the Object interface.

type Version

type Version int

Version represents the version of the PDF standard used in a file.

const (
	V1_0 Version
	V1_1
	V1_2
	V1_3
	V1_4
	V1_5
	V1_6
	V1_7
)

PDF versions supported by this library.

func ParseVersion

func ParseVersion(verString string) (Version, error)

ParseVersion parses a PDF version string.

func (Version) String

func (ver Version) String() string

func (Version) ToString

func (ver Version) ToString() (string, error)

ToString returns the string representation of ver, e.g. "1.7". If ver does not correspond to a supported PDF version, an error is returned.

type VersionError

type VersionError struct {
	Operation string
	Earliest  Version
}

VersionError is returned when trying to use a feature in a PDF file which is not supported by the PDF version used. Use Writer.CheckVersion to create VersionError objects.

func (*VersionError) Error

func (err *VersionError) Error() string

type Writer

type Writer struct {
	// Version is the PDF version used in this file.  This field is
	// read-only.  Use the opt argument of NewWriter to set the PDF version for
	// a new file.
	Version Version

	// The Document Catalog is documented in section 7.7.2 of PDF 32000-1:2008.
	Catalog *Catalog

	Resources map[interface{}]Resource
	// contains filtered or unexported fields
}

Writer represents a PDF file open for writing. Use the functions Create or NewWriter to create a new Writer.

func Create

func Create(name string) (*Writer, error)

Create creates the named PDF file and opens it for output. If a previous file with the same name exists, it is overwritten. After writing is complete, Writer.Close must be called to write the trailer and to close the underlying file.

If non-default settings are required, NewWriter can be used to set options.

func NewWriter

func NewWriter(w io.Writer, opt *WriterOptions) (*Writer, error)

NewWriter prepares a PDF file for writing.

The Writer.Close method must be called after the file contents have been written, to add the trailer and the cross reference table to the PDF file. It is the caller's responsibility to close the writer w after the pdf.Writer has been closed.

func (*Writer) Alloc

func (pdf *Writer) Alloc() *Reference

Alloc allocates an object number for an indirect object.

func (*Writer) CheckVersion

func (pdf *Writer) CheckVersion(operation string, minVersion Version) error

CheckVersion checks whether the PDF file being written has version minVersion or later. If the version is new enough, nil is returned. Otherwise a VersionError for the given operation is returned.
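
The check itself amounts to a version comparison. The sketch below uses local copies of Version and VersionError; the numeric values of the constants, the Error message and the free function checkVersion are assumptions made so that the example is self-contained.

```go
package main

import "fmt"

type Version int

const (
	V1_0 Version = iota // numeric values are an assumption
	V1_1
	V1_2
	V1_3
	V1_4
	V1_5
	V1_6
	V1_7
)

// VersionError mirrors the documented struct.
type VersionError struct {
	Operation string
	Earliest  Version
}

func (err *VersionError) Error() string {
	return fmt.Sprintf("%s requires PDF version 1.%d or newer",
		err.Operation, int(err.Earliest))
}

// checkVersion returns nil if fileVersion is at least minVersion, and a
// *VersionError for the given operation otherwise.
func checkVersion(fileVersion Version, operation string, minVersion Version) error {
	if fileVersion >= minVersion {
		return nil
	}
	return &VersionError{Operation: operation, Earliest: minVersion}
}

func main() {
	fmt.Println(checkVersion(V1_4, "object streams", V1_5))
	fmt.Println(checkVersion(V1_7, "object streams", V1_5))
}
```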

func (*Writer) Close

func (pdf *Writer) Close() error

Close closes the Writer, flushing any unwritten data to the underlying io.Writer.

func (*Writer) NewPlaceholder

func (pdf *Writer) NewPlaceholder(size int) *Placeholder

NewPlaceholder creates a new placeholder for a value which is not yet known. The argument size must be an upper bound to the length of the replacement text. Once the value becomes known, it can be filled in using the Placeholder.Set method.

func (*Writer) OnClose

func (pdf *Writer) OnClose(callback func(*Writer) error)

OnClose registers a callback function which is called before the writer is closed. Callbacks are executed in the reverse order, i.e. the last callback registered is the first one to run.
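
The reverse execution order resembles Go's defer statement. A minimal sketch of such a callback stack follows, using a hypothetical closer type whose callbacks take no argument (the documented callbacks receive the *Writer). Stopping at the first error is an assumption.

```go
package main

import "fmt"

// closer is a hypothetical stand-in for the callback handling in Writer.
type closer struct {
	callbacks []func() error
}

// onClose registers a callback to run when close is called.
func (c *closer) onClose(cb func() error) {
	c.callbacks = append(c.callbacks, cb)
}

// close runs the callbacks, last registered first, and stops at the first
// error.
func (c *closer) close() error {
	for i := len(c.callbacks) - 1; i >= 0; i-- {
		if err := c.callbacks[i](); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	c := &closer{}
	c.onClose(func() error { fmt.Println("registered first, runs last"); return nil })
	c.onClose(func() error { fmt.Println("registered last, runs first"); return nil })
	if err := c.close(); err != nil {
		fmt.Println("error:", err)
	}
}
```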

TODO(voss): remove?

func (*Writer) OpenStream

func (pdf *Writer) OpenStream(dict Dict, ref *Reference, filters ...*FilterInfo) (io.WriteCloser, *Reference, error)

OpenStream adds a PDF Stream to the file and returns an io.WriteCloser which can be used to add the stream's data. No other objects can be added to the file until the stream is closed.

func (*Writer) SetInfo

func (pdf *Writer) SetInfo(info *Info)

SetInfo sets the Document Information Dictionary for the file.

func (*Writer) Write

func (pdf *Writer) Write(obj Object, ref *Reference) (*Reference, error)

Write writes an object to the PDF file, as an indirect object. The returned reference can be used to refer to this object from other parts of the file.

func (*Writer) WriteCompressed

func (pdf *Writer) WriteCompressed(refs []*Reference, objects ...Object) ([]*Reference, error)

WriteCompressed writes a number of objects to the file as a compressed object stream. Object streams are only available for PDF version 1.5 and newer; in case the file version is too low, the objects are written directly into the PDF file, without compression.

type WriterOptions

type WriterOptions struct {
	Version Version
	ID      [][]byte

	UserPassword   string
	OwnerPassword  string
	UserPermission Perm
}

WriterOptions controls how a PDF file is generated.

Directories

Path Synopsis
Package color implements different PDF color spaces.
demo
cff-glyphs
Read a CFF font and display a magnified version of each glyph in a PDF file.
Package font implements the PDF font handling.
builtin
Package builtin implements support for the 14 built-in PDF fonts.
cid
Package cid provides support for embedding CID fonts into PDF documents.
simple
Package simple provides support for embedding simple fonts into PDF documents.
type3
Package type3 provides support for embedding type 3 fonts into PDF documents.
Package graphics allows to draw on a PDF page.
Package image provides functions for embedding images in PDF files.
internal
Package lzw implements the Lempel-Ziv-Welch compressed data format.
Package pages implements PDF page trees.
