Documentation ¶
Overview ¶
Package pdf provides support for reading and writing PDF files.
The package treats a PDF file as a container holding a sequence of objects (typically Dictionaries and Streams). Objects are written sequentially, but can be read in any order. These objects represent pages of text, fonts, images and so on. Subpackages implement support for producing PDF files representing pages of text and images.
A Reader can be used to read objects from an existing PDF file:
r, err := pdf.Open("in.pdf")
if err != nil {
	log.Fatal(err)
}
defer r.Close()
... use r.Catalog to locate objects in the file ...
A Writer can be used to write objects to a new PDF file:
w, err := pdf.Create("out.pdf")
if err != nil {
	log.Fatal(err)
}
... add objects to the document using w.Put() and w.OpenStream() ...
w.Catalog.Pages = ... // set the page tree
err = w.Close()
if err != nil {
	log.Fatal(err)
}
The following types represent the native PDF object types: Array, Bool, Dict, Integer, Name, Real, Reference, *Stream, String. All of these implement the Object interface.
Index ¶
- Constants
- Variables
- type Array
- type AuthenticationError
- type Bool
- type Catalog
- type Data
- type Dict
- type FileInfo
- type FileObject
- type FileSection
- type FilterInfo
- type Getter
- type Info
- type Integer
- type MalformedFileError
- type Name
- type Number
- type Object
- type Perm
- type Placeholder
- type Reader
- func (r *Reader) Authenticate(perm Perm) error
- func (r *Reader) AuthenticateOwner() error
- func (r *Reader) Close() error
- func (r *Reader) DecodeDict(s interface{}, d Dict) error
- func (r *Reader) DecodeStream(x *Stream, numFilters int) (io.Reader, error)
- func (r *Reader) Get(ref Reference) (Object, error)
- type ReaderErrorHandling
- type ReaderOptions
- type Real
- type Rectangle
- func (rect *Rectangle) Extend(other *Rectangle)
- func (rect Rectangle) IsZero() bool
- func (rect *Rectangle) NearlyEqual(other *Rectangle, eps float64) bool
- func (rect *Rectangle) PDF(w io.Writer) error
- func (rect *Rectangle) String() string
- func (rect *Rectangle) XPos(rel float64) float64
- func (rect *Rectangle) YPos(rel float64) float64
- type Reference
- type Resource
- type Resources
- type Stream
- type String
- type Version
- type VersionError
- type Writer
- func (pdf *Writer) Alloc() Reference
- func (pdf *Writer) AutoClose(res Resource)
- func (pdf *Writer) CheckVersion(operation string, minVersion Version) error
- func (pdf *Writer) Close() error
- func (pdf *Writer) NewPlaceholder(size int) *Placeholder
- func (pdf *Writer) OpenStream(ref Reference, dict Dict, filters ...*FilterInfo) (io.WriteCloser, error)
- func (pdf *Writer) Put(ref Reference, obj Object) error
- func (pdf *Writer) SetInfo(info *Info)
- func (pdf *Writer) WriteCompressed(refs []Reference, objects ...Object) error
- type WriterOptions
Constants ¶
const (
	// ErrorHandlingRecover means that the reader will try to recover from
	// errors and continue parsing the file. This is the default.
	ErrorHandlingRecover = iota

	// ErrorHandlingReport means that the reader will try to recover from
	// errors and continue parsing the file, but will report errors to the
	// caller.
	ErrorHandlingReport

	// ErrorHandlingStop means that the reader will stop parsing the file as
	// soon as an error is encountered.
	ErrorHandlingStop
)
Variables ¶
var (
	GetArray  = resolveAndCast[Array]
	GetBool   = resolveAndCast[Bool]
	GetDict   = resolveAndCast[Dict]
	GetInt    = resolveAndCast[Integer]
	GetName   = resolveAndCast[Name]
	GetReal   = resolveAndCast[Real]
	GetStream = resolveAndCast[*Stream]
	GetString = resolveAndCast[String]
)
Helper functions for getting objects of a specific type. Each of these functions calls Resolve on the object before attempting to convert it to the desired type. If the object is `null`, a zero object is returned without error. If the object is of the wrong type, an error is returned.
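The cast step described above can be sketched with Go generics. This is an illustration only, not the package's implementation: Object, Integer, Name and castOrZero are hypothetical stand-ins, and the call to Resolve is left out.

```go
package main

import "fmt"

// Object is a hypothetical stand-in for the package's Object interface.
type Object interface{}

// Integer and Name are hypothetical stand-ins for two native PDF types.
type Integer int64
type Name string

// castOrZero converts obj to type T. A nil object (PDF null) yields the
// zero value without error; an object of any other type yields an error.
func castOrZero[T Object](obj Object) (T, error) {
	var zero T
	if obj == nil {
		return zero, nil
	}
	val, ok := obj.(T)
	if !ok {
		return zero, fmt.Errorf("expected %T but got %T", zero, obj)
	}
	return val, nil
}

func main() {
	n, err := castOrZero[Integer](Integer(42))
	fmt.Println(n, err)

	n, err = castOrZero[Integer](nil) // null: zero value, no error
	fmt.Println(n, err)

	_, err = castOrZero[Integer](Name("Type")) // wrong type: error
	fmt.Println(err)
}
```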
var (
ErrNoPDF = errors.New("PDF header not found")
)
Functions ¶
This section is empty.
Types ¶
type AuthenticationError ¶
type AuthenticationError struct {
ID []byte
}
AuthenticationError indicates that authentication failed because the correct password has not been supplied.
func (*AuthenticationError) Error ¶
func (err *AuthenticationError) Error() string
type Catalog ¶
type Catalog struct {
	Version           Version      `pdf:"optional"`
	Extensions        Object       `pdf:"optional"`
	Pages             Reference
	PageLabels        Object       `pdf:"optional"`
	Names             Object       `pdf:"optional"`
	Dests             Object       `pdf:"optional"`
	ViewerPreferences Object       `pdf:"optional"`
	PageLayout        Name         `pdf:"optional"`
	PageMode          Name         `pdf:"optional"`
	Outlines          Reference    `pdf:"optional"`
	Threads           Reference    `pdf:"optional"`
	OpenAction        Object       `pdf:"optional"`
	AA                Object       `pdf:"optional"`
	URI               Object       `pdf:"optional"`
	AcroForm          Object       `pdf:"optional"`
	MetaData          Reference    `pdf:"optional"`
	StructTreeRoot    Object       `pdf:"optional"`
	MarkInfo          Object       `pdf:"optional"`
	Lang              language.Tag `pdf:"optional"`
	SpiderInfo        Object       `pdf:"optional"`
	OutputIntents     Object       `pdf:"optional"`
	PieceInfo         Object       `pdf:"optional"`
	OCProperties      Object       `pdf:"optional"`
	Perms             Object       `pdf:"optional"`
	Legal             Object       `pdf:"optional"`
	Requirements      Object       `pdf:"optional"`
	Collection        Object       `pdf:"optional"`
	NeedsRendering    bool         `pdf:"optional"`
	// contains filtered or unexported fields
}
Catalog represents a PDF Document Catalog. The only required field in this structure is Pages, which specifies the root of the page tree.
The Document Catalog is documented in section 7.7.2 of PDF 32000-1:2008.
type Data ¶ added in v0.3.0
type Data struct {
	Version Version
	Catalog *Catalog
	Info    *Info
	ID      [][]byte
	Objects map[Reference]Object
}
Data is an in-memory representation of a PDF document.
func Read ¶ added in v0.3.0
func Read(r io.ReadSeeker, opt *ReaderOptions) (*Data, error)
Read reads a complete PDF document into memory.
type Dict ¶
Dict represents a Dictionary object in a PDF file.
type FileInfo ¶ added in v0.2.0
type FileInfo struct {
	R             io.ReadSeeker
	FileSize      int64
	PDFStart      int64
	PDFEnd        int64
	HeaderVersion string
	Sections      []*FileSection
}
func SequentialScan ¶ added in v0.2.0
func SequentialScan(r io.ReadSeeker) (*FileInfo, error)
SequentialScan reads a PDF file sequentially, extracting information about the file structure and the location of indirect objects. This can be used to attempt to read damaged PDF files, in particular in cases where the cross-reference table is missing or corrupt.
func (*FileInfo) MakeReader ¶ added in v0.2.0
func (fi *FileInfo) MakeReader(opt *ReaderOptions) (*Reader, error)
type FileObject ¶ added in v0.2.0
type FileSection ¶ added in v0.2.0
type FileSection struct {
	XRefPos       int64
	TrailerPos    int64
	StartXRefPos  int64
	EOFPos        int64
	Objects       []*FileObject
	Catalog       *FileObject
	ObjectStreams []*FileObject
}
TODO(voss): add start and end offsets
type FilterInfo ¶
FilterInfo describes a single PDF stream filter.
type Info ¶
type Info struct {
	Title    string `pdf:"text string,optional"`
	Author   string `pdf:"text string,optional"`
	Subject  string `pdf:"text string,optional"`
	Keywords string `pdf:"text string,optional"`

	// Creator gives the name of the application that created the original
	// document, if the document was converted to PDF from another format.
	Creator string `pdf:"text string,optional"`

	// Producer gives the name of the application that converted the document,
	// if the document was converted to PDF from another format.
	Producer string `pdf:"text string,optional"`

	// CreationDate gives the date and time the document was created.
	CreationDate time.Time `pdf:"optional"`

	// ModDate gives the date and time the document was most recently modified.
	ModDate time.Time `pdf:"optional"`

	// Trapped indicates whether the document has been modified to include
	// trapping information. (A trap is an overlap between adjacent areas of
	// different colours, used to avoid visual problems caused by imprecise
	// alignment of different layers of ink.) Possible values are:
	//   * "True": The document has been fully trapped. No further trapping is
	//     necessary.
	//   * "False": The document has not been trapped.
	//   * "Unknown" (default): Either it is unknown whether the document has
	//     been trapped, or the document has been partially trapped. Further
	//     trapping may be necessary.
	Trapped Name `pdf:"optional,allowstring"`

	// Custom contains all non-standard fields in the Info dictionary.
	Custom map[string]string `pdf:"extra"`
}
Info represents a PDF Document Information Dictionary. All fields in this structure are optional.
The Document Information Dictionary is documented in section 14.3.3 of PDF 32000-1:2008.
type MalformedFileError ¶
MalformedFileError indicates that the PDF file could not be parsed.
func (*MalformedFileError) Error ¶
func (err *MalformedFileError) Error() string
func (*MalformedFileError) Unwrap ¶
func (err *MalformedFileError) Unwrap() error
type Name ¶
type Name string
Name represents a name object in a PDF file.
type Object ¶
type Object interface {
	// PDF writes the PDF file representation of the object to w.
	PDF(w io.Writer) error
}
Object represents an object in a PDF file. There are nine native types of PDF objects, which implement this interface: Array, Bool, Dict, Integer, Name, Real, Reference, *Stream, and String. Custom types can be built from these basic types by implementing the Object interface.
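As a sketch of what a custom type might look like, the following self-contained example redeclares the Object interface locally and implements it for a hypothetical Point type, written out as a PDF array of two numbers. Point and the render helper are assumptions made for illustration; they are not part of the package.

```go
package main

import (
	"fmt"
	"io"
	"strconv"
	"strings"
)

// Object mirrors the interface shown above, redeclared here so that the
// example is self-contained.
type Object interface {
	PDF(w io.Writer) error
}

// Point is a hypothetical custom type, serialised as a two-element array.
type Point struct{ X, Y float64 }

// PDF writes the point as "[x y]", using plain decimal notation.
func (p Point) PDF(w io.Writer) error {
	x := strconv.FormatFloat(p.X, 'f', -1, 64)
	y := strconv.FormatFloat(p.Y, 'f', -1, 64)
	_, err := io.WriteString(w, "["+x+" "+y+"]")
	return err
}

// render is a small helper which returns the serialised form as a string.
func render(obj Object) (string, error) {
	var sb strings.Builder
	err := obj.PDF(&sb)
	return sb.String(), err
}

func main() {
	s, err := render(Point{X: 1.5, Y: 20})
	fmt.Println(s, err)
}
```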
func Resolve ¶ added in v0.3.1
Resolve resolves references to indirect objects.
If obj is a Reference, the function reads the corresponding object from the file and returns the result. If obj is not a Reference, it is returned unchanged. The function recursively follows chains of references until it resolves to a non-reference object.
If a reference loop is encountered, the function returns an error of type MalformedFileError.
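The chain-following and loop detection described above could look roughly like the sketch below. It works on an in-memory table instead of a file, and all names (Object, Reference, errLoop, resolve) are local stand-ins, so it should be read as one possible approach and not as the package's code.

```go
package main

import (
	"errors"
	"fmt"
)

// Object and Reference are hypothetical stand-ins for the package's types.
type Object interface{}
type Reference uint64

// errLoop is a hypothetical error value standing in for MalformedFileError.
var errLoop = errors.New("reference loop")

// resolve follows chains of references in the table objs until it reaches
// a non-reference object. Visited references are recorded to detect loops.
func resolve(objs map[Reference]Object, obj Object) (Object, error) {
	seen := make(map[Reference]bool)
	for {
		ref, isRef := obj.(Reference)
		if !isRef {
			return obj, nil
		}
		if seen[ref] {
			return nil, errLoop
		}
		seen[ref] = true
		obj = objs[ref] // a missing entry yields nil, i.e. PDF null
	}
}

func main() {
	objs := map[Reference]Object{
		1: Reference(2),
		2: "hello",
		3: Reference(4),
		4: Reference(3),
	}
	obj, err := resolve(objs, Reference(1))
	fmt.Println(obj, err)
	obj, err = resolve(objs, Reference(3))
	fmt.Println(obj, err)
}
```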
type Perm ¶
type Perm int
Perm describes which operations are permitted when accessing the document with User access (but not Owner access). The user can always view the document.
This library just reports the permissions as specified in the PDF file. It is up to the caller to enforce the permissions.
const (
	// PermCopy allows to extract text and graphics.
	PermCopy Perm = 1 << iota

	// PermPrintDegraded allows printing of a low-level representation of the
	// appearance, possibly of degraded quality.
	PermPrintDegraded

	// PermPrint allows printing a representation from which a faithful digital
	// copy of the PDF content could be generated. This implies
	// PermPrintDegraded.
	PermPrint

	// PermForms allows to fill in form fields, including signature fields.
	PermForms

	// PermAnnotate allows to add or modify text annotations. This implies
	// PermForms.
	PermAnnotate

	// PermAssemble allows to insert, rotate, or delete pages and to create
	// bookmarks or thumbnail images.
	PermAssemble

	// PermModify allows to modify the document. This implies PermAssemble.
	PermModify

	// PermAll gives the user all permissions, making User access equivalent to
	// Owner access.
	PermAll = permNext - 1
)
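Since the library only reports permissions, callers have to test the bits themselves. The sketch below redeclares the constants so that it runs on its own; the position of the unexported permNext sentinel is an assumption (it is not shown in the listing above), and allows is a hypothetical helper.

```go
package main

import "fmt"

// Perm and its constants are redeclared here for a self-contained example.
type Perm int

const (
	PermCopy Perm = 1 << iota
	PermPrintDegraded
	PermPrint
	PermForms
	PermAnnotate
	PermAssemble
	PermModify
	permNext // assumed sentinel: the first unused bit
	PermAll  = permNext - 1
)

// allows reports whether every bit in want is also set in granted.
func allows(granted, want Perm) bool {
	return granted&want == want
}

func main() {
	granted := PermCopy | PermPrintDegraded
	fmt.Println(allows(granted, PermCopy))
	fmt.Println(allows(granted, PermPrint))
	fmt.Println(allows(PermAll, PermModify|PermAnnotate))
}
```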
type Placeholder ¶
type Placeholder struct {
// contains filtered or unexported fields
}
A Placeholder is a space reserved in a PDF file that can later be filled with a value. One common use case is to store the length of compressed content in a PDF stream dictionary. To create Placeholder objects, use the Writer.NewPlaceholder method.
func (*Placeholder) PDF ¶
func (x *Placeholder) PDF(w io.Writer) error
PDF implements the Object interface.
func (*Placeholder) Set ¶
func (x *Placeholder) Set(val Object) error
Set fills in the value of the placeholder object. This should be called as soon as possible after the value becomes known.
type Reader ¶
type Reader struct {
	// Version is the PDF version used in this file. This is specified in
	// the initial comment at the start of the file, and may be overridden by
	// the /Version entry in the document catalog.
	Version Version

	// The ID of the file. This is either a slice of two byte slices (the
	// original ID of the file, and the ID of the current version), or nil if
	// the file does not specify an ID.
	ID [][]byte

	// Catalog is the document catalog for this file.
	Catalog *Catalog

	// Info is the document information dictionary for this file.
	// This is nil if the file does not contain a document information
	// dictionary.
	Info *Info

	// Errors is a list of errors encountered while opening the file.
	// This is only used if the ErrorHandling option is set to
	// ErrorHandlingReport.
	Errors []*MalformedFileError
	// contains filtered or unexported fields
}
Reader represents a PDF file opened for reading. Use Open or NewReader to create a Reader.
func NewReader ¶
func NewReader(data io.ReadSeeker, opt *ReaderOptions) (*Reader, error)
NewReader creates a new Reader object.
func Open ¶
Open opens the named PDF file for reading. After use, Reader.Close must be called to close the file the Reader is reading from.
func (*Reader) Authenticate ¶ added in v0.3.1
Authenticate tries to authenticate the actions given in perm. If a password is required, this calls the ReadPassword function specified in the ReaderOptions struct. The return value is nil if authentication succeeded (or if no authentication is required), and an object of type AuthenticationError if the required password was not supplied.
func (*Reader) AuthenticateOwner ¶
AuthenticateOwner tries to authenticate the owner of a document. If a password is required, this calls the ReadPassword function specified in the ReaderOptions struct. The return value is nil if the owner was authenticated (or if no authentication is required), and an object of type AuthenticationError if the required password was not supplied.
func (*Reader) Close ¶
Close closes the file underlying the reader. This call only has an effect if the Reader was created by Open.
func (*Reader) DecodeDict ¶ added in v0.3.0
DecodeDict initialises a tagged struct using the data from a PDF dictionary. The argument s must be a pointer to a struct, or the function will panic.
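To illustrate how struct tags can drive such a decoder, here is a reduced sketch based on the reflect package. It only understands the "optional" tag option and matches dictionary keys to field names. The Dict stand-in, the pageDict struct and the decodeDict function are hypothetical; the real method supports more tag options and PDF-specific conversions.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Dict is a hypothetical stand-in for a PDF dictionary.
type Dict map[string]interface{}

// pageDict is a hypothetical tagged struct used for the demonstration.
type pageDict struct {
	Type   string
	Rotate int `pdf:"optional"`
}

// decodeDict copies entries of d into the exported fields of the struct
// pointed to by s. Fields tagged `pdf:"optional"` may be absent; any other
// missing field is an error. It panics if s is not a pointer to a struct.
func decodeDict(s interface{}, d Dict) error {
	v := reflect.ValueOf(s).Elem()
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		field := t.Field(i)
		if !field.IsExported() {
			continue
		}
		optional := false
		for _, opt := range strings.Split(field.Tag.Get("pdf"), ",") {
			if opt == "optional" {
				optional = true
			}
		}
		val, present := d[field.Name]
		if !present || val == nil {
			if optional {
				continue
			}
			return fmt.Errorf("required entry %q is missing", field.Name)
		}
		rv := reflect.ValueOf(val)
		if !rv.Type().AssignableTo(field.Type) {
			return fmt.Errorf("entry %q has unexpected type %T", field.Name, val)
		}
		v.Field(i).Set(rv)
	}
	return nil
}

func main() {
	var p pageDict
	err := decodeDict(&p, Dict{"Type": "Page", "Rotate": 90})
	fmt.Println(p.Type, p.Rotate, err)
}
```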
func (*Reader) DecodeStream ¶ added in v0.2.0
DecodeStream returns a reader for the decoded stream data. If numFilters is non-zero, only the first numFilters filters are decoded.
type ReaderErrorHandling ¶ added in v0.3.1
type ReaderErrorHandling int
type ReaderOptions ¶ added in v0.2.0
type ReaderOptions struct {
	// ReadPassword is a function that queries the user for a password for the
	// document with the given ID. The function is called repeatedly, with
	// sequentially increasing values of try (starting at 0), until the correct
	// password is entered. If the function returns the empty string, the
	// authentication attempt is aborted and an [AuthenticationError] is
	// reported to the caller.
	ReadPassword func(ID []byte, try int) string

	ErrorHandling ReaderErrorHandling
}
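The calling convention for ReadPassword can be sketched as a simple retry loop. The authenticate function, the check callback and the error value below are hypothetical; only the shape of the ReadPassword callback is taken from the struct above.

```go
package main

import "fmt"

// authenticate calls readPassword with try = 0, 1, 2, ... until check
// accepts the returned password. An empty password aborts the attempt.
// The check callback stands in for the real password verification.
func authenticate(id []byte, readPassword func(ID []byte, try int) string, check func(pw string) bool) error {
	for try := 0; ; try++ {
		pw := readPassword(id, try)
		if pw == "" {
			return fmt.Errorf("authentication failed for document %x", id)
		}
		if check(pw) {
			return nil
		}
	}
}

func main() {
	guesses := []string{"wrong", "secret"}
	readPassword := func(ID []byte, try int) string {
		if try < len(guesses) {
			return guesses[try]
		}
		return "" // give up
	}
	check := func(pw string) bool { return pw == "secret" }

	err := authenticate([]byte{0x01, 0x02}, readPassword, check)
	fmt.Println(err)
}
```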
type Rectangle ¶
type Rectangle struct {
LLx, LLy, URx, URy float64
}
Rectangle represents a PDF rectangle.
func GetRectangle ¶ added in v0.3.1
GetRectangle resolves references to indirect objects and makes sure the resulting object is a PDF rectangle object. If the object is null, nil is returned.
func (*Rectangle) NearlyEqual ¶
NearlyEqual reports whether the corner coordinates of two rectangles differ by less than `eps`.
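A comparison of this kind can be written as four coordinate-wise checks. The sketch below redeclares Rectangle and uses a free function, nearlyEqual, which is a hypothetical stand-in for the method; whether the real method uses a strict or non-strict inequality is not stated here, so the strict form is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// Rectangle is redeclared here so that the example is self-contained.
type Rectangle struct {
	LLx, LLy, URx, URy float64
}

// nearlyEqual reports whether all four corner coordinates of a and b
// differ by less than eps.
func nearlyEqual(a, b *Rectangle, eps float64) bool {
	return math.Abs(a.LLx-b.LLx) < eps &&
		math.Abs(a.LLy-b.LLy) < eps &&
		math.Abs(a.URx-b.URx) < eps &&
		math.Abs(a.URy-b.URy) < eps
}

func main() {
	a := &Rectangle{LLx: 0, LLy: 0, URx: 100, URy: 200}
	b := &Rectangle{LLx: 0.001, LLy: 0, URx: 100, URy: 199.999}
	fmt.Println(nearlyEqual(a, b, 0.01))
	fmt.Println(nearlyEqual(a, b, 0.0001))
}
```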
type Reference ¶
type Reference uint64
Reference represents a reference to an indirect object in a PDF file. The lower 32 bits represent the object number, the next 16 bits the generation number.
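The bit layout described above can be illustrated as follows. The lower-case names newReference, number and generation are hypothetical helpers written for this sketch; they are not the package's NewReference function or Generation method.

```go
package main

import "fmt"

// Reference is redeclared here so that the example is self-contained.
type Reference uint64

// newReference packs an object number into bits 0-31 and a generation
// number into bits 32-47.
func newReference(number uint32, generation uint16) Reference {
	return Reference(uint64(number) | uint64(generation)<<32)
}

// number extracts the object number from the lower 32 bits.
func (r Reference) number() uint32 { return uint32(r) }

// generation extracts the generation number from the next 16 bits.
func (r Reference) generation() uint16 { return uint16(r >> 32) }

func main() {
	ref := newReference(12, 3)
	fmt.Println(ref.number(), ref.generation())
}
```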
func NewReference ¶ added in v0.2.0
func (Reference) Generation ¶
type Resources ¶
type Resources struct {
	ExtGState  Dict  `pdf:"optional"` // maps resource names to graphics state parameter dictionaries
	ColorSpace Dict  `pdf:"optional"` // maps each resource name to either the name of a device-dependent colour space or an array describing a colour space
	Pattern    Dict  `pdf:"optional"` // maps resource names to pattern objects
	Shading    Dict  `pdf:"optional"` // maps resource names to shading dictionaries
	XObject    Dict  `pdf:"optional"` // maps resource names to external objects
	Font       Dict  `pdf:"optional"` // maps resource names to font dictionaries
	ProcSet    Array `pdf:"optional"` // predefined procedure set names
	Properties Dict  `pdf:"optional"` // maps resource names to property list dictionaries for marked content
}
Resources describes a PDF Resource Dictionary. See section 7.8.3 of PDF 32000-1:2008 for details. TODO(voss): use []*font.Font for the .Font field?
type Stream ¶
Stream represents a stream object in a PDF file.
type String ¶
type String []byte
String represents a raw string in a PDF file. The character set encoding, if any, is determined by the context.
func ParseString ¶
ParseString parses a string from the given buffer. The buffer must include the surrounding parentheses or angle brackets.
func TextString ¶
TextString creates a String object using the "text string" encoding, i.e. using either UTF-16BE encoding (with a BOM) or PDFDocEncoding.
func (String) AsDate ¶
AsDate converts a PDF date string to a time.Time object. If the string does not have the correct format, an error is returned.
func (String) AsTextString ¶
AsTextString interprets x as a PDF "text string" and returns the corresponding UTF-8 encoded string.
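The UTF-16BE branch of the "text string" encoding can be sketched with the standard unicode/utf16 package. The functions encodeUTF16BE and decodeUTF16BE are hypothetical names chosen for this example; the PDFDocEncoding branch is not covered, so input without a byte order mark is simply rejected.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// encodeUTF16BE encodes s as big-endian UTF-16, preceded by the byte
// order mark 0xFE 0xFF.
func encodeUTF16BE(s string) []byte {
	out := []byte{0xFE, 0xFF}
	for _, u := range utf16.Encode([]rune(s)) {
		out = append(out, byte(u>>8), byte(u))
	}
	return out
}

// decodeUTF16BE reverses encodeUTF16BE. The second return value is false
// if b does not start with the byte order mark or has odd length.
func decodeUTF16BE(b []byte) (string, bool) {
	if len(b) < 2 || len(b)%2 != 0 || b[0] != 0xFE || b[1] != 0xFF {
		return "", false
	}
	units := make([]uint16, 0, len(b)/2-1)
	for i := 2; i < len(b); i += 2 {
		units = append(units, uint16(b[i])<<8|uint16(b[i+1]))
	}
	return string(utf16.Decode(units)), true
}

func main() {
	enc := encodeUTF16BE("Grüße")
	fmt.Printf("% x\n", enc)

	dec, ok := decodeUTF16BE(enc)
	fmt.Println(dec, ok)
}
```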
type Version ¶
type Version int
Version represents the version of the PDF standard used in a file.
const (
	V1_0 Version
	V1_1
	V1_2
	V1_3
	V1_4
	V1_5
	V1_6
	V1_7
	V2_0
)
PDF versions supported by this library.
func ParseVersion ¶
ParseVersion parses a PDF version string.
type VersionError ¶
VersionError is returned when trying to use a feature in a PDF file which is not supported by the PDF version used. Use Writer.CheckVersion to create VersionError objects.
func (*VersionError) Error ¶
func (err *VersionError) Error() string
type Writer ¶
type Writer struct {
	// Version is the PDF version used in this file. This field is
	// read-only. Use the opt argument of NewWriter to set the PDF version for
	// a new file.
	Version Version

	// The Document Catalog is documented in section 7.7.2 of PDF 32000-1:2008.
	Catalog *Catalog

	Tagged bool

	Resources map[Reference]Resource
	// contains filtered or unexported fields
}
Writer represents a PDF file open for writing. Use the functions Create or NewWriter to create a new Writer.
func Create ¶
Create creates the named PDF file and opens it for output. If a previous file with the same name exists, it is overwritten. After writing is complete, Writer.Close must be called to write the trailer and to close the underlying file.
If non-default settings are required, NewWriter can be used to set options.
func NewWriter ¶
func NewWriter(w io.Writer, opt *WriterOptions) (*Writer, error)
NewWriter prepares a PDF file for writing.
The Writer.Close method must be called after the file contents have been written, to add the trailer and the cross reference table to the PDF file. It is the caller's responsibility to close the writer w after the pdf.Writer has been closed.
func (*Writer) CheckVersion ¶
CheckVersion checks whether the PDF file being written has version minVersion or later. If the version is new enough, nil is returned. Otherwise a VersionError for the given operation is returned.
func (*Writer) Close ¶
Close closes the Writer, flushing any unwritten data to the underlying io.Writer.
func (*Writer) NewPlaceholder ¶
func (pdf *Writer) NewPlaceholder(size int) *Placeholder
NewPlaceholder creates a new placeholder for a value which is not yet known. The argument size must be an upper bound to the length of the replacement text. Once the value becomes known, it can be filled in using the Placeholder.Set method.
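The idea of reserving space and filling it in later can be shown on a plain byte slice. This is a simplified sketch under stated assumptions: the placeholder struct and the reserve and fill functions are hypothetical, and padding with spaces is an assumption about how unused reserved bytes might be handled.

```go
package main

import "fmt"

// placeholder records where the reserved space starts and how long it is.
type placeholder struct {
	pos, size int
}

// reserve appends size spaces to data and remembers their position.
func reserve(data []byte, size int) ([]byte, placeholder) {
	p := placeholder{pos: len(data), size: size}
	for i := 0; i < size; i++ {
		data = append(data, ' ')
	}
	return data, p
}

// fill overwrites the reserved space with text. It fails if text is
// longer than the space reserved, which is why size must be an upper bound.
func fill(data []byte, p placeholder, text string) error {
	if len(text) > p.size {
		return fmt.Errorf("value %q needs %d bytes, only %d reserved", text, len(text), p.size)
	}
	copy(data[p.pos:p.pos+p.size], text)
	return nil
}

func main() {
	data := []byte("<< /Length ")
	data, p := reserve(data, 6)
	data = append(data, " >>"...)

	// ... the stream is written, and its length becomes known ...
	if err := fill(data, p, "1234"); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%q\n", data)
}
```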
func (*Writer) OpenStream ¶
func (pdf *Writer) OpenStream(ref Reference, dict Dict, filters ...*FilterInfo) (io.WriteCloser, error)
OpenStream adds a PDF Stream to the file and returns an io.WriteCloser which can be used to add the stream's data. No other objects can be added to the file until the stream is closed.
func (*Writer) Put ¶ added in v0.3.0
Put writes an object to the PDF file, as an indirect object using the given reference.
func (*Writer) WriteCompressed ¶
WriteCompressed writes a number of objects to the file as a compressed object stream.
Object streams are only available for PDF version 1.5 and newer; in case the file version is too low, the objects are written directly into the PDF file, without compression.
Directories ¶
Path | Synopsis |
---|---|
color | Package color implements different PDF color spaces. |
demo | |
cff-glyphs | Read a CFF font and display a magnified version of each glyph in a PDF file. |
font | Package font implements the PDF font handling. |
builtin | Package builtin implements support for the 14 built-in PDF fonts. |
cid | Package cid provides support for embedding CID fonts into PDF documents. |
simple | Package simple provides support for embedding simple fonts into PDF documents. |
tounicode | Package tounicode reads and writes PDF /ToUnicode CMaps. |
type3 | Package type3 provides support for embedding type 3 fonts into PDF documents. |
graphics | Package graphics allows to draw on a PDF page. |
image | Package image provides functions for embedding images in PDF files. |
internal | |
lzw | Package lzw implements the Lempel-Ziv-Welch compressed data format. |
pagetree | Package pagetree implements PDF page trees. |