gopdf

package
v0.4.14-rc.3
Warning

This package is not in the latest version of its module.

Published: Sep 10, 2024 License: Apache-2.0 Imports: 9 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func WithConfig

func WithConfig(config PDFOptions) func(o *PDFOptions)

WithConfig sets the PDF loader configuration.
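The signature follows the functional options pattern: a constructor accepts any number of func(o *PDFOptions) values and applies them to an options struct. The following is a minimal sketch of that pattern, not the package's actual implementation. The PDFOptions type here is a local stand-in that mirrors the fields documented on this page, and withConfig and applyOptions are hypothetical names. It also assumes that WithConfig replaces the whole options struct, which the documentation does not confirm.

```go
package main

import "fmt"

// PDFOptions is a local stand-in mirroring the fields documented for the
// package's PDFOptions, so this sketch compiles on its own.
type PDFOptions struct {
	Password  string
	StartPage uint
	MaxPages  uint
	Source    string
}

// withConfig is a hypothetical re-implementation of WithConfig. It assumes
// the option overwrites the entire options struct with the given config.
func withConfig(config PDFOptions) func(o *PDFOptions) {
	return func(o *PDFOptions) {
		*o = config
	}
}

// applyOptions is a hypothetical helper: it starts from the documented
// default start page and applies each option function in order.
func applyOptions(optFns ...func(o *PDFOptions)) PDFOptions {
	opts := PDFOptions{StartPage: 1}
	for _, fn := range optFns {
		fn(&opts)
	}
	return opts
}

func main() {
	opts := applyOptions(withConfig(PDFOptions{
		StartPage: 3,
		MaxPages:  10,
		Source:    "report.pdf",
	}))
	fmt.Printf("%+v\n", opts)
}
```

Because options are applied in order, a later option would win over an earlier one under this assumption.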

func WithInterpreterConfig

func WithInterpreterConfig(cfg pdf.InterpreterConfig) func(o *PDFOptions)

WithInterpreterConfig sets the interpreter config for the PDF loader.

func WithInterpreterOpts

func WithInterpreterOpts(opts ...pdf.InterpreterOption) func(o *PDFOptions)

WithInterpreterOpts sets the interpreter options for the PDF loader.

Types

type PDF

type PDF struct {
	// contains filtered or unexported fields
}

PDF represents a PDF document loader that implements the DocumentLoader interface.

func NewDefaultPDF

func NewDefaultPDF(f io.Reader) (*PDF, error)

func NewPDF

func NewPDF(f io.ReaderAt, size int64, optFns ...func(o *PDFOptions)) (*PDF, error)

NewPDF creates a new PDF loader with the given options.

func NewPDFFromFile

func NewPDFFromFile(f *os.File, optFns ...func(o *PDFOptions)) (*PDF, error)

NewPDFFromFile creates a new PDF loader that reads from the given file, using the given options.

func NewPDFFromReader

func NewPDFFromReader(f io.Reader, optFns ...func(o *PDFOptions)) (*PDF, error)

func (*PDF) Load

func (l *PDF) Load(ctx context.Context) ([]vs.Document, error)

Load loads the PDF document and returns a slice of vs.Document containing the page contents and metadata.

func (*PDF) LoadAndSplit

func (l *PDF) LoadAndSplit(ctx context.Context, splitter types.TextSplitter) ([]vs.Document, error)

LoadAndSplit loads PDF documents from the provided reader and splits them using the specified text splitter.

type PDFOptions

type PDFOptions struct {
	// Password for encrypted PDF files.
	Password string

	// Page number to start loading from (default is 1).
	StartPage uint

	// Maximum number of pages to load (0 for all pages).
	MaxPages uint

	// Source is the name of the PDF document.
	Source string

	// InterpreterConfig is the configuration for the PDF interpreter.
	InterpreterConfig *pdf.InterpreterConfig
}
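StartPage and MaxPages together select a window of pages. The sketch below shows one plausible reading of the field comments: pages are 1-based, a StartPage of 0 is treated as the default of 1, and a MaxPages of 0 means all remaining pages. pageWindow is a hypothetical helper, and the handling of out-of-range values is an assumption rather than documented behavior.

```go
package main

import "fmt"

// pageWindow is a hypothetical helper that returns the first and last
// 1-based page numbers to load from a document with total pages.
// ok is false when the window is empty (no pages, or start past the end).
func pageWindow(total, startPage, maxPages uint) (first, last uint, ok bool) {
	if startPage == 0 {
		startPage = 1 // assumed: zero value falls back to the default
	}
	if total == 0 || startPage > total {
		return 0, 0, false
	}
	last = total
	// Compare against the remaining page count to avoid uint overflow.
	if remaining := total - startPage + 1; maxPages > 0 && maxPages < remaining {
		last = startPage + maxPages - 1
	}
	return startPage, last, true
}

func main() {
	first, last, ok := pageWindow(10, 3, 4)
	fmt.Println(first, last, ok)
}
```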
