documentloader

package
v0.2.0-rc.0
Published: Jul 24, 2024 License: Apache-2.0 Imports: 27 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func AsLangchain

func AsLangchain(loader types.DocumentLoader) lcgodocloaders.Loader

func DefaultDocLoaderFunc

func DefaultDocLoaderFunc(filetype string) func(ctx context.Context, reader io.Reader) ([]vs.Document, error)

func FromLangchain

func FromLangchain(loader lcgodocloaders.Loader) types.DocumentLoader

func GetDocumentLoaderConfig

func GetDocumentLoaderConfig(name string) (any, error)

func WithConfig

func WithConfig(config PDFOptions) func(o *PDFOptions)

WithConfig sets the PDF loader configuration.
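The signatures of WithConfig and NewPDF suggest the functional-options pattern: each option is a func(o *PDFOptions) applied to an options value before the loader is built. The following is a minimal sketch of that pattern, not the package's implementation. PDFOptions is copied locally so the example runs on its own; the overwrite behavior of WithConfig, the applyOptions helper, and its StartPage default are assumptions.

```go
package main

import "fmt"

// PDFOptions is a local copy of the documented struct so the sketch is self-contained.
type PDFOptions struct {
	Password  string
	StartPage uint
	MaxPages  uint
	Source    string
	NumThread int
}

// WithConfig mirrors the documented signature. Assumed behavior: the returned
// option replaces the whole options value with config.
func WithConfig(config PDFOptions) func(o *PDFOptions) {
	return func(o *PDFOptions) {
		*o = config
	}
}

// applyOptions is a hypothetical helper showing how a constructor such as
// NewPDF could fold its optFns into one options value.
func applyOptions(optFns ...func(o *PDFOptions)) PDFOptions {
	opts := PDFOptions{StartPage: 1} // assumed default, taken from the StartPage comment
	for _, fn := range optFns {
		fn(&opts)
	}
	return opts
}

func main() {
	opts := applyOptions(WithConfig(PDFOptions{StartPage: 3, MaxPages: 10, Source: "report.pdf"}))
	fmt.Printf("%+v\n", opts)
}
```

Because an option of this shape replaces every field, any field left at its zero value in config would also overwrite an earlier default under this assumption.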

Types

type LoaderFunc

type LoaderFunc func(ctx context.Context, reader io.Reader) ([]vs.Document, error)

func GetDocumentLoaderFunc

func GetDocumentLoaderFunc(name string, config any) (LoaderFunc, error)

type PDF

type PDF struct {
	// contains filtered or unexported fields
}

PDF represents a PDF document loader that implements the DocumentLoader interface.

func NewPDF

func NewPDF(r io.Reader, optFns ...func(o *PDFOptions)) (*PDF, error)

NewPDF creates a new PDF loader that reads from r, configured with the given options.

func (*PDF) Load

func (l *PDF) Load(ctx context.Context) ([]vs.Document, error)

Load loads the PDF document and returns a slice of vs.Document containing the page contents and metadata.

func (*PDF) LoadAndSplit

func (l *PDF) LoadAndSplit(ctx context.Context, splitter types.TextSplitter) ([]vs.Document, error)

LoadAndSplit loads the PDF document and splits the resulting documents using the specified text splitter.
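A load-then-split step can be sketched as follows. This is an illustration under stated assumptions rather than the package's code: Document and TextSplitter are local stand-ins for vs.Document and types.TextSplitter, the single-method splitter interface is assumed, and paragraphSplitter and splitDocuments are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Document is a stand-in for vs.Document; its fields are assumed.
type Document struct {
	Content  string
	Metadata map[string]any
}

// TextSplitter is a stand-in for types.TextSplitter; the method set is assumed.
type TextSplitter interface {
	SplitText(text string) ([]string, error)
}

// paragraphSplitter is a hypothetical splitter that cuts on blank lines.
type paragraphSplitter struct{}

func (paragraphSplitter) SplitText(text string) ([]string, error) {
	var chunks []string
	for _, p := range strings.Split(text, "\n\n") {
		if p = strings.TrimSpace(p); p != "" {
			chunks = append(chunks, p)
		}
	}
	return chunks, nil
}

// splitDocuments splits each loaded document and copies its metadata onto every chunk.
func splitDocuments(docs []Document, splitter TextSplitter) ([]Document, error) {
	var out []Document
	for _, doc := range docs {
		chunks, err := splitter.SplitText(doc.Content)
		if err != nil {
			return nil, err
		}
		for _, chunk := range chunks {
			meta := make(map[string]any, len(doc.Metadata))
			for k, v := range doc.Metadata {
				meta[k] = v
			}
			out = append(out, Document{Content: chunk, Metadata: meta})
		}
	}
	return out, nil
}

func main() {
	pages := []Document{{Content: "intro\n\nbody text", Metadata: map[string]any{"page": 1}}}
	chunks, err := splitDocuments(pages, paragraphSplitter{})
	if err != nil {
		fmt.Println("split failed:", err)
		return
	}
	for _, c := range chunks {
		fmt.Println(c.Metadata["page"], c.Content)
	}
}
```

Copying the metadata map per chunk keeps later edits to one chunk from leaking into its siblings.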

type PDFOptions

type PDFOptions struct {
	// Password for encrypted PDF files.
	Password string

	// Page number to start loading from (default is 1).
	StartPage uint

	// Maximum number of pages to load (0 for all pages).
	MaxPages uint

	// Source is the name of the PDF document.
	Source string

	// Number of goroutines used to load PDF documents.
	NumThread int
}
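The StartPage and MaxPages comments describe a page window: start at a 1-based page and read at most MaxPages pages, with 0 meaning all remaining pages. One way such a window could be resolved against a document's page count is sketched below. pageWindow is a hypothetical helper, and treating a StartPage of 0 as 1 is an assumption based on the documented default.

```go
package main

import "fmt"

// pageWindow returns the first and last page to read (1-based, inclusive).
// Hypothetical helper: a startPage of 0 is treated as 1, and a maxPages of 0
// means "through the last page". ok is false when startPage is past the end.
func pageWindow(startPage, maxPages, totalPages uint) (first, last uint, ok bool) {
	if startPage == 0 {
		startPage = 1
	}
	if startPage > totalPages {
		return 0, 0, false
	}
	last = totalPages
	// Compare against the remaining page count to avoid unsigned overflow.
	if maxPages > 0 && maxPages-1 < totalPages-startPage {
		last = startPage + maxPages - 1
	}
	return startPage, last, true
}

func main() {
	first, last, ok := pageWindow(2, 2, 5)
	fmt.Println(first, last, ok)
}
```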
