documentloaders

package
v0.1.2-pre
Published: Dec 28, 2023 | License: MIT | Imports: 15 | Imported by: 40

Documentation

Overview

Package documentloaders includes a standard interface for loading documents from a source and implementations of this interface.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type CSV

type CSV struct {
	// contains filtered or unexported fields
}

CSV represents a CSV document loader.

func NewCSV

func NewCSV(r io.Reader, columns ...string) CSV

NewCSV creates a new CSV loader with an io.Reader and optional column names for filtering.

func (CSV) Load

func (c CSV) Load(_ context.Context) ([]schema.Document, error)

Load reads from the io.Reader and returns a single document with the data.

func (CSV) LoadAndSplit

func (c CSV) LoadAndSplit(ctx context.Context, splitter textsplitter.TextSplitter) ([]schema.Document, error)

LoadAndSplit reads text data from the io.Reader and splits it into multiple documents using a text splitter.

type HTML

type HTML struct {
	// contains filtered or unexported fields
}

HTML loads, parses, and sanitizes HTML content from an io.Reader.

func NewHTML

func NewHTML(r io.Reader) HTML

NewHTML creates a new HTML loader with an io.Reader.

func (HTML) Load

func (h HTML) Load(_ context.Context) ([]schema.Document, error)

Load reads from the io.Reader and returns a single document with the data.

func (HTML) LoadAndSplit

func (h HTML) LoadAndSplit(ctx context.Context, splitter textsplitter.TextSplitter) ([]schema.Document, error)

LoadAndSplit reads text data from the io.Reader and splits it into multiple documents using a text splitter.

type Loader

type Loader interface {
	// Load loads from a source and returns documents.
	Load(ctx context.Context) ([]schema.Document, error)
	// LoadAndSplit loads from a source and splits the documents using a text splitter.
	LoadAndSplit(ctx context.Context, splitter textsplitter.TextSplitter) ([]schema.Document, error)
}

Loader is the interface for loading and splitting documents from a source.
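
The two-method shape of Loader can be illustrated with a minimal, standard-library-only sketch. Everything below is an assumption made for illustration: Document and TextSplitter are simplified local stand-ins for schema.Document and textsplitter.TextSplitter, and readerLoader, paragraphSplitter, and loadParagraphs are hypothetical names that do not exist in this package.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"strings"
)

// Document is a simplified local stand-in for schema.Document (assumption).
type Document struct {
	PageContent string
	Metadata    map[string]any
}

// TextSplitter is a local stand-in for textsplitter.TextSplitter (assumption).
type TextSplitter interface {
	SplitText(text string) ([]string, error)
}

// Loader mirrors the shape of the Loader interface, using the stand-in types.
type Loader interface {
	Load(ctx context.Context) ([]Document, error)
	LoadAndSplit(ctx context.Context, splitter TextSplitter) ([]Document, error)
}

// readerLoader is a hypothetical loader that wraps an io.Reader.
type readerLoader struct {
	r io.Reader
}

// Compile-time check that readerLoader satisfies the local Loader interface.
var _ Loader = readerLoader{}

// Load reads everything from the reader into a single document.
func (l readerLoader) Load(_ context.Context) ([]Document, error) {
	data, err := io.ReadAll(l.r)
	if err != nil {
		return nil, err
	}
	return []Document{{PageContent: string(data)}}, nil
}

// LoadAndSplit loads first, then turns every chunk into its own document.
func (l readerLoader) LoadAndSplit(ctx context.Context, s TextSplitter) ([]Document, error) {
	docs, err := l.Load(ctx)
	if err != nil {
		return nil, err
	}
	var out []Document
	for _, d := range docs {
		chunks, err := s.SplitText(d.PageContent)
		if err != nil {
			return nil, err
		}
		for _, c := range chunks {
			// Each chunk carries over the parent document's metadata.
			out = append(out, Document{PageContent: c, Metadata: d.Metadata})
		}
	}
	return out, nil
}

// paragraphSplitter is a hypothetical splitter that cuts on blank lines.
type paragraphSplitter struct{}

func (paragraphSplitter) SplitText(text string) ([]string, error) {
	return strings.Split(text, "\n\n"), nil
}

// loadParagraphs wires the loader and splitter together for the demo.
func loadParagraphs(text string) ([]Document, error) {
	l := readerLoader{r: strings.NewReader(text)}
	return l.LoadAndSplit(context.Background(), paragraphSplitter{})
}

func main() {
	docs, err := loadParagraphs("first paragraph\n\nsecond paragraph")
	if err != nil {
		panic(err)
	}
	for i, d := range docs {
		fmt.Printf("%d: %q\n", i, d.PageContent)
	}
}
```

Building LoadAndSplit on top of Load keeps the reading logic in one place. Whether the package's own loaders are structured this way is not stated in this documentation.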

type NotionDirectoryLoader

type NotionDirectoryLoader struct {
	// contains filtered or unexported fields
}

NotionDirectoryLoader is a document loader that reads content from pages within a Notion Database.

func NewNotionDirectory

func NewNotionDirectory(filePath string, encoding ...string) *NotionDirectoryLoader

NewNotionDirectory creates a new NotionDirectoryLoader with the given file path and an optional encoding.

func (*NotionDirectoryLoader) Load

func (n *NotionDirectoryLoader) Load() ([]schema.Document, error)

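Load retrieves data from a Notion directory and returns a list of schema.Document objects. Unlike the other loaders' Load methods, it takes no context.Context, and no LoadAndSplit method is listed for this type, so NotionDirectoryLoader does not satisfy the Loader interface.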
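
Because this Load method has no context parameter, code that expects a context-aware loader needs a small adapter. The sketch below shows one way to write it using only the standard library. Document, plainLoader, withContext, staticLoader, and adaptAndLoad are all hypothetical local names, not part of this package, and the cancellation check is a design choice of the sketch.

```go
package main

import (
	"context"
	"fmt"
)

// Document is a simplified local stand-in for schema.Document (assumption).
type Document struct {
	PageContent string
}

// plainLoader is the context-free shape, like NotionDirectoryLoader.Load.
type plainLoader interface {
	Load() ([]Document, error)
}

// withContext adapts a plainLoader to a context-aware Load method.
type withContext struct {
	inner plainLoader
}

// Load returns early if ctx is already cancelled, then delegates.
// The inner call itself cannot be interrupted once it has started.
func (w withContext) Load(ctx context.Context) ([]Document, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	return w.inner.Load()
}

// staticLoader is a hypothetical plainLoader that returns fixed documents.
type staticLoader []Document

func (s staticLoader) Load() ([]Document, error) { return s, nil }

// adaptAndLoad runs the adapter with a live or an already-cancelled context.
func adaptAndLoad(docs []Document, cancelled bool) ([]Document, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelled {
		cancel()
	}
	return withContext{inner: staticLoader(docs)}.Load(ctx)
}

func main() {
	docs, err := adaptAndLoad([]Document{{PageContent: "page one"}}, false)
	fmt.Println(len(docs), err)

	_, err = adaptAndLoad(nil, true)
	fmt.Println(err)
}
```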

type PDF

type PDF struct {
	// contains filtered or unexported fields
}

PDF loads text data from an io.ReaderAt.

func NewPDF

func NewPDF(r io.ReaderAt, size int64, opts ...PDFOptions) PDF

NewPDF creates a new PDF loader with an io.ReaderAt, the size of the data, and optional PDFOptions.
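
Per its signature, NewPDF needs an io.ReaderAt and a size rather than a plain io.Reader. A common way to obtain both is to open a file and call Stat, since *os.File implements io.ReaderAt. The sketch below uses only the standard library. openForReadAt and writeTemp are hypothetical helper names, and the sketch assumes that size means the length of the data in bytes.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// openForReadAt opens path and returns the file together with its size in
// bytes. The caller is responsible for closing the returned file.
func openForReadAt(path string) (*os.File, int64, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, 0, err
	}
	info, err := f.Stat()
	if err != nil {
		f.Close()
		return nil, 0, err
	}
	return f, info.Size(), nil
}

// writeTemp writes data to a temporary file and returns its path.
// It exists only so the demo has a file to open.
func writeTemp(data []byte) (string, error) {
	f, err := os.CreateTemp("", "sample-*.bin")
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := f.Write(data); err != nil {
		return "", err
	}
	return f.Name(), nil
}

func main() {
	path, err := writeTemp([]byte("placeholder bytes, not a real PDF"))
	if err != nil {
		panic(err)
	}
	defer os.Remove(path)

	f, size, err := openForReadAt(path)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// *os.File satisfies io.ReaderAt, so r and size have the types that
	// the first two parameters of NewPDF require.
	var r io.ReaderAt = f
	buf := make([]byte, 11)
	if _, err := r.ReadAt(buf, 0); err != nil {
		panic(err)
	}
	fmt.Println(size, string(buf))
}
```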

func (PDF) Load

func (p PDF) Load(_ context.Context) ([]schema.Document, error)

Load reads the PDF data from the io.ReaderAt and returns documents whose metadata holds the page number and the total number of pages in the PDF.

func (PDF) LoadAndSplit

func (p PDF) LoadAndSplit(ctx context.Context, splitter textsplitter.TextSplitter) ([]schema.Document, error)

LoadAndSplit reads PDF data from the io.ReaderAt and splits it into multiple documents using a text splitter.

type PDFOptions

type PDFOptions func(pdf *PDF)

PDFOptions are options for the PDF loader.

func WithPassword

func WithPassword(password string) PDFOptions

WithPassword sets the password for the PDF.
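
PDFOptions follows the functional options pattern: each option is a function that mutates the loader during construction. The standard-library sketch below shows the mechanics. pdfLoader, option, withPassword, newPDFLoader, and demoPassword are hypothetical local stand-ins; the real PDF type has unexported fields that this documentation does not show.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// pdfLoader is a hypothetical stand-in for the package's PDF type.
type pdfLoader struct {
	r        io.ReaderAt
	size     int64
	password string
}

// option mirrors the shape of PDFOptions: a function that mutates the loader.
type option func(*pdfLoader)

// withPassword mirrors the shape of WithPassword.
func withPassword(password string) option {
	return func(p *pdfLoader) { p.password = password }
}

// newPDFLoader applies the options in order, so a later option overrides
// an earlier one that sets the same field.
func newPDFLoader(r io.ReaderAt, size int64, opts ...option) pdfLoader {
	p := pdfLoader{r: r, size: size}
	for _, opt := range opts {
		opt(&p)
	}
	return p
}

// demoPassword builds a loader over an in-memory reader and reports the
// password that ended up stored on it.
func demoPassword(opts ...option) string {
	src := strings.NewReader("placeholder bytes")
	return newPDFLoader(src, src.Size(), opts...).password
}

func main() {
	fmt.Printf("%q\n", demoPassword())
	fmt.Printf("%q\n", demoPassword(withPassword("s3cret")))
}
```

With this pattern, callers that do not need a password pass no options, and new options can be added later without changing the constructor's signature.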

type Text

type Text struct {
	// contains filtered or unexported fields
}

Text loads text data from an io.Reader.

func NewText

func NewText(r io.Reader) Text

NewText creates a new text loader with an io.Reader.

func (Text) Load

func (l Text) Load(_ context.Context) ([]schema.Document, error)

Load reads from the io.Reader and returns a single document with the data.

func (Text) LoadAndSplit

func (l Text) LoadAndSplit(ctx context.Context, splitter textsplitter.TextSplitter) ([]schema.Document, error)

LoadAndSplit reads text data from the io.Reader and splits it into multiple documents using a text splitter.
