Documentation ¶
Overview ¶
Package documentloaders includes a standard interface for loading documents from a source and implementations of this interface.
Index ¶
Constants ¶
This section is empty.
Variables ¶
var ErrMissingAudioSource = errors.New("assemblyai: missing audio source")
ErrMissingAudioSource is returned when neither an audio URL nor a reader has been set using WithAudioURL or WithAudioReader.
Functions ¶
This section is empty.
Types ¶
type AssemblyAIAudioTranscriptLoader ¶
type AssemblyAIAudioTranscriptLoader struct {
// contains filtered or unexported fields
}
AssemblyAIAudioTranscriptLoader transcribes an audio file using AssemblyAI and loads the transcript.
Audio files can be specified using either a URL or a reader.
For a list of the supported audio and video formats, see the FAQ.
func NewAssemblyAIAudioTranscript ¶
func NewAssemblyAIAudioTranscript(apiKey string, opts ...AssemblyAIOption) *AssemblyAIAudioTranscriptLoader
NewAssemblyAIAudioTranscript returns a new instance of AssemblyAIAudioTranscriptLoader.
func (*AssemblyAIAudioTranscriptLoader) Load ¶
Load transcribes an audio file using AssemblyAI and returns the transcript as one or more documents, depending on the configured transcript format.
func (*AssemblyAIAudioTranscriptLoader) LoadAndSplit ¶
func (a *AssemblyAIAudioTranscriptLoader) LoadAndSplit(ctx context.Context, splitter textsplitter.TextSplitter) ([]schema.Document, error)
LoadAndSplit transcribes the audio data and splits it into multiple documents using a text splitter.
type AssemblyAIOption ¶
type AssemblyAIOption func(loader *AssemblyAIAudioTranscriptLoader)
AssemblyAIOption is an option for the AssemblyAI loader.
func WithAudioReader ¶
func WithAudioReader(r io.Reader) AssemblyAIOption
WithAudioReader configures the loader to transcribe a local audio file.
func WithAudioURL ¶
func WithAudioURL(url string) AssemblyAIOption
WithAudioURL configures the loader to transcribe an audio file from a URL. The URL needs to be accessible from AssemblyAI's servers.
func WithTranscriptFormat ¶
func WithTranscriptFormat(format TranscriptFormat) AssemblyAIOption
WithTranscriptFormat configures the format of the document page content.
func WithTranscriptParams ¶
func WithTranscriptParams(params *assemblyai.TranscriptOptionalParams) AssemblyAIOption
WithTranscriptParams configures the optional parameters for the transcription.
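AssemblyAIOption and the With... functions above follow Go's functional options pattern: each option is a function that mutates the loader, and the constructor applies them in order. The sketch below reproduces that shape with local, hypothetical types (loader, option, withAudioURL, withFormat, newLoader) so that it runs without the package or an API key. It illustrates the pattern only and is not the package's implementation.

```go
package main

import "fmt"

// loader is a hypothetical stand-in for AssemblyAIAudioTranscriptLoader.
type loader struct {
	apiKey   string
	audioURL string
	format   int
}

// option has the same shape as AssemblyAIOption: a function that
// receives the loader and modifies it.
type option func(l *loader)

// withAudioURL is a stand-in for an option such as WithAudioURL.
func withAudioURL(url string) option {
	return func(l *loader) { l.audioURL = url }
}

// withFormat is a stand-in for an option such as WithTranscriptFormat.
func withFormat(format int) option {
	return func(l *loader) { l.format = format }
}

// newLoader applies each option in the order given, so a later option
// overrides an earlier one that sets the same field.
func newLoader(apiKey string, opts ...option) *loader {
	l := &loader{apiKey: apiKey}
	for _, opt := range opts {
		opt(l)
	}
	return l
}

func main() {
	l := newLoader("api-key", withAudioURL("https://example.com/audio.mp3"), withFormat(2))
	fmt.Println(l.audioURL, l.format)
}
```

Options left out keep the zero value of their field, which is why a constructor in this style can accept any subset of options.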
type CSV ¶
type CSV struct {
// contains filtered or unexported fields
}
CSV represents a CSV document loader.
func NewCSV ¶
NewCSV creates a new CSV loader with an io.Reader and optional column names for filtering.
func (CSV) LoadAndSplit ¶
func (c CSV) LoadAndSplit(ctx context.Context, splitter textsplitter.TextSplitter) ([]schema.Document, error)
LoadAndSplit reads text data from the io.Reader and splits it into multiple documents using a text splitter.
type HTML ¶
type HTML struct {
// contains filtered or unexported fields
}
HTML loads, parses, and sanitizes HTML content from an io.Reader.
func (HTML) LoadAndSplit ¶
func (h HTML) LoadAndSplit(ctx context.Context, splitter textsplitter.TextSplitter) ([]schema.Document, error)
LoadAndSplit reads text data from the io.Reader and splits it into multiple documents using a text splitter.
type Loader ¶
type Loader interface {
	// Load loads from a source and returns documents.
	Load(ctx context.Context) ([]schema.Document, error)

	// LoadAndSplit loads from a source and splits the documents using a text splitter.
	LoadAndSplit(ctx context.Context, splitter textsplitter.TextSplitter) ([]schema.Document, error)
}
Loader is the interface for loading and splitting documents from a source.
type NotionDirectoryLoader ¶
type NotionDirectoryLoader struct {
// contains filtered or unexported fields
}
NotionDirectoryLoader is a document loader that reads content from pages within a Notion Database.
func NewNotionDirectory ¶
func NewNotionDirectory(filePath string, encoding ...string) *NotionDirectoryLoader
NewNotionDirectory creates a new NotionDirectoryLoader with the given file path and encoding.
type PDF ¶
type PDF struct {
// contains filtered or unexported fields
}
PDF loads text data from a PDF document read through an io.ReaderAt.
func NewPDF ¶
func NewPDF(r io.ReaderAt, size int64, opts ...PDFOptions) PDF
NewPDF creates a new PDF loader with an io.ReaderAt and the size of the PDF data.
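Unlike the other constructors, NewPDF takes an io.ReaderAt together with a size rather than an io.Reader. As a sketch of how a caller might obtain both values for in-memory data, the hypothetical helper readerAtAndSize below uses bytes.Reader, which implements io.ReaderAt and reports its length through Size. For a file on disk, *os.File also implements io.ReaderAt, and the size is available from its Stat method.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// readerAtAndSize is a hypothetical helper that returns the two values a
// constructor shaped like NewPDF(r io.ReaderAt, size int64, ...) expects.
func readerAtAndSize(data []byte) (io.ReaderAt, int64) {
	r := bytes.NewReader(data)
	return r, r.Size()
}

func main() {
	// Placeholder bytes only; this is not a valid PDF document.
	ra, size := readerAtAndSize([]byte("%PDF-1.7"))

	// ReadAt reads from an absolute offset without moving a cursor,
	// which is what allows random access into the data.
	buf := make([]byte, 4)
	if _, err := ra.ReadAt(buf, 0); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(buf), size)
}
```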
func (PDF) Load ¶
Load reads the PDF data from the reader and returns the documents, each with metadata giving its page number and the total number of pages in the PDF.
func (PDF) LoadAndSplit ¶
func (p PDF) LoadAndSplit(ctx context.Context, splitter textsplitter.TextSplitter) ([]schema.Document, error)
LoadAndSplit reads PDF data from the reader and splits it into multiple documents using a text splitter.
type PDFOptions ¶
type PDFOptions func(pdf *PDF)
PDFOptions are options for the PDF loader.
func WithPassword ¶
func WithPassword(password string) PDFOptions
WithPassword sets the password for the PDF.
type Text ¶
type Text struct {
// contains filtered or unexported fields
}
Text loads text data from an io.Reader.
func (Text) LoadAndSplit ¶
func (l Text) LoadAndSplit(ctx context.Context, splitter textsplitter.TextSplitter) ([]schema.Document, error)
LoadAndSplit reads text data from the io.Reader and splits it into multiple documents using a text splitter.
type TranscriptFormat ¶
type TranscriptFormat int
TranscriptFormat represents the format of the document page content.
const (
	// Single document with full transcript text.
	TranscriptFormatText TranscriptFormat = iota
	// Multiple documents with each sentence as page content.
	TranscriptFormatSentences
	// Multiple documents with each paragraph as page content.
	TranscriptFormatParagraphs
	// Single document with SRT formatted subtitles as page content.
	TranscriptFormatSubtitlesSRT
	// Single document with VTT formatted subtitles as page content.
	TranscriptFormatSubtitlesVTT
)
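Because the constants are declared with iota, they take the values 0 through 4 in declaration order, and the zero value of TranscriptFormat is TranscriptFormatText. The sketch below mirrors the declaration with a local, hypothetical type (transcriptFormat) to show this. The singleDocument method is an illustrative helper derived from the comments above, not part of the package.

```go
package main

import "fmt"

// transcriptFormat is a local stand-in for TranscriptFormat.
type transcriptFormat int

const (
	formatText transcriptFormat = iota // 0, also the zero value
	formatSentences                    // 1
	formatParagraphs                   // 2
	formatSubtitlesSRT                 // 3
	formatSubtitlesVTT                 // 4
)

// singleDocument reports whether a format is documented as yielding a
// single document. Sentences, paragraphs, and any unknown value return false.
func (f transcriptFormat) singleDocument() bool {
	switch f {
	case formatText, formatSubtitlesSRT, formatSubtitlesVTT:
		return true
	default:
		return false
	}
}

func main() {
	var unset transcriptFormat // zero value, equal to formatText
	fmt.Println(unset == formatText, formatSentences.singleDocument())
}
```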