Documentation
Constants
const (
	MetaKeySource = "_source"
)
Variables
This section is empty.
Functions
func GetImplSpecificOptions
GetImplSpecificOptions lets Parser authors extract their own custom options from the unified Option type. T is the type of the implementation-specific options struct. Call this function from within the Parser implementation's Parse method. It is recommended to pass a base T as the first argument, pre-populated with default values for the implementation-specific options; options supplied by the caller then override those defaults.
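The extraction step can be sketched with the standard library alone. This is a hypothetical re-creation, not the package's source: it assumes the unified Option stores each implementation-specific function as an untyped value and recovers it with a type assertion against func(*T). The names getImplSpecificOptions, wrapImplSpecificOptFn, pdfOptions and withMaxPages are illustrative only.

```go
package main

import "fmt"

// Option is a hypothetical stand-in for the unified option type: it keeps an
// implementation-specific option function as an untyped value.
type Option struct {
	implSpecificOptFn any
}

// wrapImplSpecificOptFn stores a func(*T) inside an Option.
func wrapImplSpecificOptFn[T any](fn func(*T)) Option {
	return Option{implSpecificOptFn: fn}
}

// getImplSpecificOptions applies every option whose function targets *T to
// base (allocating a zero T when base is nil) and returns base.
func getImplSpecificOptions[T any](base *T, opts ...Option) *T {
	if base == nil {
		base = new(T)
	}
	for _, opt := range opts {
		// Options built for other structs fail the assertion and are skipped.
		if fn, ok := opt.implSpecificOptFn.(func(*T)); ok {
			fn(base)
		}
	}
	return base
}

// pdfOptions is a hypothetical implementation-specific options struct.
type pdfOptions struct {
	password string
	maxPages int
}

func withMaxPages(n int) Option {
	return wrapImplSpecificOptFn(func(o *pdfOptions) { o.maxPages = n })
}

func main() {
	// Defaults live in the base value; caller options override only what they set.
	defaults := &pdfOptions{maxPages: 100, password: "secret"}
	got := getImplSpecificOptions(defaults, withMaxPages(5))
	fmt.Println(got.maxPages, got.password) // 5 secret
}
```

Passing a pre-filled base keeps defaults in one place inside the implementation, so callers only need to supply the options they want to change.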
Types
type ExtParser
type ExtParser struct {
	// contains filtered or unexported fields
}
ExtParser is a parser that uses the file extension to decide which underlying parser to use. You can register your own parsers by calling RegisterParser. The default parser is TextParser. Note:
When parsing, ExtParser finds the matching parser via filepath.Ext(uri), so callers must: (1) pass the URI with parser.WithURI when calling Parse, and (2) use a URI from which filepath.Ext extracts the expected extension.
For example:
	pdf, _ := os.Open("./testdata/test.pdf") // error handling omitted for brevity
	docs, err := extParser.Parse(ctx, pdf, parser.WithURI("./testdata/test.pdf")) // extParser: an *ExtParser from NewExtParser
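The extension lookup described above can be sketched as follows. This is a simplified, hypothetical model: parsers are represented by names in a map, and pickParser is an illustrative helper, not part of the package. It shows why requirement (2) matters: filepath.Ext only looks at the text after the last dot of the final path element, so a query string or a missing URI defeats the lookup.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// pickParser returns the name of the parser registered for the URI's
// extension, or "fallback" when no entry matches. Parser values are replaced
// by names to keep the sketch self-contained.
func pickParser(uri string, byExt map[string]string) string {
	if name, ok := byExt[filepath.Ext(uri)]; ok {
		return name
	}
	return "fallback"
}

func main() {
	byExt := map[string]string{".pdf": "pdf", ".md": "markdown"}
	for _, uri := range []string{
		"./testdata/test.pdf",           // Ext ".pdf" -> pdf
		"notes.md",                      // Ext ".md" -> markdown
		"https://example.com/a.pdf?v=1", // Ext ".pdf?v=1" -> fallback
		"",                              // no URI passed -> fallback
	} {
		fmt.Printf("%q -> %s\n", uri, pickParser(uri, byExt))
	}
}
```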
func NewExtParser
func NewExtParser(ctx context.Context, conf *ExtParserConfig) (*ExtParser, error)
NewExtParser creates a new ExtParser.
func (*ExtParser) GetParsers
GetParsers returns a copy of the registered extension-to-parser map. Modifying the returned map does not affect the ExtParser.
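A copy-on-read accessor like this can be sketched with a shallow map copy. The registry type and getParsers method are hypothetical stand-ins for ExtParser's unexported state, with parser values replaced by strings; the sketch assumes a shallow copy, so keys can be added or removed freely while any pointer-typed parser values would still be shared.

```go
package main

import "fmt"

// registry stands in for ExtParser's unexported parser map.
type registry struct {
	parsers map[string]string
}

// getParsers returns a shallow copy: adding or deleting keys in the result
// leaves r.parsers untouched.
func (r *registry) getParsers() map[string]string {
	out := make(map[string]string, len(r.parsers))
	for ext, p := range r.parsers {
		out[ext] = p
	}
	return out
}

func main() {
	r := &registry{parsers: map[string]string{".pdf": "pdf"}}
	cp := r.getParsers()
	cp[".md"] = "markdown"
	delete(cp, ".pdf")
	fmt.Println(len(r.parsers), len(cp)) // 1 1
}
```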
type ExtParserConfig
type ExtParserConfig struct {
	// ext -> parser.
	// eg: map[string]Parser{
	//   ".pdf": &PDFParser{},
	//   ".md":  &MarkdownParser{},
	// }
	Parsers map[string]Parser

	// Fallback parser to use when no other parser is found.
	// Default is TextParser if not set.
	FallbackParser Parser
}
ExtParserConfig defines the configuration for the ExtParser.
type Option
type Option struct {
	// contains filtered or unexported fields
}
Option defines a call option for the Parser component and is part of the component's interface signature. Each Parser implementation can define its own options struct and option functions in its own package, then wrap those implementation-specific option functions into this type before passing them to Parse.
func WithExtraMeta
WithExtraMeta specifies extra metadata for the document, which is merged into the metadata of each parsed document.
func WithURI
WithURI specifies the URI of the document. ExtParser uses it to select the parser.
func WrapImplSpecificOptFn
WrapImplSpecificOptFn wraps implementation-specific option functions into the Option type. T is the type of the implementation-specific options struct. Parser implementations are required to use this function to convert their own option functions into the unified Option type. For example, if the Parser implementation defines its own options struct:
	type customOptions struct {
		conf string
	}
Then the implementation needs to provide an option function such as:
	func WithConf(conf string) parser.Option {
		return parser.WrapImplSpecificOptFn(func(o *customOptions) {
			o.conf = conf
		})
	}
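Because every implementation wraps its own option functions into the same Option type, options meant for different implementations can travel in one slice. The sketch below is a hypothetical, stdlib-only model (Option, wrapImplSpecificOptFn, apply, otherOptions and WithVerbose are illustrative names, not the package's API) that assumes each implementation only picks up the functions whose target type matches its own options struct.

```go
package main

import "fmt"

// Option is a hypothetical stand-in for the unified option type.
type Option struct{ implSpecificOptFn any }

func wrapImplSpecificOptFn[T any](fn func(*T)) Option {
	return Option{implSpecificOptFn: fn}
}

// Two hypothetical implementations, each with its own options struct.
type customOptions struct{ conf string }
type otherOptions struct{ verbose bool }

func WithConf(conf string) Option {
	return wrapImplSpecificOptFn(func(o *customOptions) { o.conf = conf })
}

func WithVerbose() Option {
	return wrapImplSpecificOptFn(func(o *otherOptions) { o.verbose = true })
}

// apply runs only the option functions whose target type is *T.
func apply[T any](opts ...Option) *T {
	t := new(T)
	for _, o := range opts {
		if fn, ok := o.implSpecificOptFn.(func(*T)); ok {
			fn(t)
		}
	}
	return t
}

func main() {
	opts := []Option{WithConf("strict"), WithVerbose()}
	fmt.Println(apply[customOptions](opts...).conf)   // strict
	fmt.Println(apply[otherOptions](opts...).verbose) // true
}
```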
type Options
type Options struct {
	// URI of the source.
	URI string

	// Extra metadata to merge into each document.
	ExtraMeta map[string]any
}
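How URI and ExtraMeta might end up in each document's metadata can be sketched as below. This is an assumption-laden illustration, not the package's behavior: Document is a hypothetical stand-in for schema.Document, buildMeta is an illustrative helper, and both storing the URI under MetaKeySource ("_source") and letting ExtraMeta entries overwrite earlier keys are assumptions.

```go
package main

import "fmt"

// metaKeySource mirrors the MetaKeySource constant ("_source").
const metaKeySource = "_source"

// Document is a hypothetical stand-in for schema.Document.
type Document struct {
	Content  string
	MetaData map[string]any
}

// buildMeta records the URI under "_source" and then copies ExtraMeta on top.
// Assumption: ExtraMeta entries overwrite earlier keys.
func buildMeta(uri string, extra map[string]any) map[string]any {
	meta := make(map[string]any, len(extra)+1)
	meta[metaKeySource] = uri
	for k, v := range extra {
		meta[k] = v
	}
	return meta
}

func main() {
	extra := map[string]any{"lang": "en"}
	docs := []*Document{{Content: "a"}, {Content: "b"}}
	for _, d := range docs {
		// Each document gets its own map so later edits don't alias.
		d.MetaData = buildMeta("./testdata/test.pdf", extra)
	}
	fmt.Println(docs[1].MetaData[metaKeySource], docs[1].MetaData["lang"]) // ./testdata/test.pdf en
}
```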
func GetCommonOptions
GetCommonOptions extracts the common parser Options from an Option list, optionally taking a base Options with default values.
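A minimal sketch of how common options could be collected, assuming each Option carries a function that edits Options in place. The apply field and the lowercase getCommonOptions are hypothetical; WithURI and WithExtraMeta mirror the documented names, but here WithExtraMeta is assumed to replace the map rather than merge into it.

```go
package main

import "fmt"

// Options mirrors the documented common options struct.
type Options struct {
	URI       string
	ExtraMeta map[string]any
}

// Option carries a function that edits the common Options (hypothetical field).
type Option struct{ apply func(*Options) }

func WithURI(uri string) Option {
	return Option{apply: func(o *Options) { o.URI = uri }}
}

// WithExtraMeta replaces ExtraMeta (an assumption of this sketch).
func WithExtraMeta(meta map[string]any) Option {
	return Option{apply: func(o *Options) { o.ExtraMeta = meta }}
}

// getCommonOptions applies each non-nil apply func to base, in order,
// allocating an empty Options when base is nil.
func getCommonOptions(base *Options, opts ...Option) *Options {
	if base == nil {
		base = &Options{}
	}
	for _, opt := range opts {
		if opt.apply != nil {
			opt.apply(base)
		}
	}
	return base
}

func main() {
	o := getCommonOptions(&Options{URI: "default.txt"}, WithURI("./testdata/test.pdf"))
	fmt.Println(o.URI) // ./testdata/test.pdf
}
```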
type Parser
type Parser interface {
	Parse(ctx context.Context, reader io.Reader, opts ...Option) ([]*schema.Document, error)
}
Parser is a document parser: it parses one or more documents from an io.Reader.
type TextParser
type TextParser struct{}
TextParser is a simple parser that reads all text from a reader and returns it as a single document. For example:
	var p TextParser
	docs, err := p.Parse(ctx, strings.NewReader("hello world")) // check err in real code
	fmt.Println(docs[0].Content) // hello world