Documentation ¶
Constants ¶
This section is empty.
Variables ¶
var (
    TextExtensions = []string{
        ".txt", ".md", ".yml", ".yaml", ".html", ".json", ".csv", ".xml",
    }
    ImageExtensions = []string{
        ".png", ".jpg", ".jpeg", ".gif", ".bmp",
    }
    DocumentExtensions = []string{
        ".pdf", ".doc", ".docx",
    }
)
Common file extensions
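As a rough illustration of how these lists might be used, the sketch below classifies a path by its extension. It keeps local copies of the three lists so that it compiles on its own. The helpers `hasExtension` and `classify` are hypothetical and not part of this package, and the case-insensitive comparison is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Local copies of the documented lists, so this sketch is self-contained.
var (
	textExtensions     = []string{".txt", ".md", ".yml", ".yaml", ".html", ".json", ".csv", ".xml"}
	imageExtensions    = []string{".png", ".jpg", ".jpeg", ".gif", ".bmp"}
	documentExtensions = []string{".pdf", ".doc", ".docx"}
)

// hasExtension reports whether path ends in one of exts.
// Lowercasing the extension first is an assumption of this sketch.
func hasExtension(path string, exts []string) bool {
	ext := strings.ToLower(filepath.Ext(path))
	for _, e := range exts {
		if ext == e {
			return true
		}
	}
	return false
}

// classify is a hypothetical helper that maps a path to a coarse category.
func classify(path string) string {
	switch {
	case hasExtension(path, textExtensions):
		return "text"
	case hasExtension(path, imageExtensions):
		return "image"
	case hasExtension(path, documentExtensions):
		return "document"
	default:
		return "unknown"
	}
}

func main() {
	for _, p := range []string{"notes.md", "photo.JPG", "report.pdf", "archive.zip"} {
		fmt.Printf("%s: %s\n", p, classify(p))
	}
}
```

Keeping the lists as plain slices makes them easy to extend, at the cost of a linear scan per lookup. For lists this short that cost is negligible.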
Functions ¶
This section is empty.
Types ¶
type Handler ¶
type Handler struct {
// contains filtered or unexported fields
}
Handler processes input files and directories
func (*Handler) GetAllContents ¶
GetAllContents returns all file contents concatenated
func (*Handler) GetFileContents ¶
GetFileContents returns the contents of a specific file
func (*Handler) ProcessPath ¶
ProcessPath handles both file and directory inputs
type Input ¶
type Input struct {
    Path         string
    Type         InputType
    Contents     []byte
    Metadata     map[string]interface{} // For additional data like scraping config
    ScrapeConfig *ScrapeConfig          // Specific configuration for web scraping
    MimeType     string                 // Added MimeType field
}
Input represents a file or directory to be processed
type ScrapeConfig ¶
type ScrapeConfig struct {
    URL            string            `yaml:"url"`
    AllowedDomains []string          `yaml:"allowed_domains"`
    Headers        map[string]string `yaml:"headers"`
    Extract        []string          `yaml:"extract"`
}
ScrapeConfig represents the configuration for web scraping
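The sketch below shows one way a ScrapeConfig value might be populated and checked before a request is made. It mirrors the documented struct locally so that it compiles on its own. The function `domainAllowed` is hypothetical, and the field values are placeholders. How the package actually interprets AllowedDomains and Extract is not documented here, so the exact-host match and the rejection of an empty list are assumptions.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// ScrapeConfig mirrors the documented struct so the sketch is self-contained.
type ScrapeConfig struct {
	URL            string            `yaml:"url"`
	AllowedDomains []string          `yaml:"allowed_domains"`
	Headers        map[string]string `yaml:"headers"`
	Extract        []string          `yaml:"extract"`
}

// domainAllowed is a hypothetical check: it reports whether the host of
// cfg.URL exactly matches an entry in cfg.AllowedDomains, ignoring case.
// In this sketch an empty AllowedDomains list allows nothing.
func domainAllowed(cfg ScrapeConfig) (bool, error) {
	u, err := url.Parse(cfg.URL)
	if err != nil {
		return false, err
	}
	host := strings.ToLower(u.Hostname())
	for _, d := range cfg.AllowedDomains {
		if host == strings.ToLower(d) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	cfg := ScrapeConfig{
		URL:            "https://example.com/docs",
		AllowedDomains: []string{"example.com"},
		Headers:        map[string]string{"User-Agent": "input-sketch"},
		Extract:        []string{"title", "h1"}, // placeholder values
	}
	ok, err := domainAllowed(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Println("allowed:", ok)
}
```

The `yaml` struct tags indicate the configuration is meant to be loaded from YAML with keys `url`, `allowed_domains`, `headers`, and `extract`.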
type Validator ¶
type Validator struct {
// contains filtered or unexported fields
}
Validator validates input paths
func NewValidator ¶
NewValidator creates a new input validator with default text extensions
func (*Validator) IsDocumentFile ¶
IsDocumentFile checks if the file has a document extension
func (*Validator) IsImageFile ¶
IsImageFile checks if the file has an image extension
func (*Validator) ValidateFileExtension ¶
ValidateFileExtension checks if the file has an allowed extension
func (*Validator) ValidatePath ¶
ValidatePath checks if the path is valid