Documentation
Overview
Package convert provides utilities for converting documents (PDF, DOCX, HTML) into plain text. It exposes functions that extract the textual content from these document types.
The `convert` package is designed to convert a variety of document formats into plain text. It supports the following formats:
- PDF: Extracts text from PDF files using the `github.com/ledongthuc/pdf` library.
- DOCX: Converts DOCX files into plain text using the `github.com/fumiama/go-docx` library.
- HTML: Strips HTML tags and extracts textual content using the `jaytaylor.com/html2text` package.
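The format list above implies that each input file is routed to a format-specific extractor. As a minimal sketch of that routing step, assuming the format is chosen from the file extension (the source does not say how `convert` decides), a helper could look like the following. The name `detectFormat`, the returned format strings, and the handling of `.htm` are all hypothetical and are not part of the package's API.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// detectFormat is a hypothetical helper, not part of the convert package.
// It maps a file name to one of the three formats listed above, based only
// on the lowercased file extension. Treating ".htm" as HTML is an assumption.
func detectFormat(path string) (format string, ok bool) {
	switch strings.ToLower(filepath.Ext(path)) {
	case ".pdf":
		return "pdf", true
	case ".docx":
		return "docx", true
	case ".html", ".htm":
		return "html", true
	default:
		// Anything else is reported as unsupported.
		return "", false
	}
}

func main() {
	for _, name := range []string{"report.PDF", "notes.docx", "index.html", "image.png"} {
		if format, ok := detectFormat(name); ok {
			fmt.Printf("%s: %s\n", name, format)
		} else {
			fmt.Printf("%s: unsupported, skipped\n", name)
		}
	}
}
```

Matching on the lowercased extension keeps the check cheap, but it trusts the file name. A stricter implementation might inspect the file contents instead.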
Exported Functions
Convert: Converts all supported document files from the input directory to plain text files based on the configuration settings.
Example:
> err := convert.Convert(config)
> if err != nil {
>     log.Fatalf("Conversion failed: %v", err)
> }
Functions
func Convert
Convert processes files from the input directory specified in the configuration and converts them into plain text files.
It reads the configuration settings to identify supported formats and input directory paths. The function attempts to convert each file into a .txt file based on its format.
Parameters:
- config: A pointer to a config.Config instance containing configuration details.
Returns:
- An error if any issue occurs during reading, processing, or writing the files.
Example:
> err := convert.Convert(config)
> if err != nil {
>     log.Fatalf("Conversion failed: %v", err)
> }
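The description of Convert says each input file becomes a .txt file, but it does not say where that file is written or how it is named. The sketch below shows one plausible naming rule, assuming the output keeps the input's base name and is placed in a separate output directory. The helper `outputPath` and its `outDir` parameter are hypothetical; the actual fields of config.Config are not documented here.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// outputPath is a hypothetical helper, not part of the convert package.
// It drops the directory part of inputFile, swaps the original extension
// for ".txt", and places the result in outDir (an assumed setting).
func outputPath(outDir, inputFile string) string {
	base := filepath.Base(inputFile)
	name := strings.TrimSuffix(base, filepath.Ext(base))
	return filepath.Join(outDir, name+".txt")
}

func main() {
	for _, in := range []string{"docs/report.pdf", "docs/notes.docx", "site/index.html"} {
		fmt.Printf("%s -> %s\n", in, outputPath("out", in))
	}
}
```

Under this rule, two inputs that share a base name (for example `a.pdf` and `a.docx`) would map to the same output file, so a real implementation would need a policy for such collisions.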