convert

package
v0.6.4 Latest
Warning

This package is not in the latest version of its module.

Published: Nov 23, 2024 License: AGPL-3.0 Imports: 12 Imported by: 0

Documentation

Overview

Package convert provides utilities for converting documents in several formats (PDF, DOCX, HTML) into plain text. It exposes functions that extract the textual content from these document types.

The following formats are supported:

  • PDF: Extracts text from PDF files using the `github.com/ledongthuc/pdf` library.
  • DOCX: Converts DOCX files into plain text using the `github.com/fumiama/go-docx` library.
  • HTML: Strips HTML tags and extracts textual content using the `jaytaylor.com/html2text` package.
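As a rough illustration of how a file could be matched to one of the three formats listed above, the sketch below dispatches on the file extension. The helper name `formatOf`, the case-insensitive matching, and the handling of `.htm` are assumptions made for this example; the documentation does not say how the package detects formats.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// formatOf is a hypothetical helper, not part of the convert package.
// It maps a file name to one of the three formats the package lists,
// or returns "" for anything else. Treating ".htm" as HTML is an assumption.
func formatOf(name string) string {
	switch strings.ToLower(filepath.Ext(name)) {
	case ".pdf":
		return "pdf"
	case ".docx":
		return "docx"
	case ".html", ".htm":
		return "html"
	default:
		return ""
	}
}

func main() {
	for _, name := range []string{"report.PDF", "notes.docx", "index.html", "photo.png"} {
		fmt.Printf("%s -> %q\n", name, formatOf(name))
	}
}
```

A real converter would then hand the file to the matching library (`github.com/ledongthuc/pdf`, `github.com/fumiama/go-docx`, or `jaytaylor.com/html2text`); those calls are left out here to keep the sketch free of external dependencies.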

Exported Functions

Convert: Converts the supported document files in the input directory to plain text files, limited to the selected formats.

Example:

> err := convert.Convert(inputDir, selectedFormats)
> if err != nil {
>     log.Fatalf("Conversion failed: %v", err)
> }

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Convert

func Convert(inputDir, selectedFormats string) error

Convert processes the files in the directory named by inputDir and converts them into plain text files.

It uses selectedFormats to decide which of the supported formats to handle. The function attempts to convert each matching file into a .txt file based on its format.

Parameters:

  • inputDir: Path of the directory containing the files to convert.
  • selectedFormats: A string naming which of the supported formats to convert.

Returns:

  • An error if any issue occurs during reading, processing, or writing the files.

Example:

> err := convert.Convert(inputDir, selectedFormats)
> if err != nil {
>     log.Fatalf("Conversion failed: %v", err)
> }
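The directory-level behaviour described for Convert (read each file, extract its text, write a .txt file, and return an error if any step fails) can be sketched with the standard library alone. This is not the package's implementation: `convertDir`, `outputPath`, the caller-supplied `extract` function, and the choice to write each .txt file next to its source are all hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// outputPath swaps a file's extension for .txt (an assumed naming rule).
func outputPath(path string) string {
	return strings.TrimSuffix(path, filepath.Ext(path)) + ".txt"
}

// convertDir is an illustrative stand-in, not the package's Convert.
// extract is supplied by the caller and returns the text of one file.
func convertDir(inputDir string, extract func(path string) (string, error)) error {
	entries, err := os.ReadDir(inputDir)
	if err != nil {
		return fmt.Errorf("reading %s: %w", inputDir, err)
	}
	for _, e := range entries {
		// Skip subdirectories and files that are already plain text.
		if e.IsDir() || filepath.Ext(e.Name()) == ".txt" {
			continue
		}
		src := filepath.Join(inputDir, e.Name())
		text, err := extract(src)
		if err != nil {
			return fmt.Errorf("processing %s: %w", src, err)
		}
		dst := outputPath(src)
		if err := os.WriteFile(dst, []byte(text), 0o644); err != nil {
			return fmt.Errorf("writing %s: %w", dst, err)
		}
	}
	return nil
}

func main() {
	dir, err := os.MkdirTemp("", "convert-sketch")
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.RemoveAll(dir)

	src := filepath.Join(dir, "page.html")
	if err := os.WriteFile(src, []byte("hello"), 0o644); err != nil {
		fmt.Println("setup failed:", err)
		return
	}

	// Placeholder extractor: returns the raw file contents unchanged.
	raw := func(path string) (string, error) {
		b, err := os.ReadFile(path)
		return string(b), err
	}
	if err := convertDir(dir, raw); err != nil {
		fmt.Println("conversion failed:", err)
		return
	}
	out, err := os.ReadFile(outputPath(src))
	fmt.Printf("%s: %q (err=%v)\n", filepath.Base(outputPath(src)), out, err)
}
```

Wrapping each failure with `%w` keeps the underlying cause available to `errors.Is` and `errors.As`, which matches the documented contract of returning an error for read, processing, or write problems.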

Types

This section is empty.
