docxtotext

Go package, version v1.0.3
Warning: This package is not in the latest version of its module.
Published: Dec 11, 2023 | License: MIT | Imports: 11 | Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type DocxParser

type DocxParser struct {
	// contains filtered or unexported fields
}

DocxParser represents the XML file structure and settings for parsing a docx file.

func Open

func Open(path string) (*DocxParser, error)

Open opens the docx file at the given path and returns a new DocxParser, or an error if the file cannot be opened.

Parameters:

  • path: a string representing the path to the docx file.

Returns:

  • *DocxParser: a pointer to the DocxParser struct.
  • error: an error, if any.

func OpenReader

func OpenReader(r io.ReaderAt, n int64) (*DocxParser, error)

OpenReader opens a DocxParser for the given io.ReaderAt and file size.

Parameters:

  • r: The io.ReaderAt to read the docx file from.
  • n: The size of the docx file in bytes.

Returns:

  • *DocxParser: The opened DocxParser object.
  • error: Any error that occurred during the opening process.

func OpenURL

func OpenURL(u string) (*DocxParser, int, error)

OpenURL opens the docx file at the given URL and returns a DocxParser, a status code, and an error, if any.

Parameters:

  • u (string): The URL to open.

Returns:

  • *DocxParser: A pointer to a DocxParser.
  • int: The HTTP status code returned when fetching u.
  • error: An error, if any.

func (*DocxParser) Close

func (dp *DocxParser) Close() (err error)

Close closes the underlying zip reader and the OCR client. Call it once you have finished extracting text.

func (*DocxParser) DisableLogging

func (dp *DocxParser) DisableLogging(v bool)

DisableLogging turns logging off when v is true.

func (*DocxParser) ExtractImages

func (dp *DocxParser) ExtractImages() ([]types.Image, error)

ExtractImages extracts images from the docx file.

Parameters:

  • None

Returns:

  • []types.Image: a slice of images extracted from the docx file.
  • error: an error if any occurred during the extraction process.

func (*DocxParser) ExtractTexts

func (dp *DocxParser) ExtractTexts() (string, error)

ExtractTexts extracts the text from the docx file and returns it as a single string.

Parameters:

  • None

Returns:

  • string: The extracted text.
  • error: An error if any.

func (*DocxParser) SetDrawingsNoFmt

func (dp *DocxParser) SetDrawingsNoFmt(v bool)

SetDrawingsNoFmt sets whether text from drawings is output without outline formatting.

func (*DocxParser) SetOcrInterface

func (dp *DocxParser) SetOcrInterface(ocr types.OCR)

SetOcrInterface overrides the default OCR implementation with ocr.

func (*DocxParser) SetParagraphSep

func (dp *DocxParser) SetParagraphSep(sep string)

SetParagraphSep sets the paragraph separator. The default is "\n".

func (*DocxParser) SetParseCharts

func (dp *DocxParser) SetParseCharts(v bool)

SetParseCharts sets whether charts are parsed. The default is false.

func (*DocxParser) SetParseComments

func (dp *DocxParser) SetParseComments(v bool)

SetParseComments sets whether comments are parsed. The default is true.

func (*DocxParser) SetParseDiagrams

func (dp *DocxParser) SetParseDiagrams(v bool)

SetParseDiagrams sets whether diagrams are parsed. The default is false.

func (*DocxParser) SetParseEndnotes

func (dp *DocxParser) SetParseEndnotes(v bool)

SetParseEndnotes sets whether endnotes are parsed. The default is true.

func (*DocxParser) SetParseFooters

func (dp *DocxParser) SetParseFooters(v bool)

SetParseFooters sets whether footers are parsed. The default is true.

func (*DocxParser) SetParseFootnotes

func (dp *DocxParser) SetParseFootnotes(v bool)

SetParseFootnotes sets whether footnotes are parsed. The default is true.

func (*DocxParser) SetParseHeaders

func (dp *DocxParser) SetParseHeaders(v bool)

SetParseHeaders sets whether headers are parsed. The default is true.
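
Each SetParse* setter above takes a bool. To limit output to the main document body, one could switch off every optional part, including those enabled by default (headers, footers, footnotes, endnotes, comments). A minimal sketch against a hypothetical interface whose methods copy the documented signatures, so *DocxParser should satisfy it; partToggles, bodyOnly and recorder are illustrative names, and the assumption that the body is always parsed is not stated in the documentation.

```go
package main

import "fmt"

// partToggles is a hypothetical interface copying the documented
// SetParse* setters of *DocxParser.
type partToggles interface {
	SetParseHeaders(v bool)
	SetParseFooters(v bool)
	SetParseFootnotes(v bool)
	SetParseEndnotes(v bool)
	SetParseComments(v bool)
	SetParseCharts(v bool)
	SetParseDiagrams(v bool)
}

// bodyOnly disables every optional part. Charts and diagrams are already off
// by default but are set explicitly so the result does not rely on defaults.
func bodyOnly(p partToggles) {
	p.SetParseHeaders(false)
	p.SetParseFooters(false)
	p.SetParseFootnotes(false)
	p.SetParseEndnotes(false)
	p.SetParseComments(false)
	p.SetParseCharts(false)
	p.SetParseDiagrams(false)
}

// recorder stands in for *DocxParser and starts from the documented defaults.
type recorder struct{ parts map[string]bool }

func newRecorder() *recorder {
	return &recorder{parts: map[string]bool{
		"headers": true, "footers": true, "footnotes": true, "endnotes": true,
		"comments": true, "charts": false, "diagrams": false,
	}}
}

func (r *recorder) SetParseHeaders(v bool)   { r.parts["headers"] = v }
func (r *recorder) SetParseFooters(v bool)   { r.parts["footers"] = v }
func (r *recorder) SetParseFootnotes(v bool) { r.parts["footnotes"] = v }
func (r *recorder) SetParseEndnotes(v bool)  { r.parts["endnotes"] = v }
func (r *recorder) SetParseComments(v bool)  { r.parts["comments"] = v }
func (r *recorder) SetParseCharts(v bool)    { r.parts["charts"] = v }
func (r *recorder) SetParseDiagrams(v bool)  { r.parts["diagrams"] = v }

func main() {
	r := newRecorder()
	bodyOnly(r) // with docxtotext: bodyOnly(dp)
	fmt.Println(r.parts)
}
```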

func (*DocxParser) SetParseImages

func (dp *DocxParser) SetParseImages(v bool)

SetParseImages sets whether images are parsed using OCR. The default is false. If no OCR interface has been set with SetOcrInterface, the default tesseract-ocr implementation is used.

func (*DocxParser) SetPartSep

func (dp *DocxParser) SetPartSep(sep string)

SetPartSep sets the separator between document parts (each XML file, such as a header or footer, is one part). The default is "-" repeated 100 times.

func (*DocxParser) SetTableColSep

func (dp *DocxParser) SetTableColSep(sep string)

SetTableColSep sets the table column separator. The default is "\t".

func (*DocxParser) SetTableRowSep

func (dp *DocxParser) SetTableRowSep(sep string)

SetTableRowSep sets the table row separator. The default is "\n".
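
Together, the column separator (default "\t") and the row separator (default "\n") determine how a table's cells are flattened to text. A minimal sketch of that flattening; flattenTable is hypothetical, and whether the package also emits a trailing row separator is not documented.

```go
package main

import (
	"fmt"
	"strings"
)

// flattenTable is a hypothetical helper (not part of docxtotext). It joins the
// cells of each row with colSep and the rows with rowSep, mirroring what
// SetTableColSep and SetTableRowSep control.
func flattenTable(rows [][]string, colSep, rowSep string) string {
	lines := make([]string, len(rows))
	for i, row := range rows {
		lines[i] = strings.Join(row, colSep)
	}
	return strings.Join(lines, rowSep)
}

func main() {
	rows := [][]string{{"Name", "Qty"}, {"Apples", "3"}}
	// Documented defaults: "\t" between columns, "\n" between rows.
	fmt.Println(flattenTable(rows, "\t", "\n"))
	// A CSV-like layout, e.g. after dp.SetTableColSep(",").
	fmt.Println(flattenTable(rows, ",", "\n"))
}
```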
