xlsxtotext

package
v1.0.3
Warning

This package is not in the latest version of its module.

Published: Dec 11, 2023 License: MIT Imports: 14 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type XlsxParser

type XlsxParser struct {
	// contains filtered or unexported fields
}

XlsxParser represents the XML file structure and settings for parsing an xlsx file.

func Open

func Open(path string) (*XlsxParser, error)

Open opens the specified xlsx file path and returns a new XlsxParser instance and an error, if any.

Parameters:

  • path: a string representing the path to the xlsx file.

Returns:

  • *XlsxParser: a pointer to the XlsxParser struct.
  • error: an error, if any.

func OpenReader

func OpenReader(r io.ReaderAt, n int64) (*XlsxParser, error)

OpenReader opens an XlsxParser for the given io.ReaderAt and file size.

Parameters:

  • r: The io.ReaderAt to read the xlsx file from.
  • n: The size of the xlsx file in bytes.

Returns:

  • *XlsxParser: The opened XlsxParser object.
  • error: Any error that occurred during the opening process.

func OpenURL

func OpenURL(u string) (*XlsxParser, int, error)

OpenURL opens the xlsx file at the specified URL and returns an XlsxParser, a status code, and an error.

Parameters:

  • u (string): The URL to open.

Returns:

  • *XlsxParser: A pointer to an XlsxParser.
  • int: The status code.
  • error: An error object.

func (*XlsxParser) Close

func (xp *XlsxParser) Close() (err error)

Close closes the zipReader and the OCR client. Call this method once text extraction is finished.

func (*XlsxParser) ExtractImages added in v1.0.1

func (xp *XlsxParser) ExtractImages() ([]types.Image, error)

ExtractImages extracts images from the xlsx file.

Parameters:

  • None

Returns:

  • []types.Image: a slice of images extracted from the xlsx file.
  • error: an error if any occurred during the extraction process.

func (*XlsxParser) ExtractSheetTexts

func (xp *XlsxParser) ExtractSheetTexts(sheets ...int) (string, error)

ExtractSheetTexts extracts the texts from the specified xlsx sheets. Sheet numbers start at 1.

It takes in one or more sheet numbers as parameters and returns a string containing the extracted texts. The function also returns an error if there is any issue with parsing the sheets.

Parameters:

  • sheets: One or more sheet numbers (variadic ints, starting at 1) to extract texts from.

Returns:

  • string: A string containing the extracted texts.
  • error: An error object if there is any issue with parsing the sheets.
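Because sheet numbers start at 1 while Go slices are indexed from 0, a caller-side check can catch off-by-one mistakes before extraction. The following is a minimal sketch of that mapping. selectSheets is a hypothetical helper and not part of xlsxtotext; how the package itself treats out-of-range numbers is not documented here, so the error behaviour shown is an assumption.

```go
package main

import (
	"fmt"
)

// selectSheets is a hypothetical helper, not part of xlsxtotext. It maps
// 1-based sheet numbers onto a 0-based slice and rejects out-of-range values.
func selectSheets(all []string, sheets ...int) ([]string, error) {
	picked := make([]string, 0, len(sheets))
	for _, n := range sheets {
		if n < 1 || n > len(all) {
			return nil, fmt.Errorf("sheet %d out of range 1..%d", n, len(all))
		}
		// Sheet number n lives at slice index n-1.
		picked = append(picked, all[n-1])
	}
	return picked, nil
}

func main() {
	all := []string{"first", "second", "third"}

	picked, err := selectSheets(all, 1, 3)
	fmt.Println(picked, err)

	// 0 is not a valid sheet number when numbering starts at 1.
	_, err = selectSheets(all, 0)
	fmt.Println(err)
}
```

In real use, the upper bound would come from NumSheets rather than from the length of a local slice.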

func (*XlsxParser) ExtractTexts

func (xp *XlsxParser) ExtractTexts() (string, error)

ExtractTexts extracts the texts from the xlsx file.

It iterates through each sheet of the xlsx file and appends the text content to a strings.Builder object. The extracted texts are then returned as a string.

If onlySharedStrings is set to true (see SetOnlySharedStrings), only shared strings are extracted.

If there is an error encountered during the parsing of a sheet, the function returns the extracted texts up to that point, along with the error.

Returns:

  • string: The extracted texts from the xlsx file.
  • error: An error, if any, encountered during the parsing of the sheets.

func (*XlsxParser) NumSheets

func (xp *XlsxParser) NumSheets() int

NumSheets returns the number of sheets.

func (*XlsxParser) SetColSep

func (xp *XlsxParser) SetColSep(sep string)

SetColSep sets the separator of the column text. Default is "\t".

func (*XlsxParser) SetDisableLogging

func (xp *XlsxParser) SetDisableLogging(v bool)

SetDisableLogging sets whether logging is disabled.

func (*XlsxParser) SetDrawingsNoFmt

func (xp *XlsxParser) SetDrawingsNoFmt(v bool)

SetDrawingsNoFmt sets whether text from drawings is output without outline formatting.

func (*XlsxParser) SetOcrInterface

func (dp *XlsxParser) SetOcrInterface(ocr types.OCR)

SetOcrInterface overrides the default OCR interface.

func (*XlsxParser) SetOnlySharedStrings

func (xp *XlsxParser) SetOnlySharedStrings(v bool)

SetOnlySharedStrings sets whether only shared strings are parsed. Default is false.

func (*XlsxParser) SetParseCharts

func (xp *XlsxParser) SetParseCharts(v bool)

SetParseCharts sets whether charts are parsed. Default is false.

func (*XlsxParser) SetParseDiagrams

func (xp *XlsxParser) SetParseDiagrams(v bool)

SetParseDiagrams sets whether diagrams are parsed. Default is false.

func (*XlsxParser) SetParseImages

func (xp *XlsxParser) SetParseImages(v bool)

SetParseImages sets whether images are parsed. Default is false. If no OCR interface has been set, the default tesseract-ocr is used.

func (*XlsxParser) SetRowSep

func (xp *XlsxParser) SetRowSep(sep string)

SetRowSep sets the separator of the row text. Default is "\n".

func (*XlsxParser) SetSheetSep

func (xp *XlsxParser) SetSheetSep(sep string)

SetSheetSep sets the separator of the sheet text. Default is "-" repeated 100 times.
