Documentation ¶
Index ¶
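- type XlsxParser
- func Open(path string) (*XlsxParser, error)
- func OpenReader(r io.ReaderAt, n int64) (*XlsxParser, error)
- func OpenURL(u string) (*XlsxParser, int, error)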
- func (xp *XlsxParser) Close() (err error)
- func (xp *XlsxParser) ExtractImages() ([]types.Image, error)
- func (xp *XlsxParser) ExtractSheetTexts(sheets ...int) (string, error)
- func (xp *XlsxParser) ExtractTexts() (string, error)
- func (xp *XlsxParser) NumSheets() int
- func (xp *XlsxParser) SetColSep(sep string)
- func (xp *XlsxParser) SetDisableLogging(v bool)
- func (xp *XlsxParser) SetDrawingsNoFmt(v bool)
- func (dp *XlsxParser) SetOcrInterface(ocr types.OCR)
- func (xp *XlsxParser) SetOnlySharedStrings(v bool)
- func (xp *XlsxParser) SetParseCharts(v bool)
- func (xp *XlsxParser) SetParseDiagrams(v bool)
- func (xp *XlsxParser) SetParseImages(v bool)
- func (xp *XlsxParser) SetRowSep(sep string)
- func (xp *XlsxParser) SetSheetSep(sep string)
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type XlsxParser ¶
type XlsxParser struct {
// contains filtered or unexported fields
}
XlsxParser represents the XML file structure and settings for parsing an xlsx file.
func Open ¶
func Open(path string) (*XlsxParser, error)
Open opens the xlsx file at the specified path and returns a new XlsxParser instance and an error, if any.
Parameters:
- path: a string representing the path to the xlsx file.
Returns:
- *XlsxParser: a pointer to the XlsxParser struct.
- error: an error, if any.
func OpenReader ¶
func OpenReader(r io.ReaderAt, n int64) (*XlsxParser, error)
OpenReader opens an XlsxParser for the given io.ReaderAt and file size.
Parameters:
- r: The io.ReaderAt to read the xlsx file from.
- n: The size of the xlsx file in bytes.
Returns:
- *XlsxParser: The opened XlsxParser object.
- error: Any error that occurred during the opening process.
func OpenURL ¶
func OpenURL(u string) (*XlsxParser, int, error)
OpenURL opens the xlsx file at the specified URL and returns an XlsxParser, the status code, and an error.
Parameters:
- u (string): The URL to open.
Returns:
- *XlsxParser: A pointer to an XlsxParser.
- int: The status code.
- error: An error object.
func (*XlsxParser) Close ¶
func (xp *XlsxParser) Close() (err error)
Close closes the zipReader and the OCR client. Call it once you have finished extracting text.
func (*XlsxParser) ExtractImages ¶ added in v1.0.1
func (xp *XlsxParser) ExtractImages() ([]types.Image, error)
ExtractImages extracts images from the xlsx file.
Parameters:
- None
Returns:
- []types.Image: a slice of images extracted from the xlsx file.
- error: an error if any occurred during the extraction process.
func (*XlsxParser) ExtractSheetTexts ¶
func (xp *XlsxParser) ExtractSheetTexts(sheets ...int) (string, error)
ExtractSheetTexts extracts the texts from the specified xlsx sheets (sheet numbering starts at 1).
It takes in one or more sheet numbers as parameters and returns a string containing the extracted texts. The function also returns an error if there is any issue with parsing the sheets.
Parameters:
- sheets: One or more sheet numbers (variadic, starting at 1) to extract texts from.
Returns:
- string: A string containing the extracted texts.
- error: An error object if there is any issue with parsing the sheets.
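Because sheet numbers start at 1, a loop over every sheet runs from 1 through NumSheets() inclusive. The sketch below extracts each sheet separately. The sheetReader interface only restates the two documented method signatures, while textsBySheet and fakeParser are hypothetical names added so the example runs without a real xlsx file.

```go
package main

import (
	"fmt"
	"strings"
)

// sheetReader lists the two documented methods this sketch relies on.
// *XlsxParser has methods with these signatures.
type sheetReader interface {
	NumSheets() int
	ExtractSheetTexts(sheets ...int) (string, error)
}

// textsBySheet extracts each sheet separately, keyed by its 1-based number.
// On error it returns the sheets extracted so far along with the error.
func textsBySheet(p sheetReader) (map[int]string, error) {
	out := make(map[int]string, p.NumSheets())
	for i := 1; i <= p.NumSheets(); i++ {
		text, err := p.ExtractSheetTexts(i)
		if err != nil {
			return out, fmt.Errorf("sheet %d: %w", i, err)
		}
		out[i] = text
	}
	return out, nil
}

// fakeParser is a hypothetical in-memory stand-in for a real parser.
type fakeParser struct{ sheets []string }

func (f fakeParser) NumSheets() int { return len(f.sheets) }

func (f fakeParser) ExtractSheetTexts(sheets ...int) (string, error) {
	var b strings.Builder
	for _, n := range sheets {
		if n < 1 || n > len(f.sheets) {
			return b.String(), fmt.Errorf("sheet %d out of range", n)
		}
		b.WriteString(f.sheets[n-1])
	}
	return b.String(), nil
}

func main() {
	p := fakeParser{sheets: []string{"a\tb", "c"}}
	texts, err := textsBySheet(p)
	if err != nil {
		panic(err)
	}
	for i := 1; i <= p.NumSheets(); i++ {
		fmt.Printf("sheet %d: %q\n", i, texts[i])
	}
}
```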
func (*XlsxParser) ExtractTexts ¶
func (xp *XlsxParser) ExtractTexts() (string, error)
ExtractTexts extracts the texts from the xlsx file.
It iterates through each sheet of the xlsx file and appends the text content to a strings.Builder object. The extracted texts are then returned as a string.
If onlySharedStrings is set to true, only shared strings will be extracted.
If there is an error encountered during the parsing of a sheet, the function returns the extracted texts up to that point, along with the error.
Returns:
- string: The extracted texts from the xlsx file.
- error: An error, if any, encountered during the parsing of the sheets.
func (*XlsxParser) NumSheets ¶
func (xp *XlsxParser) NumSheets() int
NumSheets returns the number of sheets.
func (*XlsxParser) SetColSep ¶
func (xp *XlsxParser) SetColSep(sep string)
SetColSep sets the separator of the column text. Default is "\t".
func (*XlsxParser) SetDisableLogging ¶
func (xp *XlsxParser) SetDisableLogging(v bool)
SetDisableLogging sets whether logging is disabled.
func (*XlsxParser) SetDrawingsNoFmt ¶
func (xp *XlsxParser) SetDrawingsNoFmt(v bool)
SetDrawingsNoFmt sets whether the text of drawings is output without outline formatting.
func (*XlsxParser) SetOcrInterface ¶
func (dp *XlsxParser) SetOcrInterface(ocr types.OCR)
SetOcrInterface overrides the default OCR interface.
func (*XlsxParser) SetOnlySharedStrings ¶
func (xp *XlsxParser) SetOnlySharedStrings(v bool)
SetOnlySharedStrings sets whether to parse only shared strings. Default is false.
func (*XlsxParser) SetParseCharts ¶
func (xp *XlsxParser) SetParseCharts(v bool)
SetParseCharts sets whether to parse charts. Default is false.
func (*XlsxParser) SetParseDiagrams ¶
func (xp *XlsxParser) SetParseDiagrams(v bool)
SetParseDiagrams sets whether to parse diagrams. Default is false.
func (*XlsxParser) SetParseImages ¶
func (xp *XlsxParser) SetParseImages(v bool)
SetParseImages sets whether to parse images. Default is false. If no OCR interface has been set, the default tesseract-ocr is used.
func (*XlsxParser) SetRowSep ¶
func (xp *XlsxParser) SetRowSep(sep string)
SetRowSep sets the separator of the row text. Default is "\n".
func (*XlsxParser) SetSheetSep ¶
func (xp *XlsxParser) SetSheetSep(sep string)
SetSheetSep sets the separator of the sheet text. Default is "-" repeated 100 times.
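The row and column separators make it possible to split extracted text back into cells. The sketch below does this for the text of a single sheet using the documented defaults, "\n" and "\t". It assumes the separators appear verbatim between rows and cells and never inside cell text, which the documentation does not guarantee. splitCells is a hypothetical helper and the sample text is invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// splitCells breaks extracted sheet text into rows and cells.
// Assumption: rowSep and colSep appear verbatim between rows and cells
// and never inside the cell text itself.
func splitCells(text, rowSep, colSep string) [][]string {
	var rows [][]string
	for _, row := range strings.Split(text, rowSep) {
		if row == "" {
			continue // skip blank rows, e.g. from a trailing separator
		}
		rows = append(rows, strings.Split(row, colSep))
	}
	return rows
}

func main() {
	// Hypothetical text shaped like one sheet extracted with the
	// default row and column separators.
	text := "name\tqty\napple\t3\n"
	for _, cells := range splitCells(text, "\n", "\t") {
		fmt.Println(strings.Join(cells, " | "))
	}
}
```

If cell text may itself contain tabs or newlines, setting less common separators through SetColSep and SetRowSep before extraction reduces the chance of a wrong split.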