Documentation
Index
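- type DocxParser
- func Open(path string) (*DocxParser, error)
- func OpenReader(r io.ReaderAt, n int64) (*DocxParser, error)
- func OpenURL(u string) (*DocxParser, int, error)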
- func (dp *DocxParser) Close() (err error)
- func (dp *DocxParser) DisableLogging(v bool)
- func (dp *DocxParser) ExtractImages() ([]types.Image, error)
- func (dp *DocxParser) ExtractTexts() (string, error)
- func (dp *DocxParser) SetDrawingsNoFmt(v bool)
- func (dp *DocxParser) SetOcrInterface(ocr types.OCR)
- func (dp *DocxParser) SetParagraphSep(sep string)
- func (dp *DocxParser) SetParseCharts(v bool)
- func (dp *DocxParser) SetParseComments(v bool)
- func (dp *DocxParser) SetParseDiagrams(v bool)
- func (dp *DocxParser) SetParseEndnotes(v bool)
- func (dp *DocxParser) SetParseFooters(v bool)
- func (dp *DocxParser) SetParseFootnotes(v bool)
- func (dp *DocxParser) SetParseHeaders(v bool)
- func (dp *DocxParser) SetParseImages(v bool)
- func (dp *DocxParser) SetPartSep(sep string)
- func (dp *DocxParser) SetTableColSep(sep string)
- func (dp *DocxParser) SetTableRowSep(sep string)
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type DocxParser
type DocxParser struct {
// contains filtered or unexported fields
}
DocxParser represents the XML file structure and settings for parsing a docx file.
func Open
func Open(path string) (*DocxParser, error)
Open opens the specified docx file path and returns a new DocxParser instance and an error, if any.
Parameters:
- path: a string representing the path to the docx file.
Returns:
- *DocxParser: a pointer to the DocxParser struct.
- error: an error, if any.
func OpenReader
func OpenReader(r io.ReaderAt, n int64) (*DocxParser, error)
OpenReader opens a DocxParser for the given io.ReaderAt and file size.
Parameters:
- r: The io.ReaderAt to read the docx file from.
- n: The size of the docx file in bytes.
Returns:
- *DocxParser: The opened DocxParser object.
- error: Any error that occurred during the opening process.
func OpenURL
func OpenURL(u string) (*DocxParser, int, error)
OpenURL opens the docx file at the specified URL and returns a DocxParser, the HTTP status code, and an error, if any.
Parameters:
- u (string): The URL to open.
Returns:
- *DocxParser: A pointer to a DocxParser.
- int: The HTTP status code of the response.
- error: An error, if any.
func (*DocxParser) Close
func (dp *DocxParser) Close() (err error)
Close closes the underlying zip reader and the OCR client. Call it once you have finished extracting text.
func (*DocxParser) DisableLogging
func (dp *DocxParser) DisableLogging(v bool)
DisableLogging disables logging when v is true.
func (*DocxParser) ExtractImages
func (dp *DocxParser) ExtractImages() ([]types.Image, error)
ExtractImages extracts images from the docx file.
Parameters:
- None
Returns:
- []types.Image: a slice of images extracted from the docx file.
- error: an error if any occurred during the extraction process.
func (*DocxParser) ExtractTexts
func (dp *DocxParser) ExtractTexts() (string, error)
ExtractTexts extracts the text from the docx file and returns it as a single string.
Parameters:
- None
Returns:
- string: The extracted text.
- error: An error, if any.
func (*DocxParser) SetDrawingsNoFmt
func (dp *DocxParser) SetDrawingsNoFmt(v bool)
SetDrawingsNoFmt sets whether text from drawings is extracted without its outline formatting.
func (*DocxParser) SetOcrInterface
func (dp *DocxParser) SetOcrInterface(ocr types.OCR)
SetOcrInterface overrides the default OCR interface.
func (*DocxParser) SetParagraphSep
func (dp *DocxParser) SetParagraphSep(sep string)
SetParagraphSep sets the paragraph separator. Default is "\n".
func (*DocxParser) SetParseCharts
func (dp *DocxParser) SetParseCharts(v bool)
SetParseCharts sets whether charts are parsed. Default is false.
func (*DocxParser) SetParseComments
func (dp *DocxParser) SetParseComments(v bool)
SetParseComments sets whether comments are parsed. Default is true.
func (*DocxParser) SetParseDiagrams
func (dp *DocxParser) SetParseDiagrams(v bool)
SetParseDiagrams sets whether diagrams are parsed. Default is false.
func (*DocxParser) SetParseEndnotes
func (dp *DocxParser) SetParseEndnotes(v bool)
SetParseEndnotes sets whether endnotes are parsed. Default is true.
func (*DocxParser) SetParseFooters
func (dp *DocxParser) SetParseFooters(v bool)
SetParseFooters sets whether footers are parsed. Default is true.
func (*DocxParser) SetParseFootnotes
func (dp *DocxParser) SetParseFootnotes(v bool)
SetParseFootnotes sets whether footnotes are parsed. Default is true.
func (*DocxParser) SetParseHeaders
func (dp *DocxParser) SetParseHeaders(v bool)
SetParseHeaders sets whether headers are parsed. Default is true.
func (*DocxParser) SetParseImages
func (dp *DocxParser) SetParseImages(v bool)
SetParseImages sets whether images are parsed with OCR. Default is false. If no OCR interface has been set with SetOcrInterface, the default tesseract-ocr client is used.
func (*DocxParser) SetPartSep
func (dp *DocxParser) SetPartSep(sep string)
SetPartSep sets the separator placed between document parts (each XML file in the package, such as a header or footer). Default is strings.Repeat("-", 100), that is, 100 hyphens.
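If the part separator appears between parts of the ExtractTexts output, as this description suggests, the text can be split back into per-part chunks. This is a minimal sketch under that assumption: splitParts is a hypothetical helper, and trimming whitespace around each chunk is a guess about how the separator is padded with newlines.

```go
package main

import "strings"

// defaultPartSep mirrors the documented default: "-" repeated 100 times.
var defaultPartSep = strings.Repeat("-", 100)

// splitParts (hypothetical helper) splits extracted text into per-part
// chunks, assuming parts are joined with partSep. Whitespace around each
// chunk is trimmed and empty chunks are dropped.
func splitParts(text, partSep string) []string {
	var parts []string
	for _, p := range strings.Split(text, partSep) {
		if p = strings.TrimSpace(p); p != "" {
			parts = append(parts, p)
		}
	}
	return parts
}
```

A run of 100 hyphens can also occur in ordinary document text; setting a more distinctive separator with SetPartSep makes such a split less ambiguous.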
func (*DocxParser) SetTableColSep
func (dp *DocxParser) SetTableColSep(sep string)
SetTableColSep sets the table column separator. Default is "\t".
func (*DocxParser) SetTableRowSep
func (dp *DocxParser) SetTableRowSep(sep string)
SetTableRowSep sets the table row separator. Default is "\n".
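To illustrate how the column and row separators shape table output, here is a hypothetical renderTable helper. It assumes cells in a row are joined with the column separator and rows with the row separator; the package's actual layout may differ in details such as trailing separators.

```go
package main

import "strings"

// renderTable (hypothetical helper) joins the cells of each row with colSep
// and then joins the rows with rowSep, mimicking the effect of
// SetTableColSep and SetTableRowSep under the stated assumption.
func renderTable(rows [][]string, colSep, rowSep string) string {
	lines := make([]string, len(rows))
	for i, row := range rows {
		lines[i] = strings.Join(row, colSep)
	}
	return strings.Join(lines, rowSep)
}
```

With the defaults ("\t" and "\n") the output is tab-separated text; a separator such as " | " produces a more readable plain-text layout.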