Documentation ¶
Index ¶
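- type PptxParser
- func Open(path string) (*PptxParser, error)
- func OpenReader(r io.ReaderAt, n int64) (*PptxParser, error)
- func OpenURL(u string) (*PptxParser, int, error)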
- func (pp *PptxParser) Close() (err error)
- func (pp *PptxParser) DisableLogging(v bool)
- func (pp *PptxParser) ExtractImages() ([]types.Image, error)
- func (pp *PptxParser) ExtractSlideTexts(slides ...int) (string, error)
- func (pp *PptxParser) ExtractTexts() (string, error)
- func (pp *PptxParser) NumSlides() int
- func (pp *PptxParser) SetDrawingsNoFmt(v bool)
- func (pp *PptxParser) SetOcrInterface(ocr types.OCR)
- func (pp *PptxParser) SetParseCharts(v bool)
- func (pp *PptxParser) SetParseDiagrams(v bool)
- func (pp *PptxParser) SetParseImages(v bool)
- func (pp *PptxParser) SetPhraseSep(sep string)
- func (pp *PptxParser) SetSlideSep(sep string)
- func (pp *PptxParser) SetTableColSep(sep string)
- func (pp *PptxParser) SetTableRowSep(sep string)
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type PptxParser ¶
type PptxParser struct {
// contains filtered or unexported fields
}
PptxParser represents the XML file structure and settings for parsing a pptx file.
func Open ¶
func Open(path string) (*PptxParser, error)
Open opens the pptx file at the specified path and returns a new PptxParser instance, or an error if the file cannot be opened.
Parameters:
- path: a string representing the path to the pptx file.
Returns:
- *PptxParser: a pointer to the PptxParser struct.
- error: an error, if any.
func OpenReader ¶
func OpenReader(r io.ReaderAt, n int64) (*PptxParser, error)
OpenReader opens a PptxParser for the given io.ReaderAt and file size.
Parameters:
- r: The io.ReaderAt to read the pptx file from.
- n: The size of the pptx file.
Returns:
- *PptxParser: The opened PptxParser object.
- error: Any error that occurred during the opening process.
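When the pptx data is already in memory rather than on disk, the two arguments can be produced with the standard library alone. The sketch below is a minimal illustration: `readerAtFromBytes` is a hypothetical helper name, the byte slice is placeholder data, and the actual OpenReader call appears only in a comment because the package import path is not given here.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// readerAtFromBytes (hypothetical helper) wraps an in-memory payload so it
// can be passed to a function that expects an io.ReaderAt plus the total
// size. *bytes.Reader implements io.ReaderAt.
func readerAtFromBytes(data []byte) (io.ReaderAt, int64) {
	return bytes.NewReader(data), int64(len(data))
}

func main() {
	// Placeholder bytes; a real caller would hold the contents of a pptx file.
	data := []byte("placeholder pptx bytes")
	r, n := readerAtFromBytes(data)

	// With the package imported, the call would be:
	//   pp, err := OpenReader(r, n)

	// Read the first bytes back to show that random access works.
	head := make([]byte, 11)
	if _, err := r.ReadAt(head, 0); err != nil {
		fmt.Println("read error:", err)
		return
	}
	fmt.Printf("size=%d head=%q\n", n, head)
}
```

An *os.File also implements io.ReaderAt, so a file opened with os.Open can be passed together with the size reported by its Stat method.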
func OpenURL ¶
func OpenURL(u string) (*PptxParser, int, error)
OpenURL opens the specified pptx file URL and returns a PptxParser, status code, and error.
Parameters:
- u (string): The URL to open.
Returns:
- *PptxParser: A pointer to a PptxParser.
- int: The status code.
- error: An error object.
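Because OpenURL returns a status code in addition to an error, callers may want to fold both into one check. The sketch below assumes the status code is an HTTP status code and that anything other than 200 should be treated as a failure; neither point is stated in the documentation. `checkOpenResult` is a hypothetical helper, not part of the package.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// checkOpenResult (hypothetical helper) turns the (status, err) pair that
// accompanies the parser into a single error.
// Assumption: status is an HTTP status code and only 200 counts as success.
func checkOpenResult(status int, err error) error {
	if err != nil {
		return fmt.Errorf("open url (status %d): %w", status, err)
	}
	if status != http.StatusOK {
		return fmt.Errorf("open url: unexpected status %d", status)
	}
	return nil
}

func main() {
	// With the package imported, the values would come from:
	//   pp, status, err := OpenURL(u)
	fmt.Println(checkOpenResult(http.StatusOK, nil))
	fmt.Println(checkOpenResult(http.StatusNotFound, nil))
	fmt.Println(checkOpenResult(0, errors.New("connection refused")))
}
```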
func (*PptxParser) Close ¶
func (pp *PptxParser) Close() (err error)
Close closes the zipReader and the OCR client. Call it once text extraction is finished.
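One common way to make sure Close is always called is to defer it right after opening. The sketch below is a minimal illustration under stated assumptions: `textParser` is a hypothetical interface that lists only the two documented methods used here, and `fakeParser` is a stand-in so the example runs without a pptx file.

```go
package main

import (
	"errors"
	"fmt"
)

// textParser is a hypothetical interface covering the two documented
// methods used in this sketch.
type textParser interface {
	ExtractTexts() (string, error)
	Close() error
}

// extractAndClose extracts all text and always closes the parser. A Close
// error is reported only when extraction itself succeeded.
func extractAndClose(p textParser) (text string, err error) {
	defer func() {
		if cerr := p.Close(); cerr != nil && err == nil {
			err = cerr
		}
	}()
	return p.ExtractTexts()
}

// fakeParser is a stand-in used only to make the sketch runnable.
type fakeParser struct {
	text   string
	closed bool
}

func (f *fakeParser) ExtractTexts() (string, error) { return f.text, nil }

func (f *fakeParser) Close() error {
	if f.closed {
		return errors.New("already closed")
	}
	f.closed = true
	return nil
}

func main() {
	p := &fakeParser{text: "slide one"}
	text, err := extractAndClose(p)
	fmt.Println(text, err, p.closed)
}
```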
func (*PptxParser) DisableLogging ¶
func (pp *PptxParser) DisableLogging(v bool)
DisableLogging disables logging.
func (*PptxParser) ExtractImages ¶ added in v1.0.1
func (pp *PptxParser) ExtractImages() ([]types.Image, error)
ExtractImages extracts images from the pptx file.
Parameters:
- None
Returns:
- []types.Image: a slice of images extracted from the pptx file.
- error: an error if any occurred during the extraction process.
func (*PptxParser) ExtractSlideTexts ¶
func (pp *PptxParser) ExtractSlideTexts(slides ...int) (string, error)
ExtractSlideTexts extracts the texts from the specified pptx slides. Slide numbering starts at 1.
It takes one or more slide numbers as parameters and returns a string containing the extracted texts. The function also returns an error if there is any issue with parsing the slides.
Parameters:
- slides: One or more slide numbers (variadic, starting at 1) to extract texts from.
Returns:
- string: A string containing the extracted texts.
- error: An error object if there is any issue with parsing the slides.
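The documentation does not say how out-of-range slide numbers are handled, so a caller may prefer to filter them against NumSlides before the call. The sketch below is one way to do that; `clampSlides` is a hypothetical helper, and the slide count is hard-coded so the example runs without a pptx file.

```go
package main

import "fmt"

// clampSlides (hypothetical helper) keeps only slide numbers inside the
// 1-based range [1, numSlides], preserving their order.
func clampSlides(numSlides int, slides ...int) []int {
	valid := make([]int, 0, len(slides))
	for _, s := range slides {
		if s >= 1 && s <= numSlides {
			valid = append(valid, s)
		}
	}
	return valid
}

func main() {
	// With the package imported, this would be: n := pp.NumSlides()
	n := 5
	slides := clampSlides(n, 0, 1, 3, 9)
	fmt.Println(slides)

	// The filtered slice can then be spread into the variadic parameter:
	//   text, err := pp.ExtractSlideTexts(slides...)
}
```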
func (*PptxParser) ExtractTexts ¶
func (pp *PptxParser) ExtractTexts() (string, error)
ExtractTexts extracts the texts from the pptx file.
It iterates through each slide of the pptx file and appends the text content to a strings.Builder object. The extracted texts are then returned as a string. If there is an error encountered during the parsing of a slide, the function returns the extracted texts up to that point, along with the error.
Returns:
- string: The extracted texts from the pptx file.
- error: An error, if any, encountered during the parsing of the slides.
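Since text extracted before a failure is still returned, a caller can keep the partial result instead of discarding it. The sketch below shows one possible handling; `extractKeepPartial` and `errMalformed` are hypothetical names, and the failing function is a stand-in for a parser that stops at the second slide.

```go
package main

import (
	"errors"
	"fmt"
)

// errMalformed is a stand-in error used only by this sketch.
var errMalformed = errors.New("slide 2: malformed XML")

// extractFunc matches the documented signature of ExtractTexts, so a method
// value such as pp.ExtractTexts can be passed directly.
type extractFunc func() (string, error)

// extractKeepPartial (hypothetical helper) returns whatever text was
// extracted and reports whether the extraction ran to completion.
func extractKeepPartial(extract extractFunc) (text string, complete bool, err error) {
	text, err = extract()
	return text, err == nil, err
}

func main() {
	// Stand-in that fails after the first slide.
	failing := func() (string, error) {
		return "slide 1 text", errMalformed
	}
	text, complete, err := extractKeepPartial(failing)
	fmt.Printf("text=%q complete=%v err=%v\n", text, complete, err)
}
```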
func (*PptxParser) NumSlides ¶
func (pp *PptxParser) NumSlides() int
NumSlides returns the number of slides.
func (*PptxParser) SetDrawingsNoFmt ¶
func (pp *PptxParser) SetDrawingsNoFmt(v bool)
SetDrawingsNoFmt sets whether text in drawings is extracted without outline formatting.
func (*PptxParser) SetOcrInterface ¶
func (pp *PptxParser) SetOcrInterface(ocr types.OCR)
SetOcrInterface overrides the default OCR interface.
func (*PptxParser) SetParseCharts ¶
func (pp *PptxParser) SetParseCharts(v bool)
SetParseCharts sets whether charts are parsed. Default is false.
func (*PptxParser) SetParseDiagrams ¶
func (pp *PptxParser) SetParseDiagrams(v bool)
SetParseDiagrams sets whether diagrams are parsed. Default is false.
func (*PptxParser) SetParseImages ¶
func (pp *PptxParser) SetParseImages(v bool)
SetParseImages sets whether images are parsed. Default is false. When no OCR interface is set, the default tesseract-ocr is used.
func (*PptxParser) SetPhraseSep ¶
func (pp *PptxParser) SetPhraseSep(sep string)
SetPhraseSep sets the phrase separator. Default is " ".
func (*PptxParser) SetSlideSep ¶
func (pp *PptxParser) SetSlideSep(sep string)
SetSlideSep sets the slide text separator. Default is "-" repeated 100 times.
func (*PptxParser) SetTableColSep ¶
func (pp *PptxParser) SetTableColSep(sep string)
SetTableColSep sets table column separator. Default is "\t".
func (*PptxParser) SetTableRowSep ¶
func (pp *PptxParser) SetTableRowSep(sep string)
SetTableRowSep sets table row separator. Default is "\n".
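To make the separator defaults concrete, the sketch below reproduces them as plain strings and joins a small table with them. It is only an illustration: `joinTable` is a hypothetical helper, the exact way the parser places separators in its output is not documented here, and the slide separator assumes the documented default is a hyphen repeated 100 times.

```go
package main

import (
	"fmt"
	"strings"
)

// Defaults as listed in the documentation. The slide separator assumes the
// documented default is a hyphen repeated 100 times.
var (
	defaultPhraseSep   = " "
	defaultSlideSep    = strings.Repeat("-", 100)
	defaultTableColSep = "\t"
	defaultTableRowSep = "\n"
)

// joinTable (hypothetical helper) shows how column and row separators
// combine: cells are joined with colSep, rows with rowSep.
func joinTable(rows [][]string, colSep, rowSep string) string {
	lines := make([]string, len(rows))
	for i, row := range rows {
		lines[i] = strings.Join(row, colSep)
	}
	return strings.Join(lines, rowSep)
}

func main() {
	table := [][]string{{"Name", "Qty"}, {"Widget", "3"}}
	fmt.Println(joinTable(table, defaultTableColSep, defaultTableRowSep))
	fmt.Println(defaultSlideSep)
	fmt.Println(strings.Join([]string{"two", "phrases"}, defaultPhraseSep))
}
```

Passing different strings to the setters, for example ", " as the column separator, would make table output easier to load as CSV-like text.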