pptxtotext

package
v1.0.3
Warning

This package is not in the latest version of its module.

Published: Dec 11, 2023 License: MIT Imports: 12 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type PptxParser

type PptxParser struct {
	// contains filtered or unexported fields
}

PptxParser represents the XML file structure and settings for parsing a pptx file.

func Open

func Open(path string) (*PptxParser, error)

Open opens the specified pptx file path and returns a new PptxParser instance and an error, if any.

Parameters:

  • path: a string representing the path to the pptx file.

Returns:

  • *PptxParser: a pointer to the PptxParser struct.
  • error: an error, if any.

func OpenReader

func OpenReader(r io.ReaderAt, n int64) (*PptxParser, error)

OpenReader opens a PptxParser for the given io.ReaderAt and file size.

Parameters:

  • r: The io.ReaderAt to read the pptx file from.
  • n: The size of the pptx file.

Returns:

  • *PptxParser: The opened PptxParser object.
  • error: Any error that occurred during the opening process.
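
A pptx file is a zip archive, and the `(io.ReaderAt, int64)` pair that OpenReader expects has the same shape as the arguments to the standard library's `archive/zip.NewReader`. The sketch below shows one way to produce that pair from bytes already held in memory. It uses `zip.NewReader` as a stand-in so that it runs without this package; `buildArchive`, `readerAndSize`, and `countEntries` are hypothetical helper names, not part of pptxtotext.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// buildArchive returns the bytes of a tiny in-memory zip archive with one
// entry. It stands in for the contents of a real .pptx file.
func buildArchive(name, body string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create(name)
	if err != nil {
		return nil, err
	}
	if _, err := io.WriteString(w, body); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readerAndSize turns a byte slice into the (io.ReaderAt, int64) pair.
// *bytes.Reader implements io.ReaderAt, and the size is the slice length.
func readerAndSize(data []byte) (io.ReaderAt, int64) {
	return bytes.NewReader(data), int64(len(data))
}

// countEntries opens the archive through the pair and counts its entries.
// With the real package, the pair would be passed to OpenReader instead.
func countEntries(data []byte) (int, error) {
	r, n := readerAndSize(data)
	zr, err := zip.NewReader(r, n)
	if err != nil {
		return 0, err
	}
	return len(zr.File), nil
}

func main() {
	data, err := buildArchive("ppt/slides/slide1.xml", "<sld/>")
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	n, err := countEntries(data)
	if err != nil {
		fmt.Println("open failed:", err)
		return
	}
	fmt.Println("entries:", n)
}
```

For a file on disk, an `*os.File` also implements `io.ReaderAt`, and its size is available from `Stat().Size()`.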

func OpenURL

func OpenURL(u string) (*PptxParser, int, error)

OpenURL opens the specified pptx file URL and returns a PptxParser, status code, and error.

Parameters:

  • u (string): The URL to open.

Returns:

  • *PptxParser: A pointer to a PptxParser.
  • int: The status code.
  • error: An error object.

func (*PptxParser) Close

func (pp *PptxParser) Close() (err error)

Close closes the zip reader and the OCR client. Call it once text extraction is finished.

func (*PptxParser) DisableLogging

func (pp *PptxParser) DisableLogging(v bool)

DisableLogging disables logging.

func (*PptxParser) ExtractImages added in v1.0.1

func (pp *PptxParser) ExtractImages() ([]types.Image, error)

ExtractImages extracts images from the pptx file.

Parameters:

  • None

Returns:

  • []types.Image: a slice of images extracted from the pptx file.
  • error: an error if any occurred during the extraction process.

func (*PptxParser) ExtractSlideTexts

func (pp *PptxParser) ExtractSlideTexts(slides ...int) (string, error)

ExtractSlideTexts extracts the texts from the specified pptx slides. Slide numbers start at 1.

It takes in one or more slide numbers as parameters and returns a string containing the extracted texts. The function also returns an error if there is any issue with parsing the slides.

Parameters:

  • slides: One or more slide numbers (variadic, starting at 1) to extract texts from.

Returns:

  • string: A string containing the extracted texts.
  • error: An error object if there is any issue with parsing the slides.
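
Because slide numbers start at 1, it can help to check a requested list against the slide count before calling ExtractSlideTexts. The following is a minimal sketch of that check. `clampSlides` is a hypothetical helper, and the documentation does not say how the method itself treats out-of-range numbers, so filtering them up front is an assumption about what a caller may want.

```go
package main

import "fmt"

// clampSlides keeps only slide numbers in the range 1..numSlides,
// preserving the order in which they were requested.
func clampSlides(requested []int, numSlides int) []int {
	valid := make([]int, 0, len(requested))
	for _, s := range requested {
		if s >= 1 && s <= numSlides {
			valid = append(valid, s)
		}
	}
	return valid
}

func main() {
	// With a real parser this value would come from pp.NumSlides().
	numSlides := 5

	slides := clampSlides([]int{0, 1, 3, 6}, numSlides)
	fmt.Println(slides) // prints [1 3]

	// With a real parser the filtered list would then be passed on as
	// variadic arguments: pp.ExtractSlideTexts(slides...)
}
```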

func (*PptxParser) ExtractTexts

func (pp *PptxParser) ExtractTexts() (string, error)

ExtractTexts extracts the texts from the pptx file.

It iterates through each slide of the pptx file and appends the text content to a strings.Builder object. The extracted texts are then returned as a string. If there is an error encountered during the parsing of a slide, the function returns the extracted texts up to that point, along with the error.

Returns:

  • string: The extracted texts from the pptx file.
  • error: An error, if any, encountered during the parsing of the slides.
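
Since ExtractTexts can return partial text together with an error, callers may want to keep the text rather than discard it. The sketch below illustrates that pattern along with closing the parser afterwards. It runs without this package by using a local interface that lists the two documented method signatures; `textExtractor`, `stubParser`, and `extractAll` are hypothetical names, and the stub's behavior is invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// textExtractor lists the two methods used here, with the same signatures
// as documented for *PptxParser.
type textExtractor interface {
	ExtractTexts() (string, error)
	Close() error
}

// stubParser is a hypothetical stand-in that fails partway through,
// returning the text gathered so far along with an error.
type stubParser struct{ closed bool }

func (s *stubParser) ExtractTexts() (string, error) {
	return "slide 1 text", errors.New("slide 2: malformed XML")
}

func (s *stubParser) Close() error {
	s.closed = true
	return nil
}

// extractAll returns whatever text was extracted, even on error, and always
// closes the parser. A close error is reported only when extraction itself
// succeeded, so the first failure is the one the caller sees.
func extractAll(p textExtractor) (text string, err error) {
	defer func() {
		if cerr := p.Close(); err == nil {
			err = cerr
		}
	}()
	return p.ExtractTexts()
}

func main() {
	text, err := extractAll(&stubParser{})
	if err != nil {
		fmt.Println("extraction stopped early:", err)
	}
	fmt.Printf("text so far: %q\n", text)
}
```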

func (*PptxParser) NumSlides

func (pp *PptxParser) NumSlides() int

NumSlides returns the number of slides.

func (*PptxParser) SetDrawingsNoFmt

func (pp *PptxParser) SetDrawingsNoFmt(v bool)

SetDrawingsNoFmt sets whether text from drawings is output without outline formatting.

func (*PptxParser) SetOcrInterface

func (pp *PptxParser) SetOcrInterface(ocr types.OCR)

SetOcrInterface overrides the default OCR interface.

func (*PptxParser) SetParseCharts

func (pp *PptxParser) SetParseCharts(v bool)

SetParseCharts sets whether charts are parsed. Default is false.

func (*PptxParser) SetParseDiagrams

func (pp *PptxParser) SetParseDiagrams(v bool)

SetParseDiagrams sets whether diagrams are parsed. Default is false.

func (*PptxParser) SetParseImages

func (pp *PptxParser) SetParseImages(v bool)

SetParseImages sets whether images are parsed. Default is false. If no OCR interface is set, the default tesseract-ocr is used.

func (*PptxParser) SetPhraseSep

func (pp *PptxParser) SetPhraseSep(sep string)

SetPhraseSep sets the phrase separator. Default is " ".

func (*PptxParser) SetSlideSep

func (pp *PptxParser) SetSlideSep(sep string)

SetSlideSep sets the slide text separator. Default is "-" repeated 100 times.
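
The default slide separator can be reproduced with `strings.Repeat`, which is handy when splitting extracted text back into per-slide pieces. The sketch below assumes the separator appears between slide texts in the output; the exact placement and any surrounding newlines are not documented here, so the sample text is hypothetical, as are the names `defaultSlideSep` and `splitSlides`.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultSlideSep mirrors the documented default: "-" repeated 100 times.
var defaultSlideSep = strings.Repeat("-", 100)

// splitSlides splits extracted text on sep, trims surrounding whitespace
// from each piece, and drops pieces that end up empty.
func splitSlides(text, sep string) []string {
	var slides []string
	for _, part := range strings.Split(text, sep) {
		if part = strings.TrimSpace(part); part != "" {
			slides = append(slides, part)
		}
	}
	return slides
}

func main() {
	// Hypothetical output shape, assumed for illustration only.
	text := "Title slide\n" + defaultSlideSep + "\nAgenda\n" + defaultSlideSep + "\n"

	for i, s := range splitSlides(text, defaultSlideSep) {
		fmt.Printf("piece %d: %s\n", i+1, s)
	}
}
```

Dropping empty pieces means the resulting indices no longer match slide numbers when a slide has no text. A short custom separator set through SetSlideSep that cannot occur in slide content may be easier to split on.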

func (*PptxParser) SetTableColSep

func (pp *PptxParser) SetTableColSep(sep string)

SetTableColSep sets table column separator. Default is "\t".

func (*PptxParser) SetTableRowSep

func (pp *PptxParser) SetTableRowSep(sep string)

SetTableRowSep sets table row separator. Default is "\n".
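
With the default separators, table text reads as tab-separated columns on newline-separated rows. A minimal sketch of turning such text back into rows and cells follows. The sample input is hypothetical, `parseTable` is not part of this package, and the sketch assumes cell text itself contains neither separator.

```go
package main

import (
	"fmt"
	"strings"
)

// parseTable splits table text into rows on rowSep and into cells on colSep.
// Empty rows, such as the one after a trailing row separator, are skipped.
func parseTable(text, rowSep, colSep string) [][]string {
	var rows [][]string
	for _, line := range strings.Split(text, rowSep) {
		if line == "" {
			continue
		}
		rows = append(rows, strings.Split(line, colSep))
	}
	return rows
}

func main() {
	// Hypothetical table text using the documented default separators.
	text := "Name\tQty\nApples\t3\nPears\t5\n"

	for _, row := range parseTable(text, "\n", "\t") {
		fmt.Println(strings.Join(row, " | "))
	}
}
```

If cells may contain tabs or newlines, setting less common separators through SetTableColSep and SetTableRowSep and passing the same values to the parsing helper avoids ambiguous splits.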
