Documentation
Overview
Package doctotext provides functions to extract text from doc files using either the antiword command or a Tika server.
Index
- func ExtractFromPath(path string) (string, error)
- func ExtractFromPathByTika(path string, tikaServerURL string) (string, int, error)
- func ExtractFromReader(r io.Reader) (string, error)
- func ExtractFromReaderByTika(r io.Reader, size int, tikaServerURL string) (string, int, error)
- func ExtractFromURL(u string) (string, int, error)
- func ExtractFromURLByTika(u string, tikaServerURL string) (string, int, error)
Constants
This section is empty.
Variables
This section is empty.
Functions
func ExtractFromPath
func ExtractFromPath(path string) (string, error)
ExtractFromPath extracts text from the doc file at the given path using the "antiword" command.
Parameters:
- path: the path of the doc file.
Returns:
- string: the extracted text.
- error: any error encountered.
func ExtractFromPathByTika
func ExtractFromPathByTika(path string, tikaServerURL string) (string, int, error)
ExtractFromPathByTika extracts text from the doc file at the given path using the Tika server at tikaServerURL.
Parameters:
- path: The path to the doc file.
- tikaServerURL: The URL of the Tika server.
Returns:
- string: The extracted text content.
- int: The HTTP status code returned by the Tika server.
- error: An error if any occurred during the extraction process.
func ExtractFromReader
func ExtractFromReader(r io.Reader) (string, error)
ExtractFromReader extracts text from doc file data read from an io.Reader.
It copies the data from r into a temporary file, runs the "antiword" command on that file, and returns the extracted text as a string.
Parameters:
- r: An io.Reader from which the data will be read.
Returns:
- string: The extracted text.
- error: An error if any occurred during the extraction process.
func ExtractFromReaderByTika
func ExtractFromReaderByTika(r io.Reader, size int, tikaServerURL string) (string, int, error)
ExtractFromReaderByTika extracts text from doc file data read from an io.Reader using a Tika server.
Parameters:
- r: the reader that supplies the doc file data.
- size: the size of the input data.
- tikaServerURL: the URL of the Tika server.
Returns:
- string: the extracted text.
- int: the HTTP status code of the Tika server response.
- error: an error, if any occurred.
func ExtractFromURL
func ExtractFromURL(u string) (string, int, error)
ExtractFromURL downloads the doc file at the given URL and extracts its text using the "antiword" command.
Parameters:
- u: the URL of the doc file.
Returns:
- string: the extracted text.
- int: the HTTP status code of the response from the URL.
- error: any error that occurred during the extraction process.
func ExtractFromURLByTika
func ExtractFromURLByTika(u string, tikaServerURL string) (string, int, error)
ExtractFromURLByTika extracts text from the doc file at the given URL using a Tika server.
Parameters:
- u: The URL of the doc file to extract text from.
- tikaServerURL: The URL of the Tika server.
Returns:
- string: The extracted text.
- int: The status code of the HTTP response from either the URL or the Tika server.
- error: Any error that occurred during the extraction process.
Types
This section is empty.