Documentation ¶
Index ¶
- func ConvertDoc(r io.Reader) (string, map[string]string, error)
- func ConvertDocx(r io.Reader) (string, map[string]string, error)
- func ConvertHTML(r io.Reader, readability bool) (string, map[string]string, error)
- func ConvertImage(r io.Reader) (string, map[string]string, error)
- func ConvertODT(r io.Reader) (string, map[string]string, error)
- func ConvertPDF(r io.Reader) (string, map[string]string, error)
- func ConvertPDFText(path string) (BodyResult, MetaResult, error)
- func ConvertPages(r io.Reader) (string, map[string]string, error)
- func ConvertPathReadability(path string, readability bool) ([]byte, error)
- func ConvertRTF(r io.Reader) (string, map[string]string, error)
- func ConvertURL(input io.Reader, readability bool) (string, map[string]string, error)
- func ConvertXML(r io.Reader) (string, map[string]string, error)
- func DocxXMLToText(r io.Reader) (string, error)
- func HTMLReadability(r io.Reader) []byte
- func HTMLToText(input io.Reader) string
- func MimeTypeByExtension(filename string) string
- func SetImageLanguages(string)
- func Tidy(r io.Reader, xmlIn bool) ([]byte, error)
- func XMLToMap(r io.Reader) (map[string]string, error)
- func XMLToText(r io.Reader, breaks []string, skip []string, strict bool) (string, error)
- type BodyResult
- type HTMLReadabilityOptions
- type LocalFile
- type MetaResult
- type Response
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func ConvertDoc ¶
ConvertDoc converts an MS Word .doc to text.
func ConvertDocx ¶
ConvertDocx converts an MS Word docx file to text.
func ConvertHTML ¶
ConvertHTML converts HTML into text.
func ConvertImage ¶
ConvertImage converts images to text. Requires gosseract (ocr build tag).
func ConvertODT ¶
ConvertODT converts an ODT file to text.
func ConvertPDFText ¶
func ConvertPDFText(path string) (BodyResult, MetaResult, error)
func ConvertPages ¶
ConvertPages converts a Pages file to text.
func ConvertPathReadability ¶
ConvertPathReadability converts a local path to text, with the given readability option.
func ConvertRTF ¶
ConvertRTF converts RTF files to text.
func ConvertURL ¶
ConvertURL fetches the HTML page at the URL given in the io.Reader.
func ConvertXML ¶
ConvertXML converts an XML file to text.
func DocxXMLToText ¶
DocxXMLToText converts Docx XML into plain text.
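As a rough illustration of what turning Docx XML into plain text can involve, the sketch below walks the XML token stream with the standard library. It is not docconv's implementation: the names `docxText` and `docxTextFromString` are hypothetical, and the rule it applies (keep character data inside `t` elements, emit a newline at the end of each `p` element) is an assumption chosen for the example.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// docxText is a hypothetical sketch, not docconv's code. It keeps the
// character data found inside <w:t> elements and writes a newline at
// the end of every <w:p> (paragraph) element.
func docxText(r io.Reader) (string, error) {
	dec := xml.NewDecoder(r)
	var sb strings.Builder
	inText := false
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return sb.String(), nil
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if t.Name.Local == "t" {
				inText = true
			}
		case xml.EndElement:
			switch t.Name.Local {
			case "t":
				inText = false
			case "p":
				sb.WriteByte('\n')
			}
		case xml.CharData:
			if inText {
				sb.Write(t)
			}
		}
	}
}

// docxTextFromString is a convenience wrapper for string input.
func docxTextFromString(s string) (string, error) {
	return docxText(strings.NewReader(s))
}

func main() {
	doc := `<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">` +
		`<w:body><w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t> world</w:t></w:r></w:p>` +
		`<w:p><w:r><w:t>Bye</w:t></w:r></w:p></w:body></w:document>`
	text, err := docxTextFromString(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", text)
}
```

The real function may treat breaks, tabs, and skipped elements differently; the sketch only shows the general token-walking approach.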
func HTMLReadability ¶
HTMLReadability extracts the readable text in an HTML document.
func MimeTypeByExtension ¶
MimeTypeByExtension returns a mimetype for the given extension, or application/octet-stream if none can be determined.
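The fallback behaviour described above can be sketched with the standard library alone. This is an assumption-laden stand-in, not docconv's code: `mimeTypeByExtension` is a hypothetical name, and the lookup relies on Go's `mime` package, whose results for known extensions depend on the host's MIME tables.

```go
package main

import (
	"fmt"
	"mime"
	"path/filepath"
)

// mimeTypeByExtension is a hypothetical stand-in, not docconv's code.
// It looks up the file extension with the standard library and falls
// back to application/octet-stream when no type can be determined.
func mimeTypeByExtension(filename string) string {
	if t := mime.TypeByExtension(filepath.Ext(filename)); t != "" {
		return t
	}
	return "application/octet-stream"
}

func main() {
	// Known extensions resolve through the mime package's tables.
	fmt.Println(mimeTypeByExtension("report.pdf"))
	// A name without an extension always takes the fallback branch.
	fmt.Println(mimeTypeByExtension("README"))
}
```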
func SetImageLanguages ¶
func SetImageLanguages(string)
SetImageLanguages sets the languages parameter passed to gosseract.
func Tidy ¶
Tidy attempts to tidy up XML. Errors and warnings are deliberately suppressed because the underlying tools emit warnings very readily.
Types ¶
type BodyResult ¶
type BodyResult struct {
// contains filtered or unexported fields
}
type HTMLReadabilityOptions ¶
type HTMLReadabilityOptions struct {
	LengthLow             int
	LengthHigh            int
	StopwordsLow          float64
	StopwordsHigh         float64
	MaxLinkDensity        float64
	MaxHeadingDistance    int
	ReadabilityUseClasses string
}
HTMLReadabilityOptions is a type which defines parameters that are passed to the justext package. TODO: Improve this!
var HTMLReadabilityOptionsValues HTMLReadabilityOptions
HTMLReadabilityOptionsValues are the global settings used for HTMLReadability. TODO: Remove this from global state.
type LocalFile ¶
LocalFile is a type which wraps an *os.File. See NewLocalFile for more details.
func NewLocalFile ¶
NewLocalFile ensures that there is a file which contains the data provided by r. If r is actually an instance of *os.File then this file is used, otherwise a temporary file is created (using dir and prefix) and the data from r copied into it. Callers must call Done() when the LocalFile is no longer needed to ensure all resources are cleaned up.
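The reuse-or-copy behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not docconv's source: `localFile`, `newLocalFile`, and `roundTrip` are hypothetical names, and the rewind after copying is a choice made for the example.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// localFile is a hypothetical stand-in for the LocalFile type.
type localFile struct {
	*os.File
	temp bool // true when this code created the file and must remove it
}

// newLocalFile reuses r when it is already an *os.File; otherwise it
// copies r into a temporary file created in dir with the given prefix.
func newLocalFile(r io.Reader, dir, prefix string) (*localFile, error) {
	if f, ok := r.(*os.File); ok {
		return &localFile{File: f}, nil
	}
	f, err := os.CreateTemp(dir, prefix)
	if err != nil {
		return nil, fmt.Errorf("creating temp file: %w", err)
	}
	if _, err := io.Copy(f, r); err != nil {
		f.Close()
		os.Remove(f.Name())
		return nil, fmt.Errorf("copying to temp file: %w", err)
	}
	// Rewind so callers read from the start of the copied data.
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		f.Close()
		os.Remove(f.Name())
		return nil, err
	}
	return &localFile{File: f, temp: true}, nil
}

// Done closes the file and removes it if it was a temporary copy.
func (l *localFile) Done() {
	l.Close()
	if l.temp {
		os.Remove(l.Name())
	}
}

// roundTrip copies data through newLocalFile, reads it back, calls Done,
// and reports whether the temporary file was removed afterwards.
func roundTrip(data string) (string, bool, error) {
	lf, err := newLocalFile(strings.NewReader(data), "", "sketch-")
	if err != nil {
		return "", false, err
	}
	name := lf.Name()
	b, err := io.ReadAll(lf)
	lf.Done()
	if err != nil {
		return "", false, err
	}
	_, statErr := os.Stat(name)
	return string(b), os.IsNotExist(statErr), nil
}

func main() {
	content, removed, err := roundTrip("hello")
	fmt.Println(content, removed, err)
}
```

Pairing every successful call with `Done()` (typically via `defer`) is what keeps temporary copies from accumulating on disk.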
type Response ¶
type Response struct {
	Body  string            `json:"body"`
	Meta  map[string]string `json:"meta"`
	MSecs uint32            `json:"msecs"`
	Error string            `json:"error"`
}
Response is the payload sent back to the requestor.
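To show how the struct tags shape the JSON payload, the sketch below redeclares the documented `Response` type locally and marshals a sample value. The `encode` helper and the sample field values are illustrative assumptions, not part of the package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Response mirrors the documented struct, redeclared here so the
// example is self-contained.
type Response struct {
	Body  string            `json:"body"`
	Meta  map[string]string `json:"meta"`
	MSecs uint32            `json:"msecs"`
	Error string            `json:"error"`
}

// encode is a hypothetical helper that renders a Response as JSON.
func encode(r Response) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	out, err := encode(Response{
		Body:  "Hello",
		Meta:  map[string]string{"Author": "A. Writer"},
		MSecs: 12,
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
	// prints {"body":"Hello","meta":{"Author":"A. Writer"},"msecs":12,"error":""}
}
```

Because none of the tags use `omitempty`, an empty `Error` still appears in the output as `"error":""`.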
func ConvertPath ¶
ConvertPath converts a local path to text.
Source Files ¶
Directories ¶
Path | Synopsis |
---|---|
| Package client defines types and functions for interacting with docconv HTTP servers. |
| Package TSP is a generated protocol buffer package. |
| Package snappy implements the snappy block-based compression format. |