fileExtractor

package
v0.1.1
Warning

This package is not in the latest version of its module.

Published: Nov 13, 2023 License: MIT Imports: 6 Imported by: 0

Documentation


Constants

This section is empty.

Variables

This section is empty.

Functions

func DOCX2Text

func DOCX2Text(file io.ReaderAt, size int64) (string, error)

DOCX2Text extracts the text of a Word document. size is the full size of the input file, in bytes.
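The (io.ReaderAt, size) pair mirrors the arguments of zip.NewReader in the standard library, which is how a DOCX container is usually opened. The following is a minimal sketch under that assumption, not the package's actual code: readDocumentXML, readDocumentXMLFromBytes, and buildSampleArchive are hypothetical helpers, and word/document.xml is the conventional location of the main document part.

```go
package main

import (
	"archive/zip"
	"bytes"
	"errors"
	"fmt"
	"io"
)

// readDocumentXML is a hypothetical helper, not part of fileExtractor.
// It opens a ZIP archive from an io.ReaderAt and returns the raw contents
// of word/document.xml, the conventional main part of a DOCX file.
func readDocumentXML(file io.ReaderAt, size int64) (string, error) {
	zr, err := zip.NewReader(file, size)
	if err != nil {
		return "", fmt.Errorf("open zip: %w", err)
	}
	for _, f := range zr.File {
		if f.Name != "word/document.xml" {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return "", fmt.Errorf("open %s: %w", f.Name, err)
		}
		defer rc.Close()
		data, err := io.ReadAll(rc)
		if err != nil {
			return "", fmt.Errorf("read %s: %w", f.Name, err)
		}
		return string(data), nil
	}
	return "", errors.New("word/document.xml not found")
}

// readDocumentXMLFromBytes wraps a byte slice in a bytes.Reader, which
// implements io.ReaderAt, and passes the slice length as the size.
func readDocumentXMLFromBytes(raw []byte) (string, error) {
	return readDocumentXML(bytes.NewReader(raw), int64(len(raw)))
}

// buildSampleArchive creates a tiny in-memory ZIP holding one document part,
// standing in for a real .docx file.
func buildSampleArchive(body string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create("word/document.xml")
	if err != nil {
		return nil, err
	}
	if _, err := w.Write([]byte(body)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	raw, err := buildSampleArchive("<document/>")
	if err != nil {
		panic(err)
	}
	xmlText, err := readDocumentXMLFromBytes(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(xmlText)
}
```

When reading from disk, an *os.File also satisfies io.ReaderAt, and the size can be taken from its Stat result.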

func IsFileDOCX

func IsFileDOCX(data []byte) bool

IsFileDOCX reports whether data indicates a DOCX file. A DOCX file begins with the signature 50 4B 03 04, the ZIP local file header ("PK\x03\x04"), because DOCX is a ZIP container.
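A signature check of this kind can be sketched with bytes.HasPrefix. The helper name hasZIPSignature is hypothetical, and the sketch only illustrates the documented 50 4B 03 04 comparison; it does not claim to reproduce what IsFileDOCX does internally.

```go
package main

import (
	"bytes"
	"fmt"
)

// zipSignature is the ZIP local file header magic number, "PK\x03\x04".
var zipSignature = []byte{0x50, 0x4B, 0x03, 0x04}

// hasZIPSignature is a hypothetical helper, not part of fileExtractor.
// It reports whether data begins with the ZIP local file header signature.
func hasZIPSignature(data []byte) bool {
	return bytes.HasPrefix(data, zipSignature)
}

func main() {
	docxLike := []byte{0x50, 0x4B, 0x03, 0x04, 0x14, 0x00}
	plain := []byte("hello")
	fmt.Println(hasZIPSignature(docxLike)) // true
	fmt.Println(hasZIPSignature(plain))    // false
}
```

Every ZIP archive shares this signature, so a match on these four bytes alone cannot tell a DOCX file from other ZIP-based formats.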

Types

type WordDocument

type WordDocument struct {
	Paragraphs []WordParagraph
}

WordDocument represents a complete Word document as a list of paragraphs.

func WordParse

func WordParse(doc string) (WordDocument, error)

WordParse parses the contents of a Word document, passed as a string, into a WordDocument.
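One way such a parser could work is to walk the XML tokens and decode every p element into a WordParagraph using the documented struct tags. The sketch below assumes doc holds WordprocessingML (the contents of a document part); parseParagraphs and sample are hypothetical names, the types are local copies of the documented ones, and the real WordParse may be implemented differently.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// Local copies of the documented types, so the sketch compiles on its own.
type WordStyle struct {
	Val string `xml:"val,attr"`
}

type WordRow struct {
	Text string `xml:"t"`
}

type WordParagraph struct {
	Style WordStyle `xml:"pPr>pStyle"`
	Rows  []WordRow `xml:"r"`
}

type WordDocument struct {
	Paragraphs []WordParagraph
}

// sample is made-up input for illustration only.
const sample = `<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:body>
<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Title</w:t></w:r></w:p>
<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>
</w:body></w:document>`

// parseParagraphs is a hypothetical stand-in for WordParse. It scans the
// token stream and decodes each element whose local name is "p".
func parseParagraphs(doc string) (WordDocument, error) {
	var out WordDocument
	dec := xml.NewDecoder(strings.NewReader(doc))
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return WordDocument{}, err
		}
		start, ok := tok.(xml.StartElement)
		if !ok || start.Name.Local != "p" {
			continue
		}
		var p WordParagraph
		if err := dec.DecodeElement(&p, &start); err != nil {
			return WordDocument{}, err
		}
		out.Paragraphs = append(out.Paragraphs, p)
	}
}

func main() {
	doc, err := parseParagraphs(sample)
	if err != nil {
		panic(err)
	}
	for _, p := range doc.Paragraphs {
		fmt.Printf("style=%q rows=%d\n", p.Style.Val, len(p.Rows))
	}
}
```

Because the struct tags carry no namespace, encoding/xml matches elements and attributes by local name, so the w: prefix in the input needs no special handling.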

func (WordDocument) AsText

func (w WordDocument) AsText() string

AsText returns all text in the document as a single string.
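The source does not say how AsText joins the pieces. As an illustration only, the sketch below assumes the rows of a paragraph are concatenated and paragraphs are separated by a newline; plainText is a hypothetical function and the types are trimmed local copies of the documented ones.

```go
package main

import (
	"fmt"
	"strings"
)

// Trimmed local copies of the documented types.
type WordRow struct {
	Text string
}

type WordParagraph struct {
	Rows []WordRow
}

type WordDocument struct {
	Paragraphs []WordParagraph
}

// plainText is a hypothetical stand-in for AsText. The joining rules here
// (no separator between rows, "\n" between paragraphs) are assumptions.
func plainText(doc WordDocument) string {
	var sb strings.Builder
	for i, p := range doc.Paragraphs {
		if i > 0 {
			sb.WriteByte('\n')
		}
		for _, r := range p.Rows {
			sb.WriteString(r.Text)
		}
	}
	return sb.String()
}

func main() {
	doc := WordDocument{Paragraphs: []WordParagraph{
		{Rows: []WordRow{{Text: "Title"}}},
		{Rows: []WordRow{{Text: "Hello, "}, {Text: "world"}}},
	}}
	fmt.Println(plainText(doc))
}
```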

type WordParagraph

type WordParagraph struct {
	Style WordStyle `xml:"pPr>pStyle"`
	Rows  []WordRow `xml:"r"`
}

WordParagraph is a single paragraph: its style (from the pPr>pStyle element) and its rows (the r elements).

type WordRow

type WordRow struct {
	Text string `xml:"t"`
}

WordRow holds the text of one r element within a paragraph, taken from its t child element.

type WordStyle

type WordStyle struct {
	Val string `xml:"val,attr"`
}

WordStyle holds the val attribute of a paragraph's pStyle element.
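Since each paragraph carries its style value, a caller can select paragraphs by style, for example to pull out headings. This is a sketch with hypothetical names (paragraphsWithStyle) and made-up sample data; "Heading1" is used only as an example value, and the types are local copies of the documented ones without their XML tags.

```go
package main

import "fmt"

// Local copies of the documented types, without XML tags.
type WordStyle struct {
	Val string
}

type WordRow struct {
	Text string
}

type WordParagraph struct {
	Style WordStyle
	Rows  []WordRow
}

type WordDocument struct {
	Paragraphs []WordParagraph
}

// paragraphsWithStyle is a hypothetical helper that returns the paragraphs
// whose Style.Val equals the given style value.
func paragraphsWithStyle(doc WordDocument, style string) []WordParagraph {
	var matched []WordParagraph
	for _, p := range doc.Paragraphs {
		if p.Style.Val == style {
			matched = append(matched, p)
		}
	}
	return matched
}

func main() {
	doc := WordDocument{Paragraphs: []WordParagraph{
		{Style: WordStyle{Val: "Heading1"}, Rows: []WordRow{{Text: "Intro"}}},
		{Rows: []WordRow{{Text: "Body text"}}},
		{Style: WordStyle{Val: "Heading1"}, Rows: []WordRow{{Text: "Usage"}}},
	}}
	for _, p := range paragraphsWithStyle(doc, "Heading1") {
		if len(p.Rows) > 0 {
			fmt.Println(p.Rows[0].Text)
		}
	}
}
```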
