docxer

package module

Version: v0.0.0-...-978e3eb | Published: Aug 19, 2021 | License: Unlicense | Imports: 14 | Imported by: 0

Repository

github.com/ucookie/docxer

README ¶

docxer

Extracts content from docx documents.

Reference: https://github1s.com/lukasjarosch/go-docx/blob/HEAD/placeholder_test.go#L19

Documentation ¶

Index ¶

Constants
Variables
func NewRunID() int
func ReadZipReader(r *zip.Reader, o *Options) (map[string][]byte, error)
func ResetRunIdCounter()
func ValidatePositions(document []byte, runs []*Run) error
type DocumentRuns
- func (dr *DocumentRuns) Pop() *Run
- func (dr *DocumentRuns) Push(run *Run)
- func (dr DocumentRuns) WithText() DocumentRuns
type File
- func NewFile() *File
- func OpenFile(filename string, opt *Options) (*File, error)
- func OpenReader(r io.Reader, opt *Options) (*File, error)
type Options
type Position
- func (p Position) Match(regexp *regexp.Regexp, data []byte) bool
- func (p Position) Valid() bool
type Reader
- func NewReader(s string) *Reader
- func (r *Reader) Len() int
- func (r *Reader) Pos() int64
- func (r *Reader) Read(b []byte) (int, error)
- func (r *Reader) ReadByte() (byte, error)
- func (r *Reader) Size() int64
- func (r *Reader) String() string
type Run
- func NewEmptyRun() *Run
- func (r *Run) GetText(documentBytes []byte) string
- func (r *Run) String(bytes []byte) string
type RunParser
- func NewRunParser(doc []byte) *RunParser
- func (parser *RunParser) Execute() error
- func (parser *RunParser) Runs() DocumentRuns
type TagPair

Constants ¶

const (
	// RunElementName is the local name of the XML tag for runs (<w:r>, </w:r> and <w:r/>)
	RunElementName = "r"
	// TextElementName is the local name of the XML tag for text-runs (<w:t> and </w:t>)
	TextElementName = "t"
)

const (
	UnzipSizeLimit = 1000 << 24
)

Variables ¶

var (
	// RunOpenTagRegex matches all open tags for runs, including any attributes that are set
	RunOpenTagRegex = regexp.MustCompile(`(<w:r).*>`)
	// RunCloseTagRegex matches the close tag of runs
	RunCloseTagRegex = regexp.MustCompile(`(</w:r>)`)
	// RunSingletonTagRegex matches a singleton run tag
	RunSingletonTagRegex = regexp.MustCompile(`(<w:r/>)`)
	// TextOpenTagRegex matches all open tags for text-runs, including any attributes that are set
	TextOpenTagRegex = regexp.MustCompile(`(<w:t).*>`)
	// TextCloseTagRegex matches the close tag of text-runs
	TextCloseTagRegex = regexp.MustCompile(`(</w:t>)`)
	// ErrTagsInvalid is returned if the parsing failed and the result cannot be used.
	// Typically this means that one or more tag-offsets were not parsed correctly which
	// would cause the document to become corrupted as soon as replacing starts.
	ErrTagsInvalid = errors.New("one or more tags are invalid and will cause the XML to be corrupt")
)
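
The patterns above appear intended for use on an already isolated tag rather than on a whole document. The following sketch, which copies three of the patterns into local variables, shows how they behave with the standard `regexp` package. Note that the greedy `.*` in the open-tag pattern runs to the last `>` on a line, and that the pattern also accepts other tags beginning with `<w:r`, such as `<w:rPr>`.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied from the variable block above.
var (
	runOpenTagRegex      = regexp.MustCompile(`(<w:r).*>`)
	runCloseTagRegex     = regexp.MustCompile(`(</w:r>)`)
	runSingletonTagRegex = regexp.MustCompile(`(<w:r/>)`)
)

func main() {
	// On an isolated tag, the open-tag pattern matches with or without attributes.
	fmt.Println(runOpenTagRegex.MatchString(`<w:r>`))                  // true
	fmt.Println(runOpenTagRegex.MatchString(`<w:r w:rsidR="00A1B2">`)) // true
	fmt.Println(runCloseTagRegex.MatchString(`</w:r>`))                // true
	fmt.Println(runSingletonTagRegex.MatchString(`<w:r/>`))            // true

	// Caveat: any tag that starts with "<w:r" also matches the open-tag pattern.
	fmt.Println(runOpenTagRegex.MatchString(`<w:rPr>`)) // true

	// On a whole run, the greedy `.*` extends to the last '>' on the line.
	run := `<w:r><w:t>Hi</w:t></w:r>`
	fmt.Println(runOpenTagRegex.FindString(run) == run) // true
}
```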

Functions ¶

func ResetRunIdCounter ¶

func ResetRunIdCounter()

ResetRunIdCounter resets the run ID counter to 0.

func ValidatePositions ¶

func ValidatePositions(document []byte, runs []*Run) error

ValidatePositions iterates over all runs and their texts (if any) and ensures that they match their respective regex. If validation fails, replacement will not work because the offsets are wrong.

Types ¶

type DocumentRuns ¶

type DocumentRuns []*Run

DocumentRuns is a convenience type used to describe a slice of runs. It also implements Push() and Pop(), which allow it to be used as a LIFO stack.

func (*DocumentRuns) Pop ¶

func (dr *DocumentRuns) Pop() *Run

Pop will return the last Run added to the stack and remove it.

func (*DocumentRuns) Push ¶

func (dr *DocumentRuns) Push(run *Run)

Push pushes a new Run onto the DocumentRuns stack.

func (DocumentRuns) WithText ¶

func (dr DocumentRuns) WithText() DocumentRuns

WithText returns all runs that have the HasText flag set.
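
The stack and filter behaviour described for DocumentRuns can be sketched with a plain slice of pointers. The `run` and `runs` types below are local stand-ins, not the package's own types, and returning nil from `Pop` on an empty stack is an assumption of this sketch.

```go
package main

import "fmt"

// run is a stand-in for the package's Run type with only the fields needed here.
type run struct {
	ID      int
	HasText bool
}

// runs mirrors the shape of DocumentRuns: a slice of run pointers.
type runs []*run

// Push appends a run to the top of the stack.
func (rs *runs) Push(r *run) { *rs = append(*rs, r) }

// Pop removes and returns the most recently pushed run, or nil if the
// stack is empty (an assumption of this sketch).
func (rs *runs) Pop() *run {
	old := *rs
	if len(old) == 0 {
		return nil
	}
	last := old[len(old)-1]
	*rs = old[:len(old)-1]
	return last
}

// WithText returns only the runs whose HasText flag is set.
func (rs runs) WithText() runs {
	var out runs
	for _, r := range rs {
		if r.HasText {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	var stack runs
	stack.Push(&run{ID: 1, HasText: true})
	stack.Push(&run{ID: 2})
	stack.Push(&run{ID: 3, HasText: true})

	fmt.Println(stack.Pop().ID)        // 3
	fmt.Println(len(stack.WithText())) // 1
}
```

Push and Pop use pointer receivers because they change the slice header, while WithText only reads and can take the value.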

type File ¶

type File struct {
	Path string
	Pkg  sync.Map
	// contains filtered or unexported fields
}

func NewFile ¶

func NewFile() *File

func OpenFile ¶

func OpenFile(filename string, opt *Options) (*File, error)

OpenFile opens the file with the given filename.

func OpenReader ¶

func OpenReader(r io.Reader, opt *Options) (*File, error)

OpenReader opens a file from the given reader.

type Options ¶

type Options struct {
	UnzipSizeLimit int64
}

Options holds the options used when opening a file.
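
A docx file is a ZIP archive whose main part is `word/document.xml`. As a rough illustration of what a function shaped like `ReadZipReader` might do with `Options.UnzipSizeLimit`, the sketch below loads every archive entry into a map and stops once a size budget is exceeded. `readParts` and `buildArchive` are hypothetical helpers written for this example; the package's real behaviour may differ.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// readParts is a hypothetical stand-in for a function shaped like
// ReadZipReader: it loads every archive entry into a map keyed by name and
// fails once the total uncompressed size exceeds limit.
func readParts(zr *zip.Reader, limit int64) (map[string][]byte, error) {
	parts := make(map[string][]byte)
	var total int64
	for _, f := range zr.File {
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		// Read at most the remaining budget plus one byte to detect overflow.
		data, err := io.ReadAll(io.LimitReader(rc, limit-total+1))
		rc.Close()
		if err != nil {
			return nil, err
		}
		total += int64(len(data))
		if total > limit {
			return nil, fmt.Errorf("unzipped size exceeds limit of %d bytes", limit)
		}
		parts[f.Name] = data
	}
	return parts, nil
}

// buildArchive creates a tiny in-memory zip with one docx-style part.
func buildArchive() (*zip.Reader, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create("word/document.xml")
	if err != nil {
		return nil, err
	}
	if _, err := w.Write([]byte("<w:document/>")); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
}

func main() {
	zr, err := buildArchive()
	if err != nil {
		panic(err)
	}
	parts, err := readParts(zr, 1<<20)
	fmt.Println(string(parts["word/document.xml"]), err)
}
```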

type Position ¶

type Position struct {
	Start int64
	End   int64
}

Position is a generic position of a tag, represented by byte offsets

func (Position) Match ¶

func (p Position) Match(regexp *regexp.Regexp, data []byte) bool

Match applies the given regex to the given data and returns true if the bytes at this position match the regex.

func (Position) Valid ¶

func (p Position) Valid() bool

Valid returns true if Start <= End. Only then can the position be used; otherwise a 'slice out of bounds' error will occur along the way.

type Reader ¶

type Reader struct {
	// contains filtered or unexported fields
}

Reader is a very basic io.Reader implementation which is capable of returning the current position.

func NewReader ¶

func NewReader(s string) *Reader

NewReader returns a new Reader given a string source.

func (*Reader) Len ¶

func (r *Reader) Len() int

Len returns the current length of the stream which has been read.

func (*Reader) Pos ¶

func (r *Reader) Pos() int64

Pos returns the current position which the reader is at.

func (*Reader) Read ¶

func (r *Reader) Read(b []byte) (int, error)

Read implements the io.Reader interface.

func (*Reader) ReadByte ¶

func (r *Reader) ReadByte() (byte, error)

ReadByte implements the io.ByteReader interface.

func (*Reader) Size ¶

func (r *Reader) Size() int64

Size returns the size of the string to read.

func (*Reader) String ¶

func (r *Reader) String() string

String implements the Stringer interface.
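
A position-tracking reader of this kind can be sketched in a few lines. The `posReader` type below is hypothetical and covers only Read, ReadByte, Pos and Size; it is not the package's implementation.

```go
package main

import (
	"fmt"
	"io"
)

// posReader is a hypothetical position-tracking reader over a string.
type posReader struct {
	s   string
	pos int64 // offset of the next unread byte
}

func newPosReader(s string) *posReader { return &posReader{s: s} }

// Read implements io.Reader and advances the position by the bytes copied.
func (r *posReader) Read(b []byte) (int, error) {
	if r.pos >= int64(len(r.s)) {
		return 0, io.EOF
	}
	n := copy(b, r.s[r.pos:])
	r.pos += int64(n)
	return n, nil
}

// ReadByte implements io.ByteReader and advances the position by one.
func (r *posReader) ReadByte() (byte, error) {
	if r.pos >= int64(len(r.s)) {
		return 0, io.EOF
	}
	c := r.s[r.pos]
	r.pos++
	return c, nil
}

// Pos returns the offset of the next unread byte.
func (r *posReader) Pos() int64 { return r.pos }

// Size returns the total length of the underlying string.
func (r *posReader) Size() int64 { return int64(len(r.s)) }

func main() {
	r := newPosReader("<w:r>text</w:r>")
	buf := make([]byte, 5)
	n, _ := r.Read(buf)
	fmt.Println(n, string(buf[:n]), r.Pos()) // 5 <w:r> 5

	c, _ := r.ReadByte()
	fmt.Println(string(c), r.Pos(), r.Size()) // t 6 15
}
```

Knowing the exact byte offset after each read is what allows a parser to record where a tag starts and ends.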

type Run ¶

type Run struct {
	TagPair
	ID      int
	Text    TagPair // Text is the <w:t> tag pair which is always within a run and cannot be standalone.
	HasText bool
}

Run defines a non-block region of text with a common set of properties. It is specified with the <w:r> element. Here, a run is described by four byte offsets: the start and end of its opening tag and the start and end of its closing tag.

func NewEmptyRun ¶

func NewEmptyRun() *Run

NewEmptyRun returns a new, empty run which has only an ID set.

func (*Run) GetText ¶

func (r *Run) GetText(documentBytes []byte) string

GetText returns the text of the run, if any. If the run does not have text or the given byte slice is too small, an empty string is returned.
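
One plausible reading of GetText is that the text consists of the bytes between the end of the `<w:t>` opening tag and the start of the `</w:t>` closing tag. The sketch below implements that reading with local stand-in types; the offset convention (End is exclusive) is an assumption and is not confirmed by this page.

```go
package main

import "fmt"

type position struct{ Start, End int64 }

type tagPair struct{ OpenTag, CloseTag position }

// run mirrors the fields of Run that matter for text extraction.
type run struct {
	Text    tagPair
	HasText bool
}

// getText is a hypothetical reimplementation: the text is assumed to be the
// bytes between the end of <w:t ...> and the start of </w:t>.
func (r *run) getText(doc []byte) string {
	if !r.HasText {
		return ""
	}
	start, end := r.Text.OpenTag.End, r.Text.CloseTag.Start
	if start < 0 || start > end || end > int64(len(doc)) {
		return "" // offsets do not fit the given document
	}
	return string(doc[start:end])
}

func main() {
	doc := []byte(`<w:r><w:t>Hi</w:t></w:r>`)
	r := &run{
		HasText: true,
		Text: tagPair{
			OpenTag:  position{Start: 5, End: 10},  // <w:t>
			CloseTag: position{Start: 12, End: 18}, // </w:t>
		},
	}
	fmt.Println(r.getText(doc))                    // Hi
	fmt.Println(r.getText([]byte(`<w:r>`)) == "") // true
}
```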

func (*Run) String ¶

func (r *Run) String(bytes []byte) string

String returns a string representation of the run, given the source bytes. It may be helpful in debugging.

type RunParser ¶

type RunParser struct {
	// contains filtered or unexported fields
}

RunParser can parse a list of Runs from a given byte slice.

func NewRunParser ¶

func NewRunParser(doc []byte) *RunParser

NewRunParser returns an initialized RunParser given the source-bytes.

func (*RunParser) Execute ¶

func (parser *RunParser) Execute() error

Execute starts the parser. The parser makes two passes over the given document. First, all <w:r> tags are located and marked. Then, inside those run tags, the <w:t> tags are located.

func (*RunParser) Runs ¶

func (parser *RunParser) Runs() DocumentRuns

Runs returns all runs found by the parser.
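
To illustrate how byte offsets for runs and their text could be collected, the sketch below uses the `InputOffset` method of the standard `encoding/xml` decoder. It is not the package's parser: it collapses the two passes described above into one, does not handle nested runs, and the `span`, `foundRun` and `findRuns` names are hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

type span struct{ Start, End int64 }

// foundRun records where a run's tags and its text content sit in the document.
type foundRun struct {
	Open, Close span
	TextStart   int64 // end of <w:t ...>
	TextEnd     int64 // start of </w:t>
	HasText     bool
}

// findRuns walks the XML tokens once and records byte offsets for <w:r> and
// <w:t> elements. Nested runs are not handled in this sketch.
func findRuns(doc string) ([]foundRun, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var out []foundRun
	var cur *foundRun
	for {
		start := dec.InputOffset() // offset where the next token begins
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		end := dec.InputOffset() // offset just past the token
		switch t := tok.(type) {
		case xml.StartElement:
			if t.Name.Local == "r" {
				cur = &foundRun{Open: span{start, end}}
			} else if t.Name.Local == "t" && cur != nil {
				cur.TextStart, cur.HasText = end, true
			}
		case xml.EndElement:
			if t.Name.Local == "t" && cur != nil {
				cur.TextEnd = start
			} else if t.Name.Local == "r" && cur != nil {
				cur.Close = span{start, end}
				out = append(out, *cur)
				cur = nil
			}
		}
	}
}

func main() {
	doc := `<w:document xmlns:w="urn:example"><w:body>` +
		`<w:r><w:t>Hello</w:t></w:r><w:r><w:rPr/></w:r>` +
		`</w:body></w:document>`
	runs, err := findRuns(doc)
	if err != nil {
		panic(err)
	}
	for i, r := range runs {
		text := ""
		if r.HasText {
			text = doc[r.TextStart:r.TextEnd]
		}
		fmt.Printf("run %d: open=%q hasText=%v text=%q\n",
			i, doc[r.Open.Start:r.Open.End], r.HasText, text)
	}
}
```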

type TagPair ¶

type TagPair struct {
	OpenTag  Position
	CloseTag Position
}

TagPair describes an opening and closing tag position.
