docxer

package module
v0.0.0-...-978e3eb
Published: Aug 19, 2021 License: Unlicense Imports: 14 Imported by: 0

Documentation

Constants

const (
	// RunElementName is the local name of the XML tag for runs (<w:r>, </w:r> and <w:r/>)
	RunElementName = "r"
	// TextElementName is the local name of the XML tag for text-runs (<w:t> and </w:t>)
	TextElementName = "t"
)
const (
	// UnzipSizeLimit evaluates to 16,777,216,000 bytes (1000 << 24, about 15.6 GiB).
	UnzipSizeLimit = 1000 << 24
)

Variables

var (
	// RunOpenTagRegex matches all open tags for runs, including any attributes that may be set
	RunOpenTagRegex = regexp.MustCompile(`(<w:r).*>`)
	// RunCloseTagRegex matches the close tag of runs
	RunCloseTagRegex = regexp.MustCompile(`(</w:r>)`)
	// RunSingletonTagRegex matches a singleton run tag
	RunSingletonTagRegex = regexp.MustCompile(`(<w:r/>)`)
	// TextOpenTagRegex matches all open tags for text-runs, including any attributes that may be set
	TextOpenTagRegex = regexp.MustCompile(`(<w:t).*>`)
	// TextCloseTagRegex matches the close tag of text-runs
	TextCloseTagRegex = regexp.MustCompile(`(</w:t>)`)
	// ErrTagsInvalid is returned if the parsing failed and the result cannot be used.
	// Typically this means that one or more tag-offsets were not parsed correctly which
	// would cause the document to become corrupted as soon as replacing starts.
	ErrTagsInvalid = errors.New("one or more tags are invalid and will cause the XML to be corrupt")
)
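
The following is a minimal sketch of how two of these patterns behave on sample tags. The variables are local copies of the patterns listed above, and matchesRunOpen is a hypothetical helper that is not part of the package. It also illustrates that the open-tag pattern is effectively a prefix check, so a tag such as <w:rPr> matches it as well.

```go
package main

import (
	"fmt"
	"regexp"
)

// Local copies of two patterns from the listing above, so the sketch
// runs without importing the package.
var (
	runOpenTagRegex  = regexp.MustCompile(`(<w:r).*>`)
	runCloseTagRegex = regexp.MustCompile(`(</w:r>)`)
)

// matchesRunOpen is a hypothetical helper: it reports whether tag
// matches the run open-tag pattern.
func matchesRunOpen(tag string) bool {
	return runOpenTagRegex.MatchString(tag)
}

func main() {
	fmt.Println(matchesRunOpen(`<w:r>`))                  // true
	fmt.Println(matchesRunOpen(`<w:r w:rsidR="00A1B2">`)) // true
	fmt.Println(matchesRunOpen(`<w:rPr>`))                // true: only the "<w:r" prefix is required
	fmt.Println(matchesRunOpen(`<w:t>`))                  // false
	fmt.Println(runCloseTagRegex.MatchString(`</w:r>`))   // true
}
```

Because of the prefix behaviour, a caller that needs to tell runs apart from run properties would have to check the tag name separately.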

Functions

func NewRunID

func NewRunID() int

NewRunID returns the next Fragment.ID.

func ReadZipReader

func ReadZipReader(r *zip.Reader, o *Options) (map[string][]byte, error)

ReadZipReader reads a zip file held in memory.
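
As an illustration only, reading an in-memory archive under a size limit could look like the sketch below. It does not use the package itself: readZip, buildZip and errTooLarge are hypothetical names, and whether ReadZipReader enforces its limit this way is an assumption.

```go
package main

import (
	"archive/zip"
	"bytes"
	"errors"
	"fmt"
	"io"
)

// errTooLarge is a hypothetical error for this sketch.
var errTooLarge = errors.New("unzipped size exceeds limit")

// readZip reads every entry of r into memory and fails once the total
// uncompressed size exceeds limit. This is an assumed strategy.
func readZip(r *zip.Reader, limit int64) (map[string][]byte, error) {
	files := make(map[string][]byte, len(r.File))
	var total int64
	for _, f := range r.File {
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		// Read at most one byte beyond the remaining budget so that
		// an overflow can be detected without reading everything.
		data, err := io.ReadAll(io.LimitReader(rc, limit-total+1))
		rc.Close()
		if err != nil {
			return nil, err
		}
		total += int64(len(data))
		if total > limit {
			return nil, errTooLarge
		}
		files[f.Name] = data
	}
	return files, nil
}

// buildZip creates a one-entry in-memory archive for the demonstration.
func buildZip(name, content string) (*zip.Reader, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create(name)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write([]byte(content)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
}

func main() {
	zr, err := buildZip("word/document.xml", "<w:document/>")
	if err != nil {
		panic(err)
	}
	files, err := readZip(zr, 1<<20)
	fmt.Println(len(files), err) // 1 <nil>
	_, err = readZip(zr, 4)
	fmt.Println(err) // unzipped size exceeds limit
}
```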

func ResetRunIdCounter

func ResetRunIdCounter()

ResetRunIdCounter will reset the runId counter to 0.

func ValidatePositions

func ValidatePositions(document []byte, runs []*Run) error

ValidatePositions will iterate over all runs and their texts (if any) and ensure that they match their respective regex. If validation fails, replacement will not work because the offsets are wrong.

Types

type DocumentRuns

type DocumentRuns []*Run

DocumentRuns is a convenience type used to describe a slice of runs. It also implements Push() and Pop(), which allow it to be used as a LIFO stack.

func (*DocumentRuns) Pop

func (dr *DocumentRuns) Pop() *Run

Pop will return the last Run added to the stack and remove it.

func (*DocumentRuns) Push

func (dr *DocumentRuns) Push(run *Run)

Push will push a new Run onto the DocumentRuns stack.

func (DocumentRuns) WithText

func (dr DocumentRuns) WithText() DocumentRuns

WithText returns all runs with the HasText flag set.
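
The stack behaviour described for DocumentRuns can be sketched on a plain slice as follows. The types run and runStack are pared-down stand-ins, not the package's own types, and returning nil from Pop on an empty stack is an assumption.

```go
package main

import "fmt"

// run is a stand-in for Run that keeps only the fields used here.
type run struct {
	ID      int
	HasText bool
}

// runStack is a stand-in for DocumentRuns.
type runStack []*run

// Push appends r to the top of the stack.
func (s *runStack) Push(r *run) { *s = append(*s, r) }

// Pop removes and returns the most recently pushed run.
// Returning nil for an empty stack is an assumption of this sketch.
func (s *runStack) Pop() *run {
	old := *s
	if len(old) == 0 {
		return nil
	}
	last := old[len(old)-1]
	*s = old[:len(old)-1]
	return last
}

// WithText returns the runs whose HasText flag is set.
func (s runStack) WithText() runStack {
	var out runStack
	for _, r := range s {
		if r.HasText {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	var s runStack
	s.Push(&run{ID: 1, HasText: true})
	s.Push(&run{ID: 2})
	s.Push(&run{ID: 3, HasText: true})
	fmt.Println(len(s.WithText())) // 2
	fmt.Println(s.Pop().ID)        // 3
	fmt.Println(len(s))            // 2
}
```

Push and Pop use pointer receivers because they change the slice header, while WithText only reads and can use a value receiver, mirroring the receiver kinds in the signatures above.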

type File

type File struct {
	Path string
	Pkg  sync.Map
	// contains filtered or unexported fields
}

func NewFile

func NewFile() *File

func OpenFile

func OpenFile(filename string, opt *Options) (*File, error)

OpenFile opens a file.

func OpenReader

func OpenReader(r io.Reader, opt *Options) (*File, error)

OpenReader opens a file from an io.Reader.

type Options

type Options struct {
	UnzipSizeLimit int64
}

Options holds the available options.

type Position

type Position struct {
	Start int64
	End   int64
}

Position is a generic position of a tag, represented by byte offsets

func (Position) Match

func (p Position) Match(regexp *regexp.Regexp, data []byte) bool

Match runs MatchString with the given regex against the given data and returns true if the bytes at this position match the regex.

func (Position) Valid

func (p Position) Valid() bool

Valid returns true if Start <= End. Only a valid position can be used; otherwise a 'slice bounds out of range' panic will occur along the way.
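
A minimal sketch of how byte-offset positions can be checked against a document is shown below. The type position and its methods are stand-ins written for this example. The extra bounds check and the use of Regexp.Match on a byte slice are assumptions, not the package's confirmed behaviour.

```go
package main

import (
	"fmt"
	"regexp"
)

// position is a stand-in for Position: a half-open byte range.
type position struct{ Start, End int64 }

// valid mirrors the documented rule: Start must not exceed End.
func (p position) valid() bool { return p.Start <= p.End }

// match reports whether the bytes at p match re. The bounds check is
// an assumption added here so that the sketch can never panic.
func (p position) match(re *regexp.Regexp, data []byte) bool {
	if !p.valid() || p.Start < 0 || p.End > int64(len(data)) {
		return false
	}
	return re.Match(data[p.Start:p.End])
}

// closeTag is a local copy of the run close-tag pattern.
var closeTag = regexp.MustCompile(`(</w:r>)`)

func main() {
	doc := []byte(`<w:r><w:t>Hi</w:t></w:r>`)

	good := position{Start: 18, End: 24}    // exactly "</w:r>"
	shifted := position{Start: 17, End: 23} // off by one: "></w:r"
	reversed := position{Start: 24, End: 18}

	fmt.Println(good.match(closeTag, doc))    // true
	fmt.Println(shifted.match(closeTag, doc)) // false
	fmt.Println(reversed.valid())             // false
}
```

The shifted case shows why validation matters: an offset that is wrong by a single byte no longer matches the tag pattern, so replacing at that position would corrupt the XML.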

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

Reader is a very basic io.Reader implementation which is capable of returning the current position.

func NewReader

func NewReader(s string) *Reader

NewReader returns a new Reader given a string source.

func (*Reader) Len

func (r *Reader) Len() int

Len returns the length of the part of the stream that has been read so far.

func (*Reader) Pos

func (r *Reader) Pos() int64

Pos returns the reader's current position.

func (*Reader) Read

func (r *Reader) Read(b []byte) (int, error)

Read implements the io.Reader interface.

func (*Reader) ReadByte

func (r *Reader) ReadByte() (byte, error)

ReadByte implements the io.ByteReader interface.

func (*Reader) Size

func (r *Reader) Size() int64

Size returns the size of the string to read.

func (*Reader) String

func (r *Reader) String() string

String implements the Stringer interface.
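
A position-tracking reader over a string can be sketched as below. posReader is a stand-in written for this example. Its internals, including treating the position as the index of the next unread byte, are assumptions and may differ from the package's Reader.

```go
package main

import (
	"fmt"
	"io"
)

// posReader is a stand-in for Reader: it reads from a string and
// remembers how far it has advanced.
type posReader struct {
	s string
	i int64 // index of the next byte to read
}

func newPosReader(s string) *posReader { return &posReader{s: s} }

// Read implements io.Reader.
func (r *posReader) Read(b []byte) (int, error) {
	if r.i >= int64(len(r.s)) {
		return 0, io.EOF
	}
	n := copy(b, r.s[r.i:])
	r.i += int64(n)
	return n, nil
}

// ReadByte implements io.ByteReader.
func (r *posReader) ReadByte() (byte, error) {
	if r.i >= int64(len(r.s)) {
		return 0, io.EOF
	}
	c := r.s[r.i]
	r.i++
	return c, nil
}

// Pos returns the index of the next unread byte.
func (r *posReader) Pos() int64 { return r.i }

// Size returns the length of the underlying string.
func (r *posReader) Size() int64 { return int64(len(r.s)) }

// Compile-time checks that both interfaces are satisfied.
var (
	_ io.Reader     = (*posReader)(nil)
	_ io.ByteReader = (*posReader)(nil)
)

func main() {
	r := newPosReader("<w:r>")
	buf := make([]byte, 3)
	n, _ := r.Read(buf)
	fmt.Println(n, r.Pos(), r.Size()) // 3 3 5
	c, _ := r.ReadByte()
	fmt.Println(string(c), r.Pos()) // r 4
}
```

Implementing io.ByteReader matters when such a reader is handed to encoding/xml: the decoder only adds its own buffering when the reader lacks ReadByte, so the reported position stays in step with what has actually been consumed.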

type Run

type Run struct {
	TagPair
	ID      int
	Text    TagPair // Text is the <w:t> tag pair which is always within a run and cannot be standalone.
	HasText bool
}

Run defines a non-block region of text with a common set of properties. It is specified with the <w:r> element. In our case, the run is specified by four byte positions: the start and end offsets of its opening tag and of its closing tag.

func NewEmptyRun

func NewEmptyRun() *Run

NewEmptyRun returns a new, empty run which has only an ID set.

func (*Run) GetText

func (r *Run) GetText(documentBytes []byte) string

GetText returns the text of the run, if any. If the run does not have a text or the given byte slice is too small, an empty string is returned.
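
One way to read a run's text from its offsets is sketched below. The types are local stand-ins with the same field layout as Position, TagPair and Run, and taking the text as the bytes between the end of the <w:t> open tag and the start of its close tag is an assumption about how GetText works.

```go
package main

import "fmt"

// Stand-ins mirroring the field layout of Position, TagPair and Run.
type position struct{ Start, End int64 }

type tagPair struct{ OpenTag, CloseTag position }

type run struct {
	tagPair
	Text    tagPair
	HasText bool
}

// getText returns the bytes between <w:t> and </w:t>, or "" when the
// run has no text or the offsets do not fit into doc.
func (r *run) getText(doc []byte) string {
	if !r.HasText {
		return ""
	}
	start, end := r.Text.OpenTag.End, r.Text.CloseTag.Start
	if start < 0 || start > end || end > int64(len(doc)) {
		return ""
	}
	return string(doc[start:end])
}

func main() {
	doc := []byte(`<w:r><w:t>Hi</w:t></w:r>`)
	r := &run{
		tagPair: tagPair{OpenTag: position{0, 5}, CloseTag: position{18, 24}},
		Text:    tagPair{OpenTag: position{5, 10}, CloseTag: position{12, 18}},
		HasText: true,
	}
	fmt.Println(r.getText(doc))           // Hi
	fmt.Println(r.getText(doc[:8]) == "") // true: slice too small
}
```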

func (*Run) String

func (r *Run) String(bytes []byte) string

String returns a string representation of the run, given the source bytes. It may be helpful in debugging.

type RunParser

type RunParser struct {
	// contains filtered or unexported fields
}

RunParser can parse a list of Runs from a given byte slice.

func NewRunParser

func NewRunParser(doc []byte) *RunParser

NewRunParser returns an initialized RunParser given the source-bytes.

func (*RunParser) Execute

func (parser *RunParser) Execute() error

Execute starts the parser, which makes two passes over the given document. First, all <w:r> tags are located and marked. Then the <w:t> tags inside those runs are located.
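
The first pass can be approximated with the standard encoding/xml decoder, as in the sketch below. It is not the package's parser: span, runTags and findRuns are hypothetical names, and recording offsets via Decoder.InputOffset is an assumed technique. It locates only the <w:r> tags and matches elements by local name alone, so elements named "r" in other namespaces would also be picked up.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// span is a half-open byte range within the document.
type span struct{ Start, End int64 }

// runTags holds the spans of one run's open and close tags.
type runTags struct{ Open, Close span }

// findRuns records the byte offsets of every element whose local name
// is "r". Offsets are taken before and after each token is read.
func findRuns(doc string) ([]runTags, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var open []span // stack of open tags not yet closed
	var runs []runTags
	for {
		start := dec.InputOffset()
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		end := dec.InputOffset()
		switch t := tok.(type) {
		case xml.StartElement:
			if t.Name.Local == "r" {
				open = append(open, span{start, end})
			}
		case xml.EndElement:
			if t.Name.Local == "r" && len(open) > 0 {
				o := open[len(open)-1]
				open = open[:len(open)-1]
				runs = append(runs, runTags{Open: o, Close: span{start, end}})
			}
		}
	}
	return runs, nil
}

func main() {
	doc := `<w:p xmlns:w="urn:example"><w:r><w:t>Hi</w:t></w:r></w:p>`
	runs, err := findRuns(doc)
	if err != nil {
		panic(err)
	}
	for _, r := range runs {
		fmt.Println(doc[r.Open.Start:r.Open.End], doc[r.Close.Start:r.Close.End])
	}
}
```

A second pass in the same style could then look for elements with the local name "t" between each run's open and close offsets.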

func (*RunParser) Runs

func (parser *RunParser) Runs() DocumentRuns

Runs returns all runs found by the parser.

type TagPair

type TagPair struct {
	OpenTag  Position
	CloseTag Position
}

TagPair describes an opening and closing tag position.
