docx

package module
v0.0.0-...-591242a Latest
Published: May 13, 2024 License: MIT Imports: 11 Imported by: 0

README

Docx library

Yet another library to read and write .docx (a.k.a. Microsoft Word documents or ECMA-376 Office Open XML) files in Go.

About this fork

First, this fork has no tests.

Second, what I added:

  • style unmarshalling

Introduction

As part of my work for Basement Crowd and FromCounsel, we were in need of a basic library to manipulate (both read and write) Microsoft Word documents.

The difference with other projects is the following:

  • UniOffice is probably the most complete, but it is also commercial (you need to pay) and offers far more than I need.

  • gingfrederik/docx only supports writing.

There are also a couple of other projects: kingzbauer/docx and nguyenthenguyen/docx.

gingfrederik/docx was a heavy influence (the original structures and the main method come from that project).

However, those original structures didn't handle reading, and extending them was particularly difficult because Go's XML parser is somewhat limited and includes a six-year-old bug.

Additionally, my requirements go beyond the original structure and a hard fork seemed more sensible.

The plan is to evolve the library, so the API is likely to change according to my company's needs. But please do feel free to send patches, reports and PRs (or fork).

In the meantime, it is shared as an example in case somebody finds it useful.

Getting Started

Install

Go modules are supported:

go get github.com/extrame/docx

Usage

See main for an example

$ go build -o docx ./main
$ ./docx
Preparing new document to write at /tmp/new-file.docx
Document writen.
Now trying to read it
	We've found a new run with the text ->test
	We've found a new run with the text ->test font size
	We've found a new run with the text ->test color
	We've found a new run with the text ->test font size and color
	We've found a new hyperlink with ref http://google.com and the text google
End of main

You can also increase the log level (-logtostderr=true -v=0) and dump a specific file (-file /tmp/new-file.docx). See getstructure/main.

$ go build -o docx ./getstructure/ && ./docx -logtostderr=true -v=0 -file /tmp/new-file.docx
I0511 12:37:40.898493   18466 unpack.go:69] Relations: [...]
I0511 12:37:40.898787   18466 unpack.go:47] Doc: [...]
I0511 12:37:40.899330   18466 unpack.go:58] Paragraph [0xc000026d40 0xc000027d00 0xc000172340]
I0511 12:37:40.899369   18466 main.go:31] There is a new paragraph [...]
	We've found a new run with the text ->test
	We've found a new run with the text ->test font size
	We've found a new run with the text ->test color
I0511 12:37:40.899389   18466 main.go:31] There is a new paragraph [...]
	We've found a new run with the text ->test font size and color
I0511 12:37:40.899396   18466 main.go:31] There is a new paragraph [...]
	We've found a new hyperlink with ref http://google.com and the text google
End of main

Build

$ go build ./...

License

MIT. See LICENSE

Documentation

Index

Constants

const (
	TEMP_REL = `` /* 601-byte string literal not displayed */

	TEMP_DOCPROPS_APP  = `` /* 276-byte string literal not displayed */
	TEMP_DOCPROPS_CORE = `` /* 363-byte string literal not displayed */
	TEMP_CONTENT       = `` /* 934-byte string literal not displayed */

	TEMP_WORD_STYLE = `` /* 1743-byte string literal not displayed */

	TEMP_WORD_THEME_THEME = `` /* 9767-byte string literal not displayed */

)
const (
	XMLNS_W = `http://schemas.openxmlformats.org/wordprocessingml/2006/main`
	XMLNS_R = `http://schemas.openxmlformats.org/officeDocument/2006/relationships`
)
const (
	XMLNS         = `http://schemas.openxmlformats.org/package/2006/relationships`
	REL_HYPERLINK = `http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink`

	REL_TARGETMODE = "External"
)
const (
	HYPERLINK_STYLE = "a1"
)

Variables

This section is empty.

Functions

This section is empty.

Types

type Body

type Body struct {
	XMLName    xml.Name     `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main body"`
	Paragraphs []*Paragraph `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main p"`
}

type DefinedStyle

type DefinedStyle struct {
	XMLName      xml.Name             `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main style"`
	Type         string               `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main type,attr"`
	StyleId      string               `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main styleId,attr"`
	Name         *StrValueNode        `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main name"`
	BasedOn      *StrValueNode        `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main basedOn"`
	Next         *StrValueNode        `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main next"`
	Link         string               `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main link"`
	RPr          *RunProperties       `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main rPr"`
	PPr          *ParagraphProperties `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main pPr"`
	AutoRedefine *StrValueNode        `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main autoRedefine"`
	SemiHidden   *StrValueNode        `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main semiHidden"`
	QFormat      *StrValueNode        `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main qFormat"`
	UiPriority   *StrValueNode        `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main uiPriority"`
	TblPr        *TblPr               `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main tblPr"`
}

func (*DefinedStyle) GetName

func (d *DefinedStyle) GetName() string

func (*DefinedStyle) HeadingLevel

func (d *DefinedStyle) HeadingLevel() int

type Document

type Document struct {
	XMLName xml.Name       `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main document"`
	XMLW    string         `xml:"xmlns:w,attr"`
	XMLR    string         `xml:"xmlns:r,attr"`
	Body    *Body          `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main body"`
	Styles  *DocumentStyle `xml:"-"`
}

type DocumentDefault

type DocumentDefault struct {
	XMLName    xml.Name    `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main docDefaults"`
	RPrDefault *RPrDefault `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main rPrDefault"`
	PPrDefault *PPrDefault `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main pPrDefault"`
}

type DocumentStyle

type DocumentStyle struct {
	XMLName         xml.Name         `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main styles"`
	XMLW            string           `xml:"xmlns:w,attr"`
	XMLR            string           `xml:"xmlns:r,attr"`
	DocumentDefault *DocumentDefault `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main docDefaults"`
	LatentStyles    *LatentStyles    `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main latentStyles"`
	Styles          []*DefinedStyle  `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main style"`
	// contains filtered or unexported fields
}

func (*DocumentStyle) GetStyleById

func (d *DocumentStyle) GetStyleById(styleId string) *DefinedStyle

type DocxFile

type DocxFile struct {
	Document    Document
	DocRelation Relationships
	// contains filtered or unexported fields
}

DocxFile is the structure that gives access to the in-memory representation of the document (either read or about to be written).

func New

func New() *DocxFile

New generates a new, empty docx file that we can manipulate and later save.

func Open

func Open(fileName string) (doc *DocxFile, err error)

func Parse

func Parse(reader io.ReaderAt, size int64) (doc *DocxFile, err error)

Parse generates a new docx file in memory from a reader. You can invoke it with a file:

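readFile, err := os.Open(FILE_PATH)
if err != nil {
	panic(err)
}
defer readFile.Close()
fileinfo, err := readFile.Stat()
if err != nil {
	panic(err)
}
// Size() already returns an int64, so no conversion is needed.
doc, err := docx.Parse(readFile, fileinfo.Size())
if err != nil {
	panic(err)
}
// doc is now ready to use, e.g. doc.Paragraphs().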

You can also invoke it from a web form (BEWARE of trusting user data!):

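func uploadFile(w http.ResponseWriter, r *http.Request) {
	// Keep up to 10 MiB of file parts in memory; the rest goes to temporary files.
	if err := r.ParseMultipartForm(10 << 20); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	file, handler, err := r.FormFile("file")
	if err != nil {
		fmt.Println("Error Retrieving the File")
		fmt.Println(err)
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	defer file.Close()

	// multipart.File implements io.ReaderAt, so it can be passed directly.
	if _, err := docx.Parse(file, handler.Size); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
}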

func (*DocxFile) AddParagraph

func (f *DocxFile) AddParagraph() *Paragraph

AddParagraph adds a new paragraph

func (*DocxFile) GetStyleById

func (d *DocxFile) GetStyleById(styleId string) *DefinedStyle

func (*DocxFile) Paragraphs

func (f *DocxFile) Paragraphs() []*Paragraph

func (*DocxFile) References

func (f *DocxFile) References(id string) (href string, err error)

References gets the URL for a reference.
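
To illustrate the idea of resolving a relationship ID to its target, here is a minimal standard-library sketch. The relationship and relationships types are local stand-ins shaped like the package's Relationship and Relationships types, and resolve is a hypothetical helper; this is not the package's own implementation of References.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// relationship and relationships are local stand-ins for illustration only.
type relationship struct {
	ID         string `xml:"Id,attr"`
	Type       string `xml:"Type,attr"`
	Target     string `xml:"Target,attr"`
	TargetMode string `xml:"TargetMode,attr,omitempty"`
}

type relationships struct {
	XMLName xml.Name        `xml:"Relationships"`
	Items   []*relationship `xml:"Relationship"`
}

// sampleRels is a hand-written relationships part with one external hyperlink.
const sampleRels = `<?xml version="1.0" encoding="UTF-8"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink" Target="http://google.com" TargetMode="External"/>
</Relationships>`

// resolve decodes the relationships XML and returns the Target whose Id
// matches id, or an error when no relationship has that Id.
func resolve(src, id string) (string, error) {
	var rels relationships
	if err := xml.Unmarshal([]byte(src), &rels); err != nil {
		return "", err
	}
	for _, r := range rels.Items {
		if r.ID == id {
			return r.Target, nil
		}
	}
	return "", fmt.Errorf("relationship %q not found", id)
}

func main() {
	href, err := resolve(sampleRels, "rId1")
	if err != nil {
		panic(err)
	}
	fmt.Println(href)
}
```

A hyperlink element in the document body carries only the ID, so a lookup of this kind is needed to recover the actual URL.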

func (*DocxFile) Write

func (f *DocxFile) Write(writer io.Writer) (err error)

Write saves a docx to a writer.

type Fonts

type Fonts struct {
	XMLName  xml.Name `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main rFonts,omitempty"`
	Ascii    string   `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main ascii,attr"`
	HAnsi    string   `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main hAnsi,attr"`
	EastAsia string   `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main eastAsia,attr"`
	Complex  string   `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main complex,attr"`
	Cs       string   `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main cs,attr"`
}

Fonts contains the font family

type Hyperlink

type Hyperlink struct {
	XMLName xml.Name `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main hyperlink,omitempty"`
	ID      string   `xml:"http://schemas.openxmlformats.org/officeDocument/2006/relationships id,attr"`
	Run     Run      `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main r,omitempty"`
	// contains filtered or unexported fields
}

The hyperlink element contains links

type Indent

type Indent struct {
	XMLName    xml.Name `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main ind,omitempty"`
	First      int      `xml:"http://schemas.openxmlformats.org/2006/main first,attr"`
	Hanging    int      `xml:"http://schemas.openxmlformats.org/2006/main hanging,attr"`
	Left       int      `xml:"http://schemas.openxmlformats.org/2006/main left,attr"`
	Right      int      `xml:"http://schemas.openxmlformats.org/2006/main right,attr"`
	LeftChars  int      `xml:"http://schemas.openxmlformats.org/2006/main leftChars,attr"`
	RightChars int      `xml:"http://schemas.openxmlformats.org/2006/main rightChars,attr"`
}

type IntValueNode

type IntValueNode struct {
	XMLName xml.Name
	Val     int64 `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main val,attr"`
}

type LatentStyles

type LatentStyles struct {
	XMLName           xml.Name        `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main latentStyles"`
	LsdExceptions     []*LsdException `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main lsdException"`
	Count             int             `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main count,attr"`
	DefQFormat        int             `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main defQFormat,attr"`
	DefUnhideWhenUsed int             `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main defUnhideWhenUsed,attr"`
	DefUIPriority     int             `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main defUIPriority,attr"`
	DefLockedState    int             `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main defLockedState,attr"`
	DefSemiHidden     int             `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main defSemiHidden,attr"`
	DefPrimaryStyle   int             `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main defPrimaryStyle,attr"`
}

type LsdException

type LsdException struct {
	XMLName        xml.Name `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main lsdException"`
	Name           string   `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main name,attr"`
	Locked         int      `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main locked,attr"`
	SemiHidden     int      `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main semiHidden,attr"`
	UnhideWhenUsed int      `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main unhideWhenUsed,attr"`
	QFormat        int      `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main qFormat,attr"`
	UIPriority     int      `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main uiPriority,attr"`
	PrimaryStyle   int      `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main primaryStyle,attr"`
}

type PPrDefault

type PPrDefault struct {
	XMLName xml.Name             `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main pPrDefault"`
	PPr     *ParagraphProperties `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main pPr"`
}

type Paragraph

type Paragraph struct {
	XMLName    xml.Name `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main p"`
	Properties *ParagraphProperties
	Links      []*Hyperlink `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main hyperlink,omitempty"`
	Runs       []*Run       `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main r,omitempty"`
	// contains filtered or unexported fields
}

func (*Paragraph) AddLink

func (p *Paragraph) AddLink(text string, link string) *Hyperlink

AddLink adds a hyperlink to the paragraph.

func (*Paragraph) AddText

func (p *Paragraph) AddText(text string) *Run

AddText adds text to the paragraph.

func (*Paragraph) GetOutlineLevel

func (p *Paragraph) GetOutlineLevel() int

func (*Paragraph) GetStyle

func (p *Paragraph) GetStyle() *DefinedStyle

func (*Paragraph) Text

func (p *Paragraph) Text() string
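
Text has no description on this page. One plausible reading, given the Runs field, is that it joins the text of the paragraph's runs; that is an assumption. The standard-library sketch below shows that idea with local stand-in types (text, run, paragraph) and a hypothetical paragraphText helper. It ignores hyperlinks and the custom UnmarshalXML that the real Paragraph type defines.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Local stand-ins shaped like the package's Text, Run and Paragraph types.
type text struct {
	Value string `xml:",chardata"`
}

type run struct {
	Text *text `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main t"`
}

type paragraph struct {
	XMLName xml.Name `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main p"`
	Runs    []*run   `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main r"`
}

// sampleParagraph holds two runs; the second one also carries run properties,
// which this sketch simply skips.
const sampleParagraph = `<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">` +
	`<w:r><w:t>test </w:t></w:r>` +
	`<w:r><w:rPr><w:sz w:val="44"/></w:rPr><w:t>font size</w:t></w:r>` +
	`</w:p>`

// paragraphText decodes one paragraph and concatenates the text of its runs
// in document order.
func paragraphText(src string) (string, error) {
	var p paragraph
	if err := xml.Unmarshal([]byte(src), &p); err != nil {
		return "", err
	}
	var sb strings.Builder
	for _, r := range p.Runs {
		if r.Text != nil { // a run may have no text element at all
			sb.WriteString(r.Text.Value)
		}
	}
	return sb.String(), nil
}

func main() {
	s, err := paragraphText(sampleParagraph)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```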

func (*Paragraph) UnmarshalXML

func (p *Paragraph) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error

type ParagraphProperties

type ParagraphProperties struct {
	XMLName xml.Name      `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main pPr"`
	Style   *StrValueNode `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main pStyle,omitempty"`
	Spacing *Spacing      `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main spacing,omitempty"`
	Ind     *Indent       `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main ind,omitempty"`
	Jc      *StrValueNode `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main jc,omitempty"`
	Outline *StrValueNode `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main outlineLvl,omitempty"`
}

func (*ParagraphProperties) GetStyleId

func (p *ParagraphProperties) GetStyleId() string

type RPrDefault

type RPrDefault struct {
	XMLName xml.Name       `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main rPrDefault"`
	RPr     *RunProperties `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main rPr"`
}

type Relationship

type Relationship struct {
	XMLName    xml.Name `xml:"Relationship"`
	ID         string   `xml:"Id,attr"`
	Type       string   `xml:"Type,attr"`
	Target     string   `xml:"Target,attr"`
	TargetMode string   `xml:"TargetMode,attr,omitempty"`
}

type Relationships

type Relationships struct {
	XMLName       xml.Name        `xml:"Relationships"`
	Xmlns         string          `xml:"xmlns,attr"`
	Relationships []*Relationship `xml:"Relationship"`
}

type Run

type Run struct {
	XMLName       xml.Name       `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main r,omitempty"`
	RunProperties *RunProperties `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main rPr,omitempty"`
	InstrText     string         `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main instrText,omitempty"`
	Text          *Text
	// contains filtered or unexported fields
}

A Run is a part of a paragraph that has its own style. It could be a piece of text in bold, or a link.

func (*Run) Color

func (r *Run) Color(color string) *Run

Color sets the run color.

func (*Run) Size

func (r *Run) Size(size int) *Run

Size sets the run size.

type RunProperties

type RunProperties struct {
	XMLName  xml.Name      `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main rPr,omitempty"`
	Color    *StrValueNode `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main color,omitempty"`
	Size     *IntValueNode `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main sz,omitempty"`
	RunStyle *StrValueNode `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main rStyle,omitempty"`
	Style    *StrValueNode `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main pStyle,omitempty"`
	Fonts    *Fonts        `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main rFonts,omitempty"`
}

RunProperties encapsulates visual properties of a run

type Spacing

type Spacing struct {
	XMLName xml.Name `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main spacing,omitempty"`
	After   int      `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main after,attr"`
	Before  int      `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main before,attr"`
	Line    int      `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main line,attr"`
}

type StrValueNode

type StrValueNode struct {
	XMLName xml.Name
	Val     string `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main val,attr"`
}

func (*StrValueNode) String

func (v *StrValueNode) String() string
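
Most types on this page use struct tags of the form "namespace local-name", and value nodes such as StrValueNode read a namespaced val attribute. The sketch below shows how encoding/xml matches such tags, using local stand-ins (strValue, paraProps) and a hypothetical decodeProps helper rather than the package's own types.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// strValue mirrors the shape of StrValueNode: any element whose value lives
// in a w:val attribute. XMLName records which element was actually decoded.
type strValue struct {
	XMLName xml.Name
	Val     string `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main val,attr"`
}

// paraProps mirrors part of ParagraphProperties.
type paraProps struct {
	XMLName xml.Name  `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main pPr"`
	Style   *strValue `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main pStyle"`
	Jc      *strValue `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main jc"`
}

// sampleProps uses the w: prefix; the decoder resolves it to the namespace
// URL before comparing against the struct tags.
const sampleProps = `<w:pPr xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">` +
	`<w:pStyle w:val="Heading1"/>` +
	`<w:jc w:val="center"/>` +
	`</w:pPr>`

// decodeProps unmarshals a paragraph-properties fragment.
func decodeProps(src string) (*paraProps, error) {
	var p paraProps
	if err := xml.Unmarshal([]byte(src), &p); err != nil {
		return nil, err
	}
	return &p, nil
}

func main() {
	p, err := decodeProps(sampleProps)
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Style.XMLName.Local, p.Style.Val)
	fmt.Println(p.Jc.XMLName.Local, p.Jc.Val)
}
```

Optional children are pointers, so an element that is absent from the XML leaves the field nil and callers should check for that before reading Val.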

type TblCellMar

type TblCellMar struct {
	XMLName xml.Name `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main tblCellMar"`
	Top     int      `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main top,attr"`
	Left    int      `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main left,attr"`
	Bottom  int      `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main bottom,attr"`
	Right   int      `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main right,attr"`
}

type TblPr

type TblPr struct {
	XMLName    xml.Name    `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main tblPr"`
	TblStyle   string      `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main tblStyle,attr"`
	TblCellMar *TblCellMar `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main tblCellMar"`
}

type Text

type Text struct {
	XMLName  xml.Name `xml:"http://schemas.openxmlformats.org/wordprocessingml/2006/main t"`
	XMLSpace string   `xml:"xml:space,attr,omitempty"`
	Text     string   `xml:",chardata"`
}

The Text object contains the actual text

Directories

Path Synopsis
cmd
transformer
pdf
