cmap

package
v0.0.0-...-a2e00f7 Latest
Warning

This package is not in the latest version of its module.

Published: Nov 21, 2024 License: MIT Imports: 13 Imported by: 0

Documentation

Constants

const (

	// MissingCodeRune replaces runes that can't be decoded. '\ufffd' = �. Was '?'.
	MissingCodeRune = '\ufffd' // �

	// MissingCodeString replaces strings that can't be decoded.
	MissingCodeString = string(MissingCodeRune)
)
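The substitution these constants describe can be illustrated with a minimal, self-contained sketch. It does not call this package: the constants are local copies, and `decodeAll` and its lookup table are hypothetical names introduced only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Local copies of the package constants so the sketch compiles on its own.
const (
	MissingCodeRune   = '\ufffd'
	MissingCodeString = string(MissingCodeRune)
)

// decodeAll is a hypothetical helper (not part of the package). It looks up
// each code in table and substitutes MissingCodeString for any code that has
// no entry, counting how many substitutions were made.
func decodeAll(table map[uint32]string, codes []uint32) (string, int) {
	var sb strings.Builder
	missing := 0
	for _, c := range codes {
		s, ok := table[c]
		if !ok {
			s = MissingCodeString
			missing++
		}
		sb.WriteString(s)
	}
	return sb.String(), missing
}

func main() {
	table := map[uint32]string{0x41: "A", 0x42: "B"}
	text, missing := decodeAll(table, []uint32{0x41, 0x99, 0x42})
	fmt.Printf("%q missing=%d\n", text, missing)
}
```

Using U+FFFD instead of '?' keeps undecodable input distinguishable from a genuine question mark in the extracted text.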

Variables

var (
	ErrBadCMap        = errors.New("bad cmap")
	ErrBadCMapComment = errors.New("comment should start with %")
	ErrBadCMapDict    = errors.New("invalid dict")
)

CMap parser errors.

Functions

func IsPredefinedCMap

func IsPredefinedCMap(name string) bool

IsPredefinedCMap returns true if the specified CMap name is a predefined CJK CMap. The predefined CMaps are bundled with the package and can be loaded using the LoadPredefinedCMap function. See section 9.7.5.2 "Predefined CMaps" (page 273, Table 118).

Types

type CIDSystemInfo

type CIDSystemInfo struct {
	Registry   string
	Ordering   string
	Supplement int
}

CIDSystemInfo contains information for identifying the character collection used by a CID font. Example: CIDSystemInfo=Dict("Registry": Adobe, "Ordering": Korea1, "Supplement": 0)

func NewCIDSystemInfo

func NewCIDSystemInfo(obj core.PdfObject) (info CIDSystemInfo, err error)

NewCIDSystemInfo returns the CIDSystemInfo encoded in the PdfObject `obj`.

func (*CIDSystemInfo) String

func (info *CIDSystemInfo) String() string

String returns a human-readable description of `info`. It looks like "Adobe-Japan2-000".
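A rough sketch of how such a description could be produced. The type `cidSystemInfo` and the method `describe` are hypothetical stand-ins, and the three-digit zero padding is an assumption inferred from the "Adobe-Japan2-000" example rather than taken from the package source.

```go
package main

import "fmt"

// cidSystemInfo mirrors the fields of CIDSystemInfo for illustration only.
type cidSystemInfo struct {
	Registry   string
	Ordering   string
	Supplement int
}

// describe is a hypothetical stand-in for String. It joins the three fields
// with hyphens and pads the supplement to three digits (assumed format).
func (info cidSystemInfo) describe() string {
	return fmt.Sprintf("%s-%s-%03d", info.Registry, info.Ordering, info.Supplement)
}

func main() {
	info := cidSystemInfo{Registry: "Adobe", Ordering: "Korea1", Supplement: 0}
	fmt.Println(info.describe()) // prints "Adobe-Korea1-000"
}
```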

type CMap

type CMap struct {
	// contains filtered or unexported fields
}

CMap represents a character code to Unicode mapping used in PDF files. References:

https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/5411.ToUnicode.pdf
https://github.com/adobe-type-tools/cmap-resources/releases

func LoadCmapFromData

func LoadCmapFromData(data []byte, isSimple bool) (*CMap, error)

LoadCmapFromData parses the in-memory cmap `data` and returns the resulting CMap. If `isSimple` is true, it uses 1-byte encodings, otherwise it uses the codespaces in the cmap.

See section 9.10.3 "ToUnicode CMaps" (page 293).

func LoadCmapFromDataCID

func LoadCmapFromDataCID(data []byte) (*CMap, error)

LoadCmapFromDataCID parses the in-memory cmap `data` and returns the resulting CMap. It is a convenience function.

func LoadPredefinedCMap

func LoadPredefinedCMap(name string) (*CMap, error)

LoadPredefinedCMap loads a predefined CJK CMap by name. See section 9.7.5.2 "Predefined CMaps" (page 273, Table 118).

func NewToUnicodeCMap

func NewToUnicodeCMap(codeToRune map[CharCode]rune) *CMap

NewToUnicodeCMap returns an identity CMap with codeToUnicode matching the `codeToRune` arg.

func (*CMap) Bytes

func (cmap *CMap) Bytes() []byte

Bytes returns the raw bytes of a PDF CMap corresponding to `cmap`.

func (*CMap) BytesToCharcodes

func (cmap *CMap) BytesToCharcodes(data []byte) ([]CharCode, bool)

BytesToCharcodes attempts to convert the entire byte array `data` to a list of character codes from the ranges specified by `cmap`'s codespaces. Returns:

the character code sequence (complete only if all of `data` matched)
whether all of `data` matched

NOTE: A partial list of character codes will be returned if a complete match is not possible.
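The matching behaviour described above can be sketched as follows. This is not the package's implementation: `charCode`, `codespace`, `matchCode` and `bytesToCodes` are hypothetical local names, and the shortest-match-first strategy with a 4-byte limit is an assumption.

```go
package main

import "fmt"

// charCode and codespace mirror CharCode and Codespace for illustration only.
type charCode uint32

type codespace struct {
	NumBytes int
	Low      charCode
	High     charCode
}

// matchCode tries to read one character code from the front of data. It grows
// the candidate one byte at a time (up to 4 bytes, an assumed limit) and
// accepts the first width for which a codespace of that width contains it.
func matchCode(spaces []codespace, data []byte) (code charCode, n int, ok bool) {
	for n = 1; n <= 4 && n <= len(data); n++ {
		code = code<<8 | charCode(data[n-1])
		for _, cs := range spaces {
			if cs.NumBytes == n && cs.Low <= code && code <= cs.High {
				return code, n, true
			}
		}
	}
	return 0, 0, false
}

// bytesToCodes converts all of data. If some prefix cannot be matched, it
// returns the codes found so far together with false.
func bytesToCodes(spaces []codespace, data []byte) ([]charCode, bool) {
	var codes []charCode
	for len(data) > 0 {
		code, n, ok := matchCode(spaces, data)
		if !ok {
			return codes, false
		}
		codes = append(codes, code)
		data = data[n:]
	}
	return codes, true
}

func main() {
	spaces := []codespace{
		{NumBytes: 1, Low: 0x00, High: 0x80},
		{NumBytes: 2, Low: 0x8140, High: 0x9ffc},
	}
	codes, ok := bytesToCodes(spaces, []byte{0x41, 0x81, 0x40, 0x42})
	fmt.Println(codes, ok)
}
```

Because a failed match still returns the codes decoded so far, callers should check the bool before trusting the slice as a full decoding of the input.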

func (*CMap) CIDToCharcode

func (cmap *CMap) CIDToCharcode(cid CharCode) (CharCode, bool)

CIDToCharcode maps the specified character identifier (CID) to a character code. If the provided CID has no available mapping, the second return value is false.

func (*CMap) CharcodeBytesToUnicode

func (cmap *CMap) CharcodeBytesToUnicode(data []byte) (string, int)

CharcodeBytesToUnicode converts a byte array of charcodes to a Unicode string representation. The second return value is an int rather than a bool success flag; it is presumably the number of character codes that could not be mapped. NOTE: This only works for ToUnicode cmaps.

func (*CMap) CharcodeToCID

func (cmap *CMap) CharcodeToCID(code CharCode) (CharCode, bool)

CharcodeToCID maps the specified character code to a character identifier. If the provided charcode has no available mapping, the second return value is false. The returned CID can be mapped to a Unicode character using a Unicode conversion CMap.

func (*CMap) CharcodeToUnicode

func (cmap *CMap) CharcodeToUnicode(code CharCode) (string, bool)

CharcodeToUnicode converts a single character code `code` to a Unicode string. If `code` is not in the Unicode map, '�' is returned. NOTE: CharcodeBytesToUnicode is typically more efficient.

func (*CMap) NBits

func (cmap *CMap) NBits() int

NBits returns 8 bits for simple font CMaps and 16 bits for CID font CMaps.

func (*CMap) Name

func (cmap *CMap) Name() string

Name returns the name of the CMap.

func (*CMap) Stream

func (cmap *CMap) Stream() (*core.PdfObjectStream, error)

Stream returns a Flate encoded stream containing the raw CMap data.

func (*CMap) String

func (cmap *CMap) String() string

String returns a human readable description of `cmap`.

func (*CMap) StringToCID

func (cmap *CMap) StringToCID(s string) (CharCode, bool)

StringToCID maps the specified string to a character identifier. If the provided string has no available mapping, the bool return value is false.

func (*CMap) Type

func (cmap *CMap) Type() int

Type returns the CMap type.

type CharCode

type CharCode uint32

CharCode is a character code or Unicode code point. Note that CharCode is a uint32, whereas Go's rune is an int32: https://golang.org/doc/go1#rune

type Codespace

type Codespace struct {
	NumBytes int
	Low      CharCode
	High     CharCode
}

Codespace represents a single codespace range used in the CMap.
