cmaps

package
v0.0.4
Warning

This package is not in the latest version of its module.

Published: Jul 28, 2023 License: MIT Imports: 10 Imported by: 1

Documentation

Overview

Package cmaps implements a CMap parser for both ToUnicode CMaps and CID CMaps.

Index

Constants

const (

	// MissingCodeRune replaces runes that can't be decoded: '\ufffd' (U+FFFD, �). It was previously '?'.
	MissingCodeRune = '\ufffd' // �
)

Variables

var ErrBadCMap = errors.New("bad cmap")

CMap parser errors.

Functions

func WriteAdobeIdentityUnicodeCMap

func WriteAdobeIdentityUnicodeCMap(dict map[uint32][]rune) []byte

WriteAdobeIdentityUnicodeCMap dumps the given mapping to a CMap resource, ready to be embedded in a PDF file.

Types

type CIDRange

type CIDRange struct {
	Codespace
	CIDStart model.CID // CID code for the first character code in range
}

CIDRange maps the character codes from Low to High to consecutive, increasing CIDs, starting at CIDStart.
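The range rule can be sketched as a lookup: a code `c` inside `[Low, High]` maps to `CIDStart + (c - Low)`. The types below are local stand-ins for the documented ones, and `model.CID` is assumed here to be an unsigned 16-bit integer; `lookupCID` is a hypothetical helper, not part of the package.

```go
package main

// Local stand-ins for the documented types. CID is assumed to be uint16.
type CharCode int32
type CID uint16

type Codespace struct {
	NumBytes  int
	Low, High CharCode
}

type CIDRange struct {
	Codespace
	CIDStart CID // CID of the code Low
}

// lookupCID returns the CID of code from the first range containing it,
// using CID = CIDStart + (code - Low).
func lookupCID(ranges []CIDRange, code CharCode) (CID, bool) {
	for _, r := range ranges {
		if r.Low <= code && code <= r.High {
			return r.CIDStart + CID(code-r.Low), true
		}
	}
	return 0, false
}
```

For a range `<0020> <007E>` starting at CID 1, the code 0x41 maps to 1 + (0x41 - 0x20) = 34.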

type CMap

type CMap struct {
	Name          model.ObjName
	UseCMap       model.ObjName
	CIDSystemInfo model.CIDSystemInfo
	Codespaces    []Codespace
	CIDs          []CIDRange
	Type          int
	// contains filtered or unexported fields
}

CMap maps character codes to CIDs. It is either predefined or embedded in the PDF file as a stream.

func ParseCIDCMap

func ParseCIDCMap(data []byte) (CMap, error)

ParseCIDCMap parses the in-memory CMap `data` and returns the resulting CMap. See section 9.7.5.3, Embedded CMap Files, of the PDF specification.
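To give a feel for the input, here is a toy sketch that only extracts `begincidrange ... endcidrange` entries, assuming one `<lo> <hi> cid` entry per line. A real parser such as ParseCIDCMap has to tokenize PostScript syntax and handle codespaces, `usecmap`, and `cidchar` as well. The names `parseCIDRanges`, `cidRange`, and `errBadCMap` are hypothetical.

```go
package main

import (
	"bufio"
	"errors"
	"strconv"
	"strings"
)

type CharCode int32
type CID uint16 // assumed integer type for model.CID

var errBadCMap = errors.New("bad cmap")

// cidRange is a flattened, hypothetical stand-in for CIDRange.
type cidRange struct {
	NumBytes  int
	Low, High CharCode
	CIDStart  CID
}

// parseHexCode decodes a token such as "<00a0>"; the byte count is half the digit count.
func parseHexCode(tok string) (CharCode, int, bool) {
	if len(tok) < 4 || tok[0] != '<' || tok[len(tok)-1] != '>' {
		return 0, 0, false
	}
	hex := tok[1 : len(tok)-1]
	v, err := strconv.ParseUint(hex, 16, 32)
	if err != nil {
		return 0, 0, false
	}
	return CharCode(v), len(hex) / 2, true
}

// parseCIDRanges collects the entries of every cidrange block, line by line.
func parseCIDRanges(data string) ([]cidRange, error) {
	var out []cidRange
	inBlock := false
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		switch {
		case len(fields) > 0 && fields[len(fields)-1] == "begincidrange":
			inBlock = true
		case len(fields) > 0 && fields[0] == "endcidrange":
			inBlock = false
		case inBlock && len(fields) == 3:
			lo, n, ok1 := parseHexCode(fields[0])
			hi, _, ok2 := parseHexCode(fields[1])
			cid, err := strconv.ParseUint(fields[2], 10, 16)
			if !ok1 || !ok2 || err != nil {
				return nil, errBadCMap
			}
			out = append(out, cidRange{NumBytes: n, Low: lo, High: hi, CIDStart: CID(cid)})
		}
	}
	return out, sc.Err()
}
```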

func (*CMap) BytesToCharcodes

func (cmap *CMap) BytesToCharcodes(data []byte) ([]CharCode, bool)

BytesToCharcodes attempts to convert the entire byte array `data` to a list of character codes from the ranges specified by `cmap`'s codespaces. It returns:

the character code sequence (complete if `data` matched fully)
whether `data` was matched completely

NOTE: If a complete match is not possible, a partial list of character codes is returned.
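The matching loop can be sketched as follows: at each position, try every codespace, read `NumBytes` bytes big-endian, and accept the first code that lies in `[Low, High]`; stop at the first position where nothing matches. This is a simplification, assumed for illustration: it compares whole codes numerically, whereas the PDF specification matches multi-byte codespace ranges byte by byte. `bytesToCharcodes` is a hypothetical free function, not the package's method.

```go
package main

type CharCode int32

type Codespace struct {
	NumBytes  int // 1 to 4
	Low, High CharCode
}

// bytesToCharcodes decodes data greedily and returns the codes found so far,
// plus false if some position could not be matched by any codespace.
func bytesToCharcodes(codespaces []Codespace, data []byte) ([]CharCode, bool) {
	var codes []CharCode
	for i := 0; i < len(data); {
		matched := false
		for _, cs := range codespaces {
			if i+cs.NumBytes > len(data) {
				continue // not enough bytes left for this codespace
			}
			var code CharCode
			for _, b := range data[i : i+cs.NumBytes] {
				code = code<<8 | CharCode(b) // big-endian accumulation
			}
			if cs.Low <= code && code <= cs.High {
				codes = append(codes, code)
				i += cs.NumBytes
				matched = true
				break
			}
		}
		if !matched {
			return codes, false
		}
	}
	return codes, true
}
```

With a 1-byte codespace `<00> <80>` and a 2-byte codespace `<8140> <FFFF>`, the bytes `41 81 40 42` decode to the codes 0x41, 0x8140 and 0x42.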

func (CMap) CharCodeToCID

func (cm CMap) CharCodeToCID() map[CharCode]model.CID

CharCodeToCID accumulates all the CID ranges into one map from character code to CID.
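Accumulating the ranges amounts to expanding each one into explicit entries. The sketch below assumes that later ranges overwrite earlier ones when they overlap, which the documentation does not specify; `charCodeToCID` and `cidRange` are hypothetical names, and CID is assumed to be uint16. Note that the map grows with the total width of the ranges.

```go
package main

type CharCode int32
type CID uint16

type cidRange struct {
	Low, High CharCode
	CIDStart  CID
}

// charCodeToCID expands every range; on overlap, the later range wins.
func charCodeToCID(ranges []cidRange) map[CharCode]CID {
	out := make(map[CharCode]CID)
	for _, r := range ranges {
		for c := r.Low; c <= r.High; c++ {
			out[c] = r.CIDStart + CID(c-r.Low)
		}
	}
	return out
}
```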

func (*CMap) Simple

func (cm *CMap) Simple() bool

Simple returns `true` if only one-byte character codes are encoded. The result is cached for performance reasons, so `Codespaces` shouldn't be mutated after the call.
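One way to implement such a cache, sketched under the assumption of a lazily filled pointer field (the real unexported field is not documented), shows why mutating `Codespaces` afterwards is unsafe: the stale answer keeps being returned. `cmapSketch` is a hypothetical type.

```go
package main

type CharCode int32

type Codespace struct {
	NumBytes  int
	Low, High CharCode
}

// cmapSketch is a hypothetical stand-in for CMap.
type cmapSketch struct {
	Codespaces []Codespace
	simple     *bool // nil until Simple is first called
}

// Simple reports whether every codespace uses one-byte codes.
// The answer is computed once and then reused.
func (cm *cmapSketch) Simple() bool {
	if cm.simple == nil {
		s := true
		for _, cs := range cm.Codespaces {
			if cs.NumBytes != 1 {
				s = false
				break
			}
		}
		cm.simple = &s
	}
	return *cm.simple
}
```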

type CharCode

type CharCode int32

CharCode is a compact representation of 1 to 4 bytes, as found in PDF content streams.

func (CharCode) Append

func (c CharCode) Append(bs *[]byte)

Append adds 1 to 4 bytes to `bs`, in big-endian order.
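A sketch of the big-endian append, assuming the byte count is the minimum needed to hold the value; the documentation does not say how Append chooses between 1 and 4 bytes. `appendCharCode` is a hypothetical function.

```go
package main

type CharCode int32

// appendCharCode writes c most significant byte first, using the fewest
// bytes (1 to 4) that can hold its value. This length rule is an assumption.
func appendCharCode(c CharCode, bs *[]byte) {
	switch {
	case c < 1<<8:
		*bs = append(*bs, byte(c))
	case c < 1<<16:
		*bs = append(*bs, byte(c>>8), byte(c))
	case c < 1<<24:
		*bs = append(*bs, byte(c>>16), byte(c>>8), byte(c))
	default:
		*bs = append(*bs, byte(c>>24), byte(c>>16), byte(c>>8), byte(c))
	}
}
```

Under this rule a code from a 2-byte codespace such as 0x0041 would be written as a single byte, so an encoder that must round-trip content streams would need the codespace's `NumBytes` instead.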

type Codespace

type Codespace struct {
	NumBytes  int      // how many bytes should be read to match this code (between 1 and 4)
	Low, High CharCode // compact version of [4]byte
}

Codespace represents a single codespace range used in the CMap.

type ToUnicode

type ToUnicode interface {
	MergeTo(accu map[model.CID][]rune)
}

type ToUnicodeArray

type ToUnicodeArray struct {
	Runes    [][]rune // length To - From + 1
	From, To model.CID
}

ToUnicodeArray is a compact mapping of the CIDs in [From, To] to Runes, where Runes[i] is the destination of CID From+i.

func (ToUnicodeArray) MergeTo

func (arr ToUnicodeArray) MergeTo(simple map[model.CID][]rune)
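A minimal sketch of the merge, relying only on the documented invariant that `Runes` has length `To - From + 1`; CID is assumed to be uint16 and the types are local stand-ins.

```go
package main

type CID uint16 // assumed integer type for model.CID

type ToUnicodeArray struct {
	Runes    [][]rune // length To - From + 1
	From, To CID
}

// MergeTo stores Runes[i] as the destination of CID From+i.
func (arr ToUnicodeArray) MergeTo(simple map[CID][]rune) {
	for i, rs := range arr.Runes {
		simple[arr.From+CID(i)] = rs
	}
}
```

Storing a `[]rune` per CID lets one CID map to a ligature such as "fi".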

type ToUnicodePair

type ToUnicodePair struct {
	From model.CID
	Dest []rune
}

func (ToUnicodePair) MergeTo

func (p ToUnicodePair) MergeTo(simple map[model.CID][]rune)

type ToUnicodeTranslation

type ToUnicodeTranslation struct {
	From, To model.CID
	Dest     rune
}

ToUnicodeTranslation is a compact mapping of [From, To] to [Dest, Dest+To-From]. It can also represent a simple mapping by taking From = To.
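The translation rule is: CID `From+k` maps to the single rune `Dest+k`, for `k` from 0 to `To-From`. A sketch with local stand-in types (CID assumed to be uint16); the loop counts in `int` so that `To = 0xFFFF` cannot overflow the CID type.

```go
package main

type CID uint16 // assumed integer type for model.CID

type ToUnicodeTranslation struct {
	From, To CID
	Dest     rune
}

// MergeTo maps CID From+k to the rune Dest+k for every k in [0, To-From].
func (tr ToUnicodeTranslation) MergeTo(simple map[CID][]rune) {
	for k := 0; k <= int(tr.To)-int(tr.From); k++ {
		simple[tr.From+CID(k)] = []rune{tr.Dest + rune(k)}
	}
}
```

For example, From = 0x41, To = 0x43 and Dest = 'a' give 0x41 → "a", 0x42 → "b", 0x43 → "c".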

func (ToUnicodeTranslation) MergeTo

func (tr ToUnicodeTranslation) MergeTo(simple map[model.CID][]rune)

type UnicodeCMap

type UnicodeCMap struct {
	UseCMap model.ObjName // base this cmap on `UseCMap` if `UseCMap` is not empty.

	Mappings []ToUnicode // compact representation
}

UnicodeCMap maps CIDs to Unicode code points. Unlike a CID CMap, the source of the mapping is a CID, not a CharCode.

func ParseUnicodeCMap

func ParseUnicodeCMap(data []byte) (UnicodeCMap, error)

ParseUnicodeCMap parses the CMap `data` and returns the resulting UnicodeCMap. See section 9.10.3, ToUnicode CMaps, of the PDF specification.

func (UnicodeCMap) ProperLookupTable

func (u UnicodeCMap) ProperLookupTable() map[model.CID][]rune

ProperLookupTable returns a convenient form of the mapping, without resolving a potential UseCMap.
