Documentation ¶
Overview ¶
Package textencoding handles text encoding (char code <-> glyph mapping) in UniDoc, both for reading and for outputting PDF contents.
Index ¶
- Constants
- func FromFontDifferences(diffList *core.PdfObjectArray) (map[CharCode]GlyphName, error)
- func GlyphToRune(glyph GlyphName) (rune, bool)
- func RegisterSimpleEncoding(name string, fnc func() SimpleEncoder)
- func RuneToString(r rune) string
- type CMapEncoder
- func (enc CMapEncoder) CharcodeToRune(code CharCode) (rune, bool)
- func (enc CMapEncoder) Decode(raw []byte) string
- func (enc CMapEncoder) Encode(str string) []byte
- func (enc CMapEncoder) RuneToCharcode(r rune) (CharCode, bool)
- func (enc CMapEncoder) String() string
- func (enc CMapEncoder) ToPdfObject() core.PdfObject
- type CharCode
- type GID
- type GlyphName
- type IdentityEncoder
- func (enc IdentityEncoder) CharcodeToRune(code CharCode) (rune, bool)
- func (enc IdentityEncoder) Decode(raw []byte) string
- func (enc IdentityEncoder) Encode(str string) []byte
- func (enc IdentityEncoder) GlyphToRune(glyph GlyphName) (rune, bool)
- func (enc IdentityEncoder) RuneToCharcode(r rune) (CharCode, bool)
- func (enc IdentityEncoder) RuneToGlyph(r rune) (GlyphName, bool)
- func (enc IdentityEncoder) String() string
- func (enc IdentityEncoder) ToPdfObject() core.PdfObject
- type SimpleEncoder
- func ApplyDifferences(base SimpleEncoder, differences map[CharCode]GlyphName) SimpleEncoder
- func NewCustomSimpleTextEncoder(encoding, differences map[CharCode]GlyphName) (SimpleEncoder, error)
- func NewMacExpertEncoder() SimpleEncoder
- func NewMacRomanEncoder() SimpleEncoder
- func NewPdfDocEncoder() SimpleEncoder
- func NewSimpleTextEncoder(baseName string, differences map[CharCode]GlyphName) (SimpleEncoder, error)
- func NewStandardEncoder() SimpleEncoder
- func NewSymbolEncoder() SimpleEncoder
- func NewWinAnsiEncoder() SimpleEncoder
- func NewZapfDingbatsEncoder() SimpleEncoder
- type TextEncoder
- type TrueTypeFontEncoder
- func (enc TrueTypeFontEncoder) CharcodeToRune(code CharCode) (rune, bool)
- func (enc TrueTypeFontEncoder) Decode(raw []byte) string
- func (enc TrueTypeFontEncoder) Encode(str string) []byte
- func (enc TrueTypeFontEncoder) GlyphToCharcode(glyph GlyphName) (CharCode, bool)
- func (enc TrueTypeFontEncoder) RuneToCharcode(r rune) (CharCode, bool)
- func (enc TrueTypeFontEncoder) String() string
- func (enc TrueTypeFontEncoder) ToPdfObject() core.PdfObject
- type UTF16Encoder
- func (enc UTF16Encoder) CharcodeToRune(code CharCode) (rune, bool)
- func (enc UTF16Encoder) Decode(raw []byte) string
- func (enc UTF16Encoder) Encode(str string) []byte
- func (enc UTF16Encoder) RuneToCharcode(r rune) (CharCode, bool)
- func (enc UTF16Encoder) String() string
- func (enc UTF16Encoder) ToPdfObject() core.PdfObject
Constants ¶
const MissingCodeRune = '\ufffd' // �
MissingCodeRune is the rune returned when there is no matching glyph. It was previously '?'.
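As a rough illustration of how a replacement rune like this is typically used, the sketch below decodes bytes through a lookup table and substitutes U+FFFD for codes that have no entry. The function `decodeWithFallback`, the local constant `missingCodeRune`, and the one-byte table are assumptions made for this example; they are not part of the package.

```go
package main

import "fmt"

// missingCodeRune is a local stand-in with the same value as the
// package's MissingCodeRune constant.
const missingCodeRune = '\ufffd'

// decodeWithFallback is a hypothetical helper. It maps each byte through
// table and emits missingCodeRune for any byte that has no entry.
func decodeWithFallback(raw []byte, table map[byte]rune) string {
	runes := make([]rune, 0, len(raw))
	for _, b := range raw {
		r, ok := table[b]
		if !ok {
			r = missingCodeRune
		}
		runes = append(runes, r)
	}
	return string(runes)
}

func main() {
	table := map[byte]rune{0x41: 'A'}
	// 0x01 has no entry in the table, so it decodes to U+FFFD.
	fmt.Printf("%q\n", decodeWithFallback([]byte{0x41, 0x01}, table))
}
```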
Variables ¶
This section is empty.
Functions ¶
func FromFontDifferences ¶
func FromFontDifferences(diffList *core.PdfObjectArray) (map[CharCode]GlyphName, error)
FromFontDifferences converts `diffList` (a /Differences array from an /Encoding object) to a map representing character code to glyph mappings.
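The layout of a /Differences array can be sketched without the library: a number sets the current character code, and each glyph name that follows is assigned to the current code, which then advances by one. In the sketch below, `parseDifferences` and the `[]interface{}` input are hypothetical stand-ins for the real function and for `*core.PdfObjectArray`, and the underlying types chosen for `CharCode` and `GlyphName` are assumptions. Rejecting a name that appears before any code is a choice made for this example, not a statement about the library's behavior.

```go
package main

import "fmt"

// CharCode and GlyphName are local stand-ins for the package types;
// the underlying types chosen here are assumptions.
type CharCode uint16
type GlyphName string

// parseDifferences is a hypothetical helper. An int item sets the current
// code; each string item names the glyph for the current code, after which
// the code advances by one.
func parseDifferences(items []interface{}) (map[CharCode]GlyphName, error) {
	diffs := make(map[CharCode]GlyphName)
	code := CharCode(0)
	seenCode := false
	for _, item := range items {
		switch v := item.(type) {
		case int:
			code = CharCode(v)
			seenCode = true
		case string:
			if !seenCode {
				return nil, fmt.Errorf("glyph name %q appears before any code", v)
			}
			diffs[code] = GlyphName(v)
			code++
		default:
			return nil, fmt.Errorf("unexpected item type %T", item)
		}
	}
	return diffs, nil
}

func main() {
	// Mirrors the PDF array [39 /quotesingle 96 /grave /a].
	diffs, err := parseDifferences([]interface{}{39, "quotesingle", 96, "grave", "a"})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(diffs), diffs[39], diffs[96], diffs[97])
}
```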
func GlyphToRune ¶
func GlyphToRune(glyph GlyphName) (rune, bool)
GlyphToRune returns the rune corresponding to glyph `glyph` if there is one.
TODO: Can we return a string here? E.g., when we are extracting text, we want to get "ffi" rather than the single ligature rune 'ﬃ'. We only need a glyph ➞ rune map when we need to convert back to glyphs. We are currently applying RuneToString to the output of functions that call GlyphToRune. While this gives the same result, it makes the calling code complex and fragile.
TODO: Can we combine all the tables glyphAliases, glyphlistGlyphToRuneMap, texGlyphlistGlyphToStringMap, additionalGlyphlistGlyphToRuneMap and ".notdef"?
func RegisterSimpleEncoding ¶
func RegisterSimpleEncoding(name string, fnc func() SimpleEncoder)
RegisterSimpleEncoding registers a SimpleEncoder constructor by PDF encoding name.
func RuneToString ¶
func RuneToString(r rune) string
RuneToString converts rune `r` to a string, unpacking ligatures into their component characters.
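Ligature unpacking can be pictured as a table lookup with a fallback to the rune itself. The table and the name `expandRune` below are assumptions for illustration only. The package's real table is not shown in this documentation and may cover more code points.

```go
package main

import "fmt"

// ligatures is a small illustrative table covering the Latin f-ligatures
// in the Unicode Alphabetic Presentation Forms block.
var ligatures = map[rune]string{
	'\ufb00': "ff",
	'\ufb01': "fi",
	'\ufb02': "fl",
	'\ufb03': "ffi",
	'\ufb04': "ffl",
}

// expandRune is a hypothetical stand-in for RuneToString. It returns the
// expanded form of a known ligature and the rune itself otherwise.
func expandRune(r rune) string {
	if s, ok := ligatures[r]; ok {
		return s
	}
	return string(r)
}

func main() {
	fmt.Println(expandRune('\ufb03'), expandRune('A'))
}
```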
Types ¶
type CMapEncoder ¶
type CMapEncoder struct {
// contains filtered or unexported fields
}
CMapEncoder encodes/decodes strings based on CMap mappings.
func NewCMapEncoder ¶
func NewCMapEncoder(baseName string, codeToCID, cidToUnicode *cmap.CMap) CMapEncoder
NewCMapEncoder returns a new CMapEncoder based on the predefined encoding `baseName`. If `codeToCID` is nil, Identity encoding is assumed. `cidToUnicode` must not be nil.
func (CMapEncoder) CharcodeToRune ¶
func (enc CMapEncoder) CharcodeToRune(code CharCode) (rune, bool)
CharcodeToRune converts PDF character code `code` to a rune. The bool return flag is true if there was a match, and false otherwise.
func (CMapEncoder) Decode ¶
func (enc CMapEncoder) Decode(raw []byte) string
Decode converts PDF encoded string to a Go unicode string.
func (CMapEncoder) Encode ¶
func (enc CMapEncoder) Encode(str string) []byte
Encode converts the Go unicode string to a PDF encoded string.
func (CMapEncoder) RuneToCharcode ¶
func (enc CMapEncoder) RuneToCharcode(r rune) (CharCode, bool)
RuneToCharcode converts rune `r` to a PDF character code. The bool return flag is true if there was a match, and false otherwise.
func (CMapEncoder) String ¶
func (enc CMapEncoder) String() string
String returns a string that describes `enc`.
func (CMapEncoder) ToPdfObject ¶
func (enc CMapEncoder) ToPdfObject() core.PdfObject
ToPdfObject returns a PDF Object that represents the encoding.
type GlyphName ¶
type GlyphName string
GlyphName is a name of a glyph.
func RuneToGlyph ¶
RuneToGlyph is the reverse of the table lookups in GlyphToRune.
type IdentityEncoder ¶
type IdentityEncoder struct {
// contains filtered or unexported fields
}
IdentityEncoder represents a 2-byte identity encoding.
func NewIdentityTextEncoder ¶
func NewIdentityTextEncoder(baseName string) IdentityEncoder
NewIdentityTextEncoder returns a new IdentityEncoder based on the predefined encoding `baseName`.
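To make "2-byte identity encoding" concrete, the sketch below reads big-endian byte pairs and treats each 16-bit code as the rune value itself. The function `decodeIdentity2Byte` is hypothetical. Big-endian order and dropping a trailing odd byte are assumptions of this example, not documented behavior of IdentityEncoder.

```go
package main

import "fmt"

// decodeIdentity2Byte is a hypothetical helper. It reads raw two bytes at
// a time (big-endian) and uses each 16-bit code directly as a rune.
// A trailing odd byte is ignored in this sketch.
func decodeIdentity2Byte(raw []byte) string {
	runes := make([]rune, 0, len(raw)/2)
	for i := 0; i+1 < len(raw); i += 2 {
		code := uint16(raw[i])<<8 | uint16(raw[i+1])
		runes = append(runes, rune(code))
	}
	return string(runes)
}

func main() {
	fmt.Println(decodeIdentity2Byte([]byte{0x00, 0x48, 0x00, 0x69}))
}
```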
func (IdentityEncoder) CharcodeToRune ¶
func (enc IdentityEncoder) CharcodeToRune(code CharCode) (rune, bool)
CharcodeToRune converts PDF character code `code` to a rune. The bool return flag is true if there was a match, and false otherwise.
func (IdentityEncoder) Decode ¶
func (enc IdentityEncoder) Decode(raw []byte) string
Decode converts PDF encoded string to a Go unicode string.
func (IdentityEncoder) Encode ¶
func (enc IdentityEncoder) Encode(str string) []byte
Encode converts the Go unicode string to a PDF encoded string.
func (IdentityEncoder) GlyphToRune ¶
func (enc IdentityEncoder) GlyphToRune(glyph GlyphName) (rune, bool)
GlyphToRune returns the rune corresponding to glyph name `glyph`. The bool return flag is true if there was a match, and false otherwise.
func (IdentityEncoder) RuneToCharcode ¶
func (enc IdentityEncoder) RuneToCharcode(r rune) (CharCode, bool)
RuneToCharcode converts rune `r` to a PDF character code. The bool return flag is true if there was a match, and false otherwise.
func (IdentityEncoder) RuneToGlyph ¶
func (enc IdentityEncoder) RuneToGlyph(r rune) (GlyphName, bool)
RuneToGlyph returns the glyph name for rune `r`. The bool return flag is true if there was a match, and false otherwise.
func (IdentityEncoder) String ¶
func (enc IdentityEncoder) String() string
String returns a string that describes `enc`.
func (IdentityEncoder) ToPdfObject ¶
func (enc IdentityEncoder) ToPdfObject() core.PdfObject
ToPdfObject returns nil, because the encoder is not truly a PDF object and should not be stored in a file.
type SimpleEncoder ¶
type SimpleEncoder interface {
	TextEncoder
	BaseName() string
	Charcodes() []CharCode
}
SimpleEncoder represents a 1-byte encoding.
func ApplyDifferences ¶
func ApplyDifferences(base SimpleEncoder, differences map[CharCode]GlyphName) SimpleEncoder
ApplyDifferences modifies or wraps the base encoding and overlays differences over it.
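The overlay idea can be sketched on plain maps: start from the base table, then let each entry in the differences map replace or add a code. The function `overlay` and the local type definitions are assumptions for this example. The real function works on SimpleEncoder values, and whether it copies or wraps the base is not specified here.

```go
package main

import "fmt"

// CharCode and GlyphName are local stand-ins for the package types;
// the underlying types chosen here are assumptions.
type CharCode uint16
type GlyphName string

// overlay is a hypothetical helper. It returns a new table holding every
// entry of base, with entries from differences taking precedence.
// The base map is left unmodified.
func overlay(base, differences map[CharCode]GlyphName) map[CharCode]GlyphName {
	out := make(map[CharCode]GlyphName, len(base)+len(differences))
	for code, glyph := range base {
		out[code] = glyph
	}
	for code, glyph := range differences {
		out[code] = glyph
	}
	return out
}

func main() {
	base := map[CharCode]GlyphName{65: "A", 66: "B"}
	merged := overlay(base, map[CharCode]GlyphName{66: "bullet", 128: "Euro"})
	fmt.Println(merged[65], merged[66], merged[128], base[66])
}
```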
func NewCustomSimpleTextEncoder ¶
func NewCustomSimpleTextEncoder(encoding, differences map[CharCode]GlyphName) (SimpleEncoder, error)
NewCustomSimpleTextEncoder returns a SimpleEncoder based on map `encoding` and difference map `differences`.
func NewMacExpertEncoder ¶
func NewMacExpertEncoder() SimpleEncoder
NewMacExpertEncoder returns a SimpleEncoder that implements MacExpertEncoding.
func NewMacRomanEncoder ¶
func NewMacRomanEncoder() SimpleEncoder
NewMacRomanEncoder returns a SimpleEncoder that implements MacRomanEncoding.
func NewPdfDocEncoder ¶
func NewPdfDocEncoder() SimpleEncoder
NewPdfDocEncoder returns a SimpleEncoder that implements PdfDocEncoding.
func NewSimpleTextEncoder ¶
func NewSimpleTextEncoder(baseName string, differences map[CharCode]GlyphName) (SimpleEncoder, error)
NewSimpleTextEncoder returns a SimpleEncoder based on predefined encoding `baseName` and difference map `differences`.
func NewStandardEncoder ¶
func NewStandardEncoder() SimpleEncoder
NewStandardEncoder returns a SimpleEncoder that implements StandardEncoding.
func NewSymbolEncoder ¶
func NewSymbolEncoder() SimpleEncoder
NewSymbolEncoder returns a SimpleEncoder that implements SymbolEncoding.
func NewWinAnsiEncoder ¶
func NewWinAnsiEncoder() SimpleEncoder
NewWinAnsiEncoder returns a SimpleEncoder that implements WinAnsiEncoding.
func NewZapfDingbatsEncoder ¶
func NewZapfDingbatsEncoder() SimpleEncoder
NewZapfDingbatsEncoder returns a SimpleEncoder that implements ZapfDingbatsEncoding.
type TextEncoder ¶
type TextEncoder interface {
	// String returns a string that describes the TextEncoder instance.
	String() string

	// Encode converts the Go unicode string to a PDF encoded string.
	Encode(str string) []byte

	// Decode converts PDF encoded string to a Go unicode string.
	Decode(raw []byte) string

	// RuneToCharcode returns the PDF character code corresponding to rune `r`.
	// The bool return flag is true if there was a match, and false otherwise.
	// This is usually implemented as RuneToGlyph->GlyphToCharcode
	RuneToCharcode(r rune) (CharCode, bool)

	// CharcodeToRune returns the rune corresponding to character code `code`.
	// The bool return flag is true if there was a match, and false otherwise.
	// This is usually implemented as CharcodeToGlyph->GlyphToRune
	CharcodeToRune(code CharCode) (rune, bool)

	// ToPdfObject returns a PDF Object that represents the encoding.
	ToPdfObject() core.PdfObject
}
TextEncoder defines the common methods that a text encoder implementation must have in UniDoc.
type TrueTypeFontEncoder ¶
type TrueTypeFontEncoder struct {
// contains filtered or unexported fields
}
TrueTypeFontEncoder handles text encoding for composite TrueType fonts. It performs mapping between character ids and glyph ids. It has a preloaded rune (unicode code point) to glyph index map that has been loaded from a font. Corresponds to Identity-H CMap and Identity encoding.
func NewTrueTypeFontEncoder ¶
func NewTrueTypeFontEncoder(runeToGIDMap map[rune]GID) TrueTypeFontEncoder
NewTrueTypeFontEncoder creates a new text encoder for TTF fonts with a rune to glyph index map, `runeToGIDMap`, that has been preloaded from the font file. The new instance is preloaded with a CMapIdentityH (Identity-H) CMap, which maps 2-byte charcodes to CIDs (glyph indices).
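Under Identity-H, each character code is a 2-byte big-endian value equal to the CID, which here is the glyph index. The sketch below shows that mapping for a string. The function `encodeIdentityH`, the underlying type of the local `GID`, and the sample glyph indices are assumptions for illustration. Skipping runes that are missing from the map is a choice made for this example, not a statement about how TrueTypeFontEncoder handles them.

```go
package main

import "fmt"

// GID is a local stand-in for the package's glyph index type;
// the underlying type chosen here is an assumption.
type GID uint16

// encodeIdentityH is a hypothetical helper. It looks up each rune's glyph
// index and writes it as a 2-byte big-endian code. Runes without an entry
// are skipped in this sketch.
func encodeIdentityH(s string, runeToGID map[rune]GID) []byte {
	out := make([]byte, 0, 2*len(s))
	for _, r := range s {
		gid, ok := runeToGID[r]
		if !ok {
			continue
		}
		out = append(out, byte(gid>>8), byte(gid))
	}
	return out
}

func main() {
	// The glyph indices below are made up for the example.
	runeToGID := map[rune]GID{'H': 43, 'i': 76}
	fmt.Printf("% x\n", encodeIdentityH("Hi", runeToGID))
}
```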
func (TrueTypeFontEncoder) CharcodeToRune ¶
func (enc TrueTypeFontEncoder) CharcodeToRune(code CharCode) (rune, bool)
CharcodeToRune converts PDF character code `code` to a rune. The bool return flag is true if there was a match, and false otherwise.
func (TrueTypeFontEncoder) Decode ¶
func (enc TrueTypeFontEncoder) Decode(raw []byte) string
Decode converts PDF encoded string to a Go unicode string.
func (TrueTypeFontEncoder) Encode ¶
func (enc TrueTypeFontEncoder) Encode(str string) []byte
Encode converts the Go unicode string to a PDF encoded string.
func (TrueTypeFontEncoder) GlyphToCharcode ¶
func (enc TrueTypeFontEncoder) GlyphToCharcode(glyph GlyphName) (CharCode, bool)
GlyphToCharcode returns character code matching the glyph name `glyph`. The bool return flag is true if there was a match, and false otherwise.
func (TrueTypeFontEncoder) RuneToCharcode ¶
func (enc TrueTypeFontEncoder) RuneToCharcode(r rune) (CharCode, bool)
RuneToCharcode converts rune `r` to a PDF character code. The bool return flag is true if there was a match, and false otherwise.
func (TrueTypeFontEncoder) String ¶
func (enc TrueTypeFontEncoder) String() string
String returns a string that describes `enc`.
func (TrueTypeFontEncoder) ToPdfObject ¶
func (enc TrueTypeFontEncoder) ToPdfObject() core.PdfObject
ToPdfObject returns nil, because the encoder is not truly a PDF object and should not be stored in a file.
type UTF16Encoder ¶
type UTF16Encoder struct {
// contains filtered or unexported fields
}
UTF16Encoder represents UTF-16 encoding.
func NewUTF16TextEncoder ¶
func NewUTF16TextEncoder(baseName string) UTF16Encoder
NewUTF16TextEncoder returns a new UTF16Encoder based on the predefined encoding `baseName`.
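For readers unfamiliar with UTF-16 byte strings, the sketch below converts between a Go string and UTF-16 code units using the standard `unicode/utf16` package. The functions `encodeUTF16BE` and `decodeUTF16BE` are hypothetical, and big-endian byte order without a byte order mark is an assumption of this example rather than documented behavior of UTF16Encoder.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// encodeUTF16BE is a hypothetical helper. It converts s to UTF-16 code
// units and writes each unit as two big-endian bytes.
func encodeUTF16BE(s string) []byte {
	units := utf16.Encode([]rune(s))
	out := make([]byte, 0, 2*len(units))
	for _, u := range units {
		out = append(out, byte(u>>8), byte(u))
	}
	return out
}

// decodeUTF16BE is the inverse of encodeUTF16BE. A trailing odd byte is
// ignored in this sketch.
func decodeUTF16BE(raw []byte) string {
	units := make([]uint16, 0, len(raw)/2)
	for i := 0; i+1 < len(raw); i += 2 {
		units = append(units, uint16(raw[i])<<8|uint16(raw[i+1]))
	}
	return string(utf16.Decode(units))
}

func main() {
	raw := encodeUTF16BE("Hi")
	fmt.Printf("% x -> %s\n", raw, decodeUTF16BE(raw))
}
```

Runes outside the Basic Multilingual Plane become a surrogate pair, so they occupy four bytes rather than two.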
func (UTF16Encoder) CharcodeToRune ¶
func (enc UTF16Encoder) CharcodeToRune(code CharCode) (rune, bool)
CharcodeToRune converts PDF character code `code` to a rune. The bool return flag is true if there was a match, and false otherwise.
func (UTF16Encoder) Decode ¶
func (enc UTF16Encoder) Decode(raw []byte) string
Decode converts PDF encoded string to a Go unicode string.
func (UTF16Encoder) Encode ¶
func (enc UTF16Encoder) Encode(str string) []byte
Encode converts the Go unicode string to a PDF encoded string.
func (UTF16Encoder) RuneToCharcode ¶
func (enc UTF16Encoder) RuneToCharcode(r rune) (CharCode, bool)
RuneToCharcode converts rune `r` to a PDF character code. The bool return flag is true if there was a match, and false otherwise.
func (UTF16Encoder) String ¶
func (enc UTF16Encoder) String() string
String returns a string that describes `enc`.
func (UTF16Encoder) ToPdfObject ¶
func (enc UTF16Encoder) ToPdfObject() core.PdfObject
ToPdfObject returns a PDF Object that represents the encoding.