latinx

Module version v0.0.0-...-4dfe9ba. Note: this package is not in the latest version of its module.
Published: Mar 29, 2012 | License: BSD-3-Clause | Imports: 4 | Imported by: 28

README

[ What ]

A small library to encode/decode ISO-8859
byte streams in golang (to/from UTF-8).


[ Install ]

go get github.com/bjarneh/latinx


[ Example ]


package main

import (
	"fmt"
	"log"

	"github.com/bjarneh/latinx"
)

func main() {
	// fetch converter for desired charset (Get returns nil for an unknown charset)
	converter := latinx.Get(latinx.ISO_8859_1)

	// encode a UTF-8 stream as ISO_8859_1
	utf8bytes := []byte("Grüße")
	latin1bytes, size, err := converter.Encode(utf8bytes)
	if err != nil {
		log.Fatalf("encoded %d of %d bytes: %v", size, len(utf8bytes), err)
	}

	// convert the stream of ISO_8859_1 bytes back to UTF-8
	decoded, err := converter.Decode(latin1bytes)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(decoded))
}

Documentation

Overview

Library to encode/decode ISO-8859 byte streams to/from UTF-8.

This library is complete in terms of the ISO 8859 standard: all 15 published parts are present (part 12 was abandoned and never published, which is why there is no ISO_8859_12 constant).

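The package also provides an io.Reader (NewReader) that decodes ISO-8859 data read from an underlying io.Reader into UTF-8, and an io.Writer (NewWriter) that encodes UTF-8 data as ISO-8859 before writing it to an underlying io.Writer.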

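Windows-1252 conversion is also included. Windows-1252 agrees with ISO-8859-1 except in positions 0x80-0x9F, which ISO-8859-1 leaves undefined and Windows-1252 uses for common characters such as the euro sign and typographic quotation marks.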
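To illustrate that difference, here is a minimal standalone sketch (not the library's code) of a Windows-1252 decoder. It assumes bytes outside 0x80-0x9F map to the identical code point, as in ISO-8859-1, and its table `cp1252High` (a hypothetical name) is deliberately partial: it lists only a few of the 0x80-0x9F positions and reports every other byte in that range as an error.

```go
package main

import (
	"fmt"
	"strings"
)

// cp1252High is a PARTIAL, hypothetical table of Windows-1252 positions
// 0x80-0x9F (undefined in ISO-8859-1). Only a few well-known entries are listed.
var cp1252High = map[byte]rune{
	0x80: '\u20AC', // euro sign
	0x85: '\u2026', // horizontal ellipsis
	0x91: '\u2018', // left single quotation mark
	0x92: '\u2019', // right single quotation mark
	0x93: '\u201C', // left double quotation mark
	0x94: '\u201D', // right double quotation mark
	0x99: '\u2122', // trade mark sign
}

// decodeCP1252 converts Windows-1252 bytes to a UTF-8 string. Bytes outside
// 0x80-0x9F map to the identical code point, exactly as in ISO-8859-1.
func decodeCP1252(in []byte) (string, error) {
	var sb strings.Builder
	for i, b := range in {
		if b >= 0x80 && b <= 0x9F {
			r, ok := cp1252High[b]
			if !ok {
				return "", fmt.Errorf("byte 0x%02X at offset %d is not in this partial table", b, i)
			}
			sb.WriteRune(r)
			continue
		}
		sb.WriteRune(rune(b))
	}
	return sb.String(), nil
}

func main() {
	s, err := decodeCP1252([]byte{0x93, 'H', 'i', 0x94, ' ', 0x80, '5'})
	fmt.Println(s, err)
}
```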

Constants

const (
	ISO_8859_1 = iota
	ISO_8859_2
	ISO_8859_3
	ISO_8859_4
	ISO_8859_5
	ISO_8859_6
	ISO_8859_7
	ISO_8859_8
	ISO_8859_9
	ISO_8859_10
	ISO_8859_11
	ISO_8859_13
	ISO_8859_14
	ISO_8859_15
	ISO_8859_16
	// Extended Latin-1 (Windows only, not a standard)
	Windows1252
	// Common aliases for the standards
	Latin1   = ISO_8859_1
	Latin2   = ISO_8859_2
	Latin3   = ISO_8859_3
	Latin4   = ISO_8859_4
	Cyrillic = ISO_8859_5
	Arabic   = ISO_8859_6
	Greek    = ISO_8859_7
	Hebrew   = ISO_8859_8
	Latin5   = ISO_8859_9
	Latin6   = ISO_8859_10
	Thai     = ISO_8859_11
	Latin7   = ISO_8859_13
	Latin8   = ISO_8859_14
	Latin9   = ISO_8859_15
	Latin10  = ISO_8859_16
	// The numbers (1,2) are just meant to
	// distinguish PARTIAL from ILLEGAL
	PARTIAL = UnicodeError(1)
	ILLEGAL = UnicodeError(2)
)

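Constants used to select a charset when fetching a *Converter with Get (and for the package-level Decode, Encode, NewReader and NewWriter). PARTIAL and ILLEGAL are the two UnicodeError values that Converter.Encode can return.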
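Note that the charset constants are positions in an iota sequence, not ISO part numbers: since there is no ISO_8859_12, ISO_8859_13 has the value 11 and Windows1252 the value 15. As a standalone sketch, the code below copies the charset constants from the block above and adds a hypothetical helper, `partNumber` (not part of latinx), that maps a constant back to its ISO 8859 part number.

```go
package main

import "fmt"

// The charset constants from latinx, copied so this sketch compiles on its own.
const (
	ISO_8859_1  = iota // 0
	ISO_8859_2         // 1
	ISO_8859_3
	ISO_8859_4
	ISO_8859_5
	ISO_8859_6
	ISO_8859_7
	ISO_8859_8
	ISO_8859_9
	ISO_8859_10
	ISO_8859_11 // 10
	ISO_8859_13 // 11: there is no ISO_8859_12
	ISO_8859_14
	ISO_8859_15
	ISO_8859_16 // 14
	Windows1252 // 15
)

// partNumber is a hypothetical helper that turns an ISO constant back into
// its ISO 8859 part number; it returns 0 for non-ISO values such as Windows1252.
func partNumber(charset int) int {
	switch {
	case charset >= ISO_8859_1 && charset <= ISO_8859_11:
		return charset + 1
	case charset >= ISO_8859_13 && charset <= ISO_8859_16:
		return charset + 2 // skip the missing part 12
	default:
		return 0
	}
}

func main() {
	fmt.Println(partNumber(ISO_8859_13), partNumber(ISO_8859_15), partNumber(Windows1252)) // 13 15 0
}
```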

Variables

This section is empty.

Functions

func Available

func Available() (all []string)

Return the String representation of all available encodings

func Decode

func Decode(charset int, latin []byte) (utf_8 []byte, err error)

Converts an ISO-8859 encoded slice, in the given charset, to a UTF-8 encoded slice.
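For ISO-8859-1 alone, decoding is especially simple: under the Unicode Consortium's ISO-8859-1 mapping every byte value equals its code point (U+0000 to U+00FF, with 0x80-0x9F mapped to the C1 controls), so each byte becomes one rune, encoded as one or two UTF-8 bytes. The standalone sketch below shows that idea; `decodeLatin1` is a hypothetical name, it is not how latinx is implemented, and the library may treat 0x80-0x9F differently. The other ISO-8859 parts need a lookup table instead.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeLatin1 is a hypothetical helper (not part of latinx) that converts
// ISO-8859-1 bytes to UTF-8. Each byte value equals its Unicode code point,
// so no lookup table is needed; bytes >= 0x80 become two UTF-8 bytes.
func decodeLatin1(latin []byte) []byte {
	out := make([]byte, 0, len(latin)) // grows if any byte is >= 0x80
	for _, b := range latin {
		out = utf8.AppendRune(out, rune(b))
	}
	return out
}

func main() {
	fmt.Println(string(decodeLatin1([]byte{'G', 'r', 0xFC, 0xDF, 'e'}))) // Grüße
}
```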

func Encode

func Encode(charset int, utf_8 []byte) (latin []byte, success int, err error)

Converts a UTF-8 encoded slice to an ISO-8859 encoded slice in the given charset.

func NewReader

func NewReader(charset int, r io.Reader) io.Reader

Initializes a new LatinReader using an underlying io.Reader and one of the available charsets (ISO_8859_1, ISO_8859_2, ...).

func NewWriter

func NewWriter(charset int, w io.Writer) io.Writer

Initializes a new LatinWriter with an underlying io.Writer and one of the available charsets (ISO_8859_1, ISO_8859_2, ...).

Types

type Converter

type Converter struct {
	// contains filtered or unexported fields
}

A Converter holds the mappings from ISO 8859 to UTF-8, and vice versa.

func Get

func Get(charset int) *Converter

Returns the *Converter for charset, or nil if the charset is unknown.
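Because Get signals an unknown charset with nil rather than an error, callers should check the result before calling Decode or Encode. A minimal standalone sketch of that lookup pattern follows; the `converter` type, `registry` map and `get` function are hypothetical stand-ins, not the package's unexported internals.

```go
package main

import "fmt"

// converter is a hypothetical stand-in for latinx.Converter.
type converter struct{ name string }

// registry maps charset constants to converters; only two are filled in here
// (0 is ISO_8859_1 and 13 is ISO_8859_15 in the latinx constant block).
var registry = map[int]*converter{
	0:  {name: "ISO-8859-1"},
	13: {name: "ISO-8859-15"},
}

// get mirrors the documented contract of latinx.Get: it returns nil for an
// unknown charset instead of an error.
func get(charset int) *converter {
	return registry[charset] // a missing key yields the zero value, nil
}

func main() {
	if c := get(99); c == nil {
		fmt.Println("unknown charset")
	}
}
```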

func (*Converter) Decode

func (c *Converter) Decode(latin []byte) (utf_8 []byte, err error)

Converts an ISO 8859 byte sequence into a UTF-8 byte sequence. If this function returns an UnknownByteError, the charset of the Converter has no Unicode mapping for a byte found in latin.

func (*Converter) Encode

func (c *Converter) Encode(utf_8 []byte) (latin []byte, success int, err error)

Converts a UTF-8 byte sequence into an ISO 8859 byte sequence. This function returns two kinds of errors.

A UnicodeError means that a partial UTF-8 symbol (latinx.PARTIAL) or an illegal UTF-8 sequence (latinx.ILLEGAL) was found. When a UnicodeError is returned, success < len(utf_8), and success indicates how many bytes of utf_8 were successfully converted.

An UnknownRuneError means that the charset of the Converter has no mapping for a rune (UTF-8 character) found in utf_8.
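The error contract above can be sketched for ISO-8859-1 alone, where any rune up to U+00FF maps to a single byte. This standalone sketch (hypothetical `encodeLatin1`, with local copies of the documented error types) is not the library's implementation. It assumes an incomplete multibyte sequence at the end of the input counts as PARTIAL and any other invalid sequence as ILLEGAL, and it stops at the first error, with success counting the input bytes consumed so far.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Local copies of the documented latinx error types (messages are placeholders).
type UnicodeError int
type UnknownRuneError string

const (
	PARTIAL = UnicodeError(1)
	ILLEGAL = UnicodeError(2)
)

func (e UnicodeError) Error() string {
	if e == PARTIAL {
		return "partial UTF-8 sequence"
	}
	return "illegal UTF-8 sequence"
}

func (e UnknownRuneError) Error() string { return "no mapping for rune " + string(e) }

// encodeLatin1 converts UTF-8 to ISO-8859-1 and reports how many input
// bytes were consumed before the first error.
func encodeLatin1(utf_8 []byte) (latin []byte, success int, err error) {
	for success < len(utf_8) {
		r, size := utf8.DecodeRune(utf_8[success:])
		if r == utf8.RuneError && size <= 1 { // invalid, not an encoded U+FFFD
			if !utf8.FullRune(utf_8[success:]) {
				return latin, success, PARTIAL // truncated sequence at the end
			}
			return latin, success, ILLEGAL
		}
		if r > 0xFF {
			return latin, success, UnknownRuneError(string(r))
		}
		latin = append(latin, byte(r))
		success += size
	}
	return latin, success, nil
}

func main() {
	for _, in := range [][]byte{[]byte("Grüße"), []byte("a€"), {0x61, 0xE2, 0x82}} {
		latin, n, err := encodeLatin1(in)
		fmt.Printf("% X consumed=%d err=%v\n", latin, n, err)
	}
}
```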

func (*Converter) String

func (c *Converter) String() string

type LatinReader

type LatinReader struct {
	// contains filtered or unexported fields
}

A LatinReader reads ISO-8859 data from the underlying reader, decodes it to UTF-8, and writes the result to a *bytes.Buffer, which stores the decoded stream (possibly larger than the input). Once the decoded data is in the buffer, Read(p []byte) is performed on the *bytes.Buffer.

func (*LatinReader) Read

func (r *LatinReader) Read(p []byte) (n int, err error)

Reads from the underlying io.Reader, decodes the data to UTF-8, and returns the result in p.

type LatinWriter

type LatinWriter struct {
	// contains filtered or unexported fields
}

A LatinWriter encodes UTF-8 byte streams into the selected ISO 8859 charset before writing them to the underlying io.Writer.

func (*LatinWriter) Write

func (w *LatinWriter) Write(p []byte) (n int, err error)

The returned n is the number of input bytes that were successfully encoded and written. This may differ from the number of bytes actually written, since Converter.Encode converts multibyte UTF-8 characters into single ISO 8859 bytes. For example, writing []byte("€€€") with charset ISO_8859_15 returns 9 (each € takes 3 bytes in UTF-8), but only 3 bytes are written to the underlying io.Writer.

type UnicodeError

type UnicodeError int

Error type for partial or illegal UTF-8 sequences (see PARTIAL and ILLEGAL).

func (UnicodeError) Error

func (e UnicodeError) Error() string

type UnknownByteError

type UnknownByteError string

Error type for unknown ISO 8859 byte

func (UnknownByteError) Error

func (e UnknownByteError) Error() string

type UnknownRuneError

type UnknownRuneError string

Error type for unknown UTF-8 runes

func (UnknownRuneError) Error

func (e UnknownRuneError) Error() string

Directories

example
