latinx


Version: v0.0.0-...-4dfe9ba (latest) · Published: Mar 29, 2012 · License: BSD-3-Clause · Imports: 4 · Imported by: 28

Repository

github.com/bjarneh/latinx

README

[ What ]

A small library to encode/decode ISO-8859
byte streams in golang (to/from UTF-8).


[ Install ]

go get github.com/bjarneh/latinx


[ Example ]


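package main

import (
	"fmt"
	"log"

	"github.com/bjarneh/latinx"
)

func main() {
	// fetch converter for desired charset (Get returns nil for an unknown charset)
	converter := latinx.Get(latinx.ISO_8859_1)

	// convert a stream of ISO_8859_1 bytes to UTF-8
	latin1bytes := []byte{0x47, 0x72, 0xFC, 0xDF, 0x65} // "Grüße" in ISO-8859-1
	utf8bytes, err := converter.Decode(latin1bytes)
	if err != nil {
		log.Fatal(err)
	}

	// encode a UTF-8 stream as ISO_8859_1; size counts the input bytes converted
	latin1bytes, size, err := converter.Encode(utf8bytes)
	if err != nil {
		log.Fatalf("encoded: %d, not: %d, err: %s", size, len(utf8bytes), err)
	}
	fmt.Printf("%s -> % x\n", utf8bytes, latin1bytes)
}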

Documentation

Overview

Library to encode/decode ISO-8859 byte streams to/from UTF-8.

This library is complete in terms of the ISO 8859 standard: all 15 published parts are present (part 12 was abandoned and never published).

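The package also provides an io.Reader and an io.Writer (see NewReader and NewWriter): the reader decodes an ISO-8859 stream read from an underlying io.Reader into UTF-8, and the writer encodes UTF-8 into ISO-8859 before writing it to an underlying io.Writer.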

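Windows-1252 conversion is also included. Windows-1252 extends ISO-8859-1 by assigning common printable characters, such as the euro sign and curly quotation marks, to positions in the range 0x80 to 0x9F that ISO-8859-1 leaves undefined.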
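As a rough illustration of that difference, the standalone sketch below decodes single bytes under both charsets. It is not the package's code: the function names are hypothetical, and the table covers only five of the Windows-1252 positions in 0x80 to 0x9F.

```go
package main

import "fmt"

// cp1252High is a hypothetical, deliberately partial table of Windows-1252
// positions in 0x80..0x9F; the full code page assigns most of that range.
var cp1252High = map[byte]rune{
	0x80: '€', // euro sign
	0x91: '‘', // left single quotation mark
	0x92: '’', // right single quotation mark
	0x93: '“', // left double quotation mark
	0x94: '”', // right double quotation mark
}

// decodeLatin1Byte maps an ISO-8859-1 byte to the code point with the same value.
func decodeLatin1Byte(b byte) rune {
	return rune(b)
}

// decodeCP1252Byte uses the Windows-1252 assignment when the partial table
// has one and otherwise falls back to the ISO-8859-1 mapping.
func decodeCP1252Byte(b byte) rune {
	if r, ok := cp1252High[b]; ok {
		return r
	}
	return decodeLatin1Byte(b)
}

func main() {
	for _, b := range []byte{0x41, 0x80, 0xE9} {
		fmt.Printf("0x%02X  ISO-8859-1=%U  Windows-1252=%U\n", b, decodeLatin1Byte(b), decodeCP1252Byte(b))
	}
}
```

Above 0x9F the two charsets agree, which is why the fallback to the Latin-1 mapping is enough for bytes such as 0xE9 (é).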

Index

Constants
func Available() (all []string)
func Decode(charset int, latin []byte) (utf_8 []byte, err error)
func Encode(charset int, utf_8 []byte) (latin []byte, success int, err error)
func NewReader(charset int, r io.Reader) io.Reader
func NewWriter(charset int, w io.Writer) io.Writer
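type Converter
- func Get(charset int) *Converter
- func (c *Converter) Decode(latin []byte) (utf_8 []byte, err error)
- func (c *Converter) Encode(utf_8 []byte) (latin []byte, success int, err error)
- func (c *Converter) String() string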
type LatinReader
- func (r *LatinReader) Read(p []byte) (n int, err error)
type LatinWriter
- func (w *LatinWriter) Write(p []byte) (n int, err error)
type UnicodeError
- func (e UnicodeError) Error() string
type UnknownByteError
- func (e UnknownByteError) Error() string
type UnknownRuneError
- func (e UnknownRuneError) Error() string

Constants


const (
	ISO_8859_1 = iota
	ISO_8859_2
	ISO_8859_3
	ISO_8859_4
	ISO_8859_5
	ISO_8859_6
	ISO_8859_7
	ISO_8859_8
	ISO_8859_9
	ISO_8859_10
	ISO_8859_11
	ISO_8859_13
	ISO_8859_14
	ISO_8859_15
	ISO_8859_16
	// Extended Latin-1 (Windows only, not a standard)
	Windows1252
	// Common aliases for the standards
	Latin1   = ISO_8859_1
	Latin2   = ISO_8859_2
	Latin3   = ISO_8859_3
	Latin4   = ISO_8859_4
	Cyrillic = ISO_8859_5
	Arabic   = ISO_8859_6
	Greek    = ISO_8859_7
	Hebrew   = ISO_8859_8
	Latin5   = ISO_8859_9
	Latin6   = ISO_8859_10
	Thai     = ISO_8859_11
	Latin7   = ISO_8859_13
	Latin8   = ISO_8859_14
	Latin9   = ISO_8859_15
	Latin10  = ISO_8859_16
	// The numbers (1,2) are just meant to
	// distinguish PARTIAL from ILLEGAL
	PARTIAL = UnicodeError(1)
	ILLEGAL = UnicodeError(2)
)

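Constants used to select a charset when fetching a *Converter with Get (and when calling Decode, Encode, NewReader or NewWriter). PARTIAL and ILLEGAL are the two UnicodeError values that Encode can return.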
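One consequence of the iota numbering is worth spelling out: since there is no ISO_8859_12, a constant's value is not its part number (ISO_8859_13 is 11, and Windows1252 is 15). The program below uses a trimmed local copy of the block above, with only one alias kept, purely to print those values; in real code, pass the named constants or aliases (e.g. Latin9) rather than literal integers.

```go
package main

import "fmt"

// Local copy of the charset constants (aliases trimmed to one),
// reproduced only so this program compiles without the package.
const (
	ISO_8859_1 = iota
	ISO_8859_2
	ISO_8859_3
	ISO_8859_4
	ISO_8859_5
	ISO_8859_6
	ISO_8859_7
	ISO_8859_8
	ISO_8859_9
	ISO_8859_10
	ISO_8859_11
	ISO_8859_13 // there is no ISO_8859_12, so iota is 11 here, not 12
	ISO_8859_14
	ISO_8859_15
	ISO_8859_16
	Windows1252
	Latin9 = ISO_8859_15
)

func main() {
	fmt.Println(ISO_8859_13, ISO_8859_15, Latin9, Windows1252) // 11 13 13 15
}
```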

Functions

func Available

func Available() (all []string)

Available returns the string representation of every available encoding.

func Decode

func Decode(charset int, latin []byte) (utf_8 []byte, err error)

Decode converts an ISO-8859 encoded slice, in the given charset, to a UTF-8 encoded slice.
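For ISO-8859-1 the decoding step is especially simple, because each byte value is also the Unicode code point of the character it represents; the other parts need a lookup table. A minimal standalone sketch of the Latin-1 case using only the standard library (latin1ToUTF8 is a hypothetical name, not part of latinx):

```go
package main

import (
	"bytes"
	"fmt"
)

// latin1ToUTF8 is a hypothetical helper, not part of latinx. It treats each
// ISO-8859-1 byte as the code point with the same value (U+0000..U+00FF).
// Bytes 0x80..0xFF need two UTF-8 bytes, so the output can be twice as long.
// Note: 0x80..0x9F come out as C1 control characters here; the package
// may treat those positions differently.
func latin1ToUTF8(latin []byte) []byte {
	var buf bytes.Buffer
	buf.Grow(2 * len(latin))
	for _, b := range latin {
		buf.WriteRune(rune(b))
	}
	return buf.Bytes()
}

func main() {
	in := []byte{0x47, 0x72, 0xFC, 0xDF, 0x65} // "Grüße" encoded as ISO-8859-1
	fmt.Println(string(latin1ToUTF8(in)))       // Grüße
}
```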

func Encode

func Encode(charset int, utf_8 []byte) (latin []byte, success int, err error)

Encode converts a UTF-8 encoded slice to an ISO-8859 encoded slice in the given charset.

func NewReader

func NewReader(charset int, r io.Reader) io.Reader

NewReader returns a new LatinReader that reads from the underlying io.Reader r and decodes from one of the available charsets (ISO_8859_1, ISO_8859_2, ...) to UTF-8.

func NewWriter

func NewWriter(charset int, w io.Writer) io.Writer

NewWriter returns a new LatinWriter that encodes UTF-8 into one of the available charsets (ISO_8859_1, ISO_8859_2, ...) and writes the result to the underlying io.Writer w.

Types

type Converter

type Converter struct {
	// contains filtered or unexported fields
}

A Converter holds the mappings from an ISO 8859 charset to UTF-8, and vice versa.

func Get

func Get(charset int) *Converter

Get returns the *Converter for charset, or nil if the charset is unknown.
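Since Get returns nil for an unknown charset, callers should check the result before calling methods on it. The standalone sketch below models that lookup-or-nil contract with a hypothetical registry map and a hypothetical converter type holding a 256-entry decode table plus the reverse encode map; it illustrates the table shape described for Converter, not the package's actual internals.

```go
package main

import "fmt"

// converter is a hypothetical stand-in for *latinx.Converter: a decode
// table from byte to rune plus the reverse encode map.
type converter struct {
	name   string
	decode [256]rune
	encode map[rune]byte
}

// newLatin1 builds the ISO-8859-1 tables, where byte value == code point.
func newLatin1() *converter {
	c := &converter{name: "ISO-8859-1", encode: make(map[rune]byte, 256)}
	for i := 0; i < 256; i++ {
		c.decode[i] = rune(i)
		c.encode[rune(i)] = byte(i)
	}
	return c
}

// registry maps charset constants to converters; only 0 (Latin-1) is filled in.
var registry = map[int]*converter{0: newLatin1()}

// get mirrors the documented behaviour of Get: a converter, or nil if unknown.
func get(charset int) *converter {
	return registry[charset] // a missing key yields the nil pointer
}

func main() {
	if c := get(42); c == nil {
		fmt.Println("unknown charset")
	}
	fmt.Println(get(0).name)
}
```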

func (*Converter) Decode

func (c *Converter) Decode(latin []byte) (utf_8 []byte, err error)

Decode converts an ISO 8859 byte sequence into a UTF-8 byte sequence. If it returns an UnknownByteError, the charset of the Converter has no Unicode mapping for a byte found in latin.
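Not every byte value is assigned in every part; ISO-8859-3, for example, leaves 0xA5, 0xAE, 0xBE, 0xC3, 0xD0, 0xE3 and 0xF0 unassigned. The hedged sketch below shows how a table-driven decoder can report such a byte. The unknownByteError type is a hypothetical stand-in for UnknownByteError (its message format is an assumption), and the demo table is a toy: Latin-1 with those seven positions punched out, not the real ISO-8859-3 table.

```go
package main

import (
	"bytes"
	"fmt"
)

// unknownByteError is a hypothetical stand-in for latinx.UnknownByteError.
type unknownByteError string

func (e unknownByteError) Error() string { return string(e) }

// unassigned marks a table slot with no Unicode mapping.
const unassigned rune = -1

// decodeWith converts bytes to UTF-8 through a 256-entry table and
// stops at the first byte whose slot is unassigned.
func decodeWith(table *[256]rune, latin []byte) ([]byte, error) {
	var buf bytes.Buffer
	for i, b := range latin {
		r := table[b]
		if r == unassigned {
			return nil, unknownByteError(fmt.Sprintf("unknown byte 0x%02X at offset %d", b, i))
		}
		buf.WriteRune(r)
	}
	return buf.Bytes(), nil
}

// toyTable is Latin-1 with the seven ISO-8859-3 holes marked unassigned;
// it is NOT the real ISO-8859-3 table, which differs at other positions too.
func toyTable() *[256]rune {
	var t [256]rune
	for i := range t {
		t[i] = rune(i)
	}
	for _, b := range []byte{0xA5, 0xAE, 0xBE, 0xC3, 0xD0, 0xE3, 0xF0} {
		t[b] = unassigned
	}
	return &t
}

func main() {
	_, err := decodeWith(toyTable(), []byte{'a', 0xA5})
	fmt.Println(err) // unknown byte 0xA5 at offset 1
}
```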

func (*Converter) Encode

func (c *Converter) Encode(utf_8 []byte) (latin []byte, success int, err error)

Encode converts a UTF-8 byte sequence into an ISO 8859 byte sequence. It can return two kinds of error. A UnicodeError means that a partial or an illegal UTF-8 sequence was found, i.e. the error is either latinx.PARTIAL or latinx.ILLEGAL; in that case success < len(utf_8), and success is the number of bytes of utf_8 that were successfully converted. An UnknownRuneError means that the charset of the Converter has no mapping for a rune (character) found in utf_8.

func (*Converter) String

func (c *Converter) String() string

type LatinReader

type LatinReader struct {
	// contains filtered or unexported fields
}

A LatinReader reads an ISO-8859 stream from the underlying reader, decodes it to UTF-8, and writes the result to a *bytes.Buffer, which holds the decoded data because it can be larger than the input. Read(p []byte) is then served from that *bytes.Buffer.

func (*LatinReader) Read

func (r *LatinReader) Read(p []byte) (n int, err error)

Read reads from the underlying io.Reader, decodes the data to UTF-8, and copies the result into p.

type LatinWriter

type LatinWriter struct {
	// contains filtered or unexported fields
}

A LatinWriter encodes a UTF-8 byte stream into the selected ISO 8859 charset before writing it to the underlying io.Writer.

func (*LatinWriter) Write

func (w *LatinWriter) Write(p []byte) (n int, err error)

The returned n is the number of input bytes that were successfully written. It can differ from the number of bytes actually written to the underlying io.Writer, because Converter.Encode converts multibyte UTF-8 characters into single-byte ISO 8859 characters: writing []byte("€€€") with charset ISO_8859_15 returns 9, although only 3 bytes reach the underlying io.Writer.

type UnicodeError

type UnicodeError int

Error type for partial or illegal UTF-8 sequences; its two values are PARTIAL and ILLEGAL.

func (UnicodeError) Error

func (e UnicodeError) Error() string

type UnknownByteError

type UnknownByteError string

Error type for an ISO 8859 byte that has no mapping in the selected charset.

func (UnknownByteError) Error

func (e UnknownByteError) Error() string

type UnknownRuneError

type UnknownRuneError string

Error type for a UTF-8 rune (character) that has no mapping in the selected charset.

func (UnknownRuneError) Error

func (e UnknownRuneError) Error() string

Directories

Path	Synopsis
example
giconv
