encoding

package encoding

v0.3.5 | Published: Dec 8, 2020 | License: BSD-3-Clause | Imports: 6 | Imported by: 4,285

Documentation

Overview

Package encoding defines an interface for character encodings, such as Shift JIS and Windows 1252, that can convert to and from UTF-8.

Encoding implementations are provided in other packages, such as golang.org/x/text/encoding/charmap and golang.org/x/text/encoding/japanese.

Index

Constants
Variables
type Decoder
type Encoder
- func HTMLEscapeUnsupported(e *Encoder) *Encoder
- func ReplaceUnsupported(e *Encoder) *Encoder
type Encoding

Constants

const ASCIISub = '\x1a'

ASCIISub is the ASCII substitute character, as recommended by https://unicode.org/reports/tr36/#Text_Comparison

Variables

var ErrInvalidUTF8 = errors.New("encoding: invalid UTF-8")

ErrInvalidUTF8 means that a transformer encountered invalid UTF-8.

var UTF8Validator transform.Transformer = utf8Validator{}

UTF8Validator is a transformer that returns ErrInvalidUTF8 on the first input byte that is not valid UTF-8.

Functions

This section is empty.

Types

type Decoder

type Decoder struct {
	transform.Transformer
	// contains filtered or unexported fields
}

A Decoder converts bytes to UTF-8. It implements transform.Transformer.

Transforming source bytes that are not valid in the Decoder's encoding does not, by itself, result in an error. Each byte that cannot be transcoded is represented in the output by the UTF-8 encoding of '\uFFFD', the replacement rune.

func (*Decoder) Bytes

func (d *Decoder) Bytes(b []byte) ([]byte, error)

Bytes converts the given encoded bytes to UTF-8. It returns the converted bytes, or nil and the error if an error occurred.

func (*Decoder) Reader

func (d *Decoder) Reader(r io.Reader) io.Reader

Reader wraps another Reader to decode its bytes.

The Decoder may not be used for any other operation as long as the returned Reader is in use.

func (*Decoder) String

func (d *Decoder) String(s string) (string, error)

String converts the given encoded string to UTF-8. It returns the converted string, or "" and the error if an error occurred.

type Encoder

type Encoder struct {
	transform.Transformer
	// contains filtered or unexported fields
}

An Encoder converts bytes from UTF-8. It implements transform.Transformer.

Each rune that cannot be transcoded results in an error. In this case, the transform consumes all source bytes up to, but not including, the offending rune. Source bytes that are not valid UTF-8 are replaced by '\uFFFD'. To return early with an error instead, use transform.Chain to preprocess the data with a UTF8Validator.

func HTMLEscapeUnsupported

func HTMLEscapeUnsupported(e *Encoder) *Encoder

HTMLEscapeUnsupported wraps encoders to replace source runes outside the repertoire of the destination encoding with HTML escape sequences.

This wrapper exists to comply with URL and HTML forms, which require a non-terminating legacy encoder. The produced sequences may lead to data loss because they are indistinguishable from legitimate input. To avoid this issue, use UTF-8 encodings whenever possible.

func ReplaceUnsupported

func ReplaceUnsupported(e *Encoder) *Encoder

ReplaceUnsupported wraps encoders to replace source runes outside the repertoire of the destination encoding with an encoding-specific replacement.

This wrapper is only provided for backwards compatibility and legacy handling. Its use is strongly discouraged. Use UTF-8 whenever possible.

func (*Encoder) Bytes

func (e *Encoder) Bytes(b []byte) ([]byte, error)

Bytes converts bytes from UTF-8. It returns the converted bytes, or nil and the error if an error occurred.

func (*Encoder) String

func (e *Encoder) String(s string) (string, error)

String converts a string from UTF-8. It returns the converted string, or "" and the error if an error occurred.

func (*Encoder) Writer

func (e *Encoder) Writer(w io.Writer) io.Writer

Writer wraps another Writer to encode its UTF-8 output.

The Encoder may not be used for any other operation as long as the returned Writer is in use.

type Encoding

type Encoding interface {
	// NewDecoder returns a Decoder.
	NewDecoder() *Decoder

	// NewEncoder returns an Encoder.
	NewEncoder() *Encoder
}

Encoding is a character set encoding that can be transformed to and from UTF-8.

var Nop Encoding = nop{}

Nop is the nop encoding. Its transformed bytes are the same as the source bytes; it does not replace invalid UTF-8 sequences.

var Replacement Encoding = replacement{}

Replacement is the replacement encoding. Decoding from the replacement encoding yields a single '\uFFFD' replacement rune. Encoding from UTF-8 to the replacement encoding yields the source bytes unchanged, except that invalid UTF-8 is converted to '\uFFFD'.

It is defined at http://encoding.spec.whatwg.org/#replacement

Source Files

encoding.go

Directories

Path	Synopsis
charmap	Package charmap provides simple character encodings such as IBM Code Page 437 and Windows 1252.
htmlindex	Package htmlindex maps character set encoding names to Encodings as recommended by the W3C for use in HTML 5.
ianaindex	Package ianaindex maps names to Encodings as specified by the IANA registry.
internal	Package internal contains code that is shared among encoding implementations.
internal/enctest
internal/identifier	Package identifier defines the contract between implementations of Encoding and Index by defining identifiers that uniquely identify standardized coded character sets (CCS) and character encoding schemes (CES), which we will together refer to as encodings, for which Encoding implementations provide converters to and from UTF-8.
japanese	Package japanese provides Japanese encodings such as EUC-JP and Shift JIS.
korean	Package korean provides Korean encodings such as EUC-KR.
simplifiedchinese	Package simplifiedchinese provides Simplified Chinese encodings such as GBK.
traditionalchinese	Package traditionalchinese provides Traditional Chinese encodings such as Big5.
unicode	Package unicode provides Unicode encodings such as UTF-16.
utf32	Package utf32 provides the UTF-32 Unicode encoding.
