Documentation
Overview
Package encoding defines an interface for character encodings, such as Shift JIS and Windows 1252, that can convert to and from UTF-8.
To convert the bytes of an io.Reader r from the encoding e to UTF-8:
rInUTF8 := transform.NewReader(r, e.NewDecoder())
and to convert from UTF-8 to the encoding e:
wInUTF8 := transform.NewWriter(w, e.NewEncoder())
In both cases, import "golang.org/x/text/transform".
Encoding implementations are provided in other packages, such as golang.org/x/text/encoding/charmap and golang.org/x/text/encoding/japanese.
Constants
const ASCIISub = '\x1a'
ASCIISub is the ASCII substitute character, as recommended by http://unicode.org/reports/tr36/#Text_Comparison
Variables
var ErrInvalidUTF8 = errors.New("encoding: invalid UTF-8")
ErrInvalidUTF8 means that a transformer encountered invalid UTF-8.
var UTF8Validator transform.Transformer = utf8Validator{}
UTF8Validator is a transformer that returns ErrInvalidUTF8 on the first input byte that is not valid UTF-8.
Functions
This section is empty.
Types
type Encoding
type Encoding interface {
	// NewDecoder returns a transformer that converts to UTF-8.
	//
	// Transforming source bytes that are not of that encoding will not
	// result in an error per se. Each byte that cannot be transcoded will
	// be represented in the output by the UTF-8 encoding of '\uFFFD', the
	// replacement rune.
	NewDecoder() transform.Transformer

	// NewEncoder returns a transformer that converts from UTF-8.
	//
	// Transforming source bytes that are not valid UTF-8 will not result in
	// an error per se. Each rune that cannot be transcoded will be
	// represented in the output by an encoding-specific replacement such as
	// "\x1a" (the ASCII substitute character) or "\xff\xfd". To return
	// early with error instead, use transform.Chain to preprocess the data
	// with a UTF8Validator.
	NewEncoder() transform.Transformer
}
Encoding is a character set encoding that can be transformed to and from UTF-8.
var Nop Encoding = nop{}
Nop is the nop encoding. Its transformed bytes are the same as the source bytes; it does not replace invalid UTF-8 sequences.
var Replacement Encoding = replacement{}
Replacement is the replacement encoding. Decoding from the replacement encoding yields a single '\uFFFD' replacement rune. Encoding from UTF-8 to the replacement encoding yields the same as the source bytes except that invalid UTF-8 is converted to '\uFFFD'.
It is defined at http://encoding.spec.whatwg.org/#replacement
Directories
Path | Synopsis |
---|---|
charmap | Package charmap provides simple character encodings such as IBM Code Page 437 and Windows 1252. |
japanese | Package japanese provides Japanese encodings such as EUC-JP and Shift JIS. |
korean | Package korean provides Korean encodings such as EUC-KR. |
simplifiedchinese | Package simplifiedchinese provides Simplified Chinese encodings such as GBK. |
traditionalchinese | Package traditionalchinese provides Traditional Chinese encodings such as Big5. |
unicode | Package unicode provides Unicode encodings such as UTF-16. |