unicode

package

Version: v0.0.0-...-c1c4142 | Published: Nov 21, 2014 | License: BSD-3-Clause | Imports: 5 | Imported by: 0

Repository

github.com/weisd/go.txt

Documentation ¶

Overview ¶

Package unicode provides Unicode encodings such as UTF-16.

Index ¶

Variables
func UTF16(e Endianness, b BOMPolicy) encoding.Encoding
type BOMPolicy
type Endianness

Constants ¶

This section is empty.

Variables ¶

var ErrMissingBOM = errors.New("encoding: missing byte order mark")

ErrMissingBOM means that decoding UTF-16 input with ExpectBOM did not find a starting byte order mark.

Functions ¶

func UTF16 ¶

func UTF16(e Endianness, b BOMPolicy) encoding.Encoding

UTF16 returns a UTF-16 Encoding for the given default endianness and byte order mark (BOM) policy.

When decoding from UTF-16 to UTF-8, the behavior depends on the BOMPolicy. If it is IgnoreBOM, neither a BOM (U+FEFF) nor the noncharacter U+FFFE in the input stream affects the endianness used for decoding; both are instead output as their standard UTF-8 encodings, "\xef\xbb\xbf" and "\xef\xbf\xbe" respectively.

If the BOMPolicy is ExpectBOM, the input stream is expected to start with a BOM, and the transformation returns early with an ErrMissingBOM error if it does not. That starting BOM is not written to the UTF-8 output; instead, it overrides the default endianness e for the remainder of the transformation. Any subsequent BOM (U+FEFF) or noncharacter U+FFFE does not affect the endianness used and is output as its standard UTF-8 encoding.
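The decoding rules above can be illustrated with a minimal standard-library sketch. This is not the package's implementation: decodeUTF16 and errMissingBOM are hypothetical names introduced here, the two bool parameters stand in for Endianness and BOMPolicy, and a trailing odd byte is simply dropped for brevity.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf16"
)

// errMissingBOM is a local stand-in for the package's ErrMissingBOM.
var errMissingBOM = errors.New("encoding: missing byte order mark")

// decodeUTF16 is a hypothetical helper that mimics the documented rules.
// littleEndian is the default byte order; expectBOM mirrors ExpectBOM.
func decodeUTF16(src []byte, littleEndian, expectBOM bool) (string, error) {
	if expectBOM {
		if len(src) < 2 {
			return "", errMissingBOM
		}
		switch {
		case src[0] == 0xFE && src[1] == 0xFF:
			littleEndian = false // BOM in big-endian byte order
		case src[0] == 0xFF && src[1] == 0xFE:
			littleEndian = true // BOM in little-endian byte order
		default:
			return "", errMissingBOM
		}
		src = src[2:] // the leading BOM is not written to the output
	}
	// From here on the byte order is fixed; later U+FEFF or U+FFFE code
	// units are decoded like any other character.
	units := make([]uint16, 0, len(src)/2)
	for i := 0; i+1 < len(src); i += 2 {
		if littleEndian {
			units = append(units, uint16(src[i])|uint16(src[i+1])<<8)
		} else {
			units = append(units, uint16(src[i])<<8|uint16(src[i+1]))
		}
	}
	return string(utf16.Decode(units)), nil
}

func main() {
	withBOM := []byte{0xFF, 0xFE, 'h', 0x00, 'i', 0x00}

	// ExpectBOM: the BOM selects little-endian and is consumed.
	s, err := decodeUTF16(withBOM, false, true)
	fmt.Printf("%q %v\n", s, err)

	// IgnoreBOM with a big-endian default: FF FE is read as U+FFFE
	// and kept in the output.
	s, err = decodeUTF16(withBOM, false, false)
	fmt.Printf("%q %v\n", s, err)

	// ExpectBOM without a leading BOM reports the error.
	_, err = decodeUTF16([]byte{0x00, 'h'}, false, true)
	fmt.Println(err)
}
```

The sketch resolves the byte order once, up front, which is why nothing after the first two bytes can change it.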

When encoding from UTF-8 to UTF-16, a BOM will be inserted at the start of the output if the BOMPolicy is ExpectBOM. Otherwise, a BOM will not be inserted. The UTF-8 input does not need to contain a BOM.
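As a rough illustration of the encoding direction, the following standard-library sketch writes a BOM only when the policy asks for one. encodeUTF16 is a hypothetical helper, not part of this package, and its bool parameters again stand in for Endianness and BOMPolicy.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// encodeUTF16 is a hypothetical helper that mimics the documented rule:
// a BOM is written first only when expectBOM is true.
func encodeUTF16(s string, littleEndian, expectBOM bool) []byte {
	units := utf16.Encode([]rune(s))
	if expectBOM {
		// Prepend U+FEFF; it is serialized in the chosen byte order below.
		units = append([]uint16{0xFEFF}, units...)
	}
	out := make([]byte, 0, 2*len(units))
	for _, u := range units {
		if littleEndian {
			out = append(out, byte(u), byte(u>>8))
		} else {
			out = append(out, byte(u>>8), byte(u))
		}
	}
	return out
}

func main() {
	fmt.Printf("% x\n", encodeUTF16("hi", false, true)) // fe ff 00 68 00 69
	fmt.Printf("% x\n", encodeUTF16("hi", true, false)) // 68 00 69 00
}
```

Because the BOM is serialized in the same byte order as the text that follows it, a reader that expects a BOM can recover that order from the first two bytes.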

There is no concept of a 'native' endianness. If the UTF-16 data is produced and consumed in a greater context that implies a certain endianness, use IgnoreBOM. Otherwise, use ExpectBOM and always produce and consume a BOM.

In the language of http://www.unicode.org/faq/utf_bom.html#bom10, IgnoreBOM corresponds to "Where the precise type of the data stream is known... the BOM should not be used" and ExpectBOM corresponds to "A particular protocol... may require use of the BOM".

Types ¶

type BOMPolicy ¶

type BOMPolicy bool

BOMPolicy is a UTF-16 encoding's byte order mark policy.

const (
	// IgnoreBOM means to ignore any byte order marks.
	IgnoreBOM BOMPolicy = false
	// ExpectBOM means that the UTF-16 form is expected to start with a
	// byte order mark.
	ExpectBOM BOMPolicy = true
)

type Endianness ¶

type Endianness bool

Endianness is a UTF-16 encoding's default endianness.

const (
	// BigEndian is UTF-16BE.
	BigEndian Endianness = false
	// LittleEndian is UTF-16LE.
	LittleEndian Endianness = true
)

Source Files ¶

unicode.go