bom

package
v0.19.0

This package is not in the latest version of its module.

Published: Dec 21, 2023 License: GPL-3.0 Imports: 6 Imported by: 0

README

BOM

Byte Order Mark

Functions and methods for working with the byte order mark (BOM).

The byte order mark (BOM) is a particular usage of the special Unicode character U+FEFF, whose appearance as a magic number at the start of a text stream can signal several things to a program reading the text:

  • The byte order, or endianness, of the text stream in the cases of 16-bit and 32-bit encodings;
  • The fact that the text stream's encoding is Unicode, to a high level of confidence;
  • Which Unicode character encoding is used.

More information about the BOM can be found on Wikipedia:
https://en.wikipedia.org/wiki/Byte_order_mark
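
As an illustration of the signals listed above, here is a minimal sketch that recognizes a few well-known BOMs using only the standard library. It does not use this package; the names bomTable and detectBOM are hypothetical, and only five encodings are covered.

```go
package main

import (
	"bytes"
	"fmt"
)

// bomTable lists a few well-known BOM byte sequences (hypothetical helper,
// not part of this package). Four-byte marks come first so that
// UTF-32LE (FF FE 00 00) is tested before UTF-16LE (FF FE), its prefix.
var bomTable = []struct {
	name string
	mark []byte
}{
	{"UTF-32BE", []byte{0x00, 0x00, 0xFE, 0xFF}},
	{"UTF-32LE", []byte{0xFF, 0xFE, 0x00, 0x00}},
	{"UTF-8", []byte{0xEF, 0xBB, 0xBF}},
	{"UTF-16BE", []byte{0xFE, 0xFF}},
	{"UTF-16LE", []byte{0xFF, 0xFE}},
}

// detectBOM returns the name of the first table entry whose mark starts
// the data, together with the mark length. It returns "" and 0 when no
// listed BOM is found.
func detectBOM(data []byte) (string, int) {
	for _, e := range bomTable {
		if bytes.HasPrefix(data, e.mark) {
			return e.name, len(e.mark)
		}
	}
	return "", 0
}

func main() {
	name, n := detectBOM([]byte{0xEF, 0xBB, 0xBF, 'h', 'i'})
	fmt.Println(name, n) // prints "UTF-8 3"
}
```

Preferring the longest mark is a design choice of this sketch only; it hides the fact that FF FE 00 00 could also be a UTF-16LE BOM followed by a NUL character.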

Documentation

Constants

const (
	ErrUnknownEncoding = "unknown encoding: %v"
	ErrBOMIsNotFound   = "byte order mark is not found"
	ErrDuplicateProbe  = "duplicate probe for encoding %v"
)
const (
	EncodingUTF8       = Encoding(1)  // UTF-8 Encoding.
	EncodingUTF16BE    = Encoding(2)  // UTF-16 (BE, Big Endian) Encoding.
	EncodingUTF16LE    = Encoding(3)  // UTF-16 (LE, Little Endian) Encoding.
	EncodingUTF32BE    = Encoding(4)  // UTF-32 (BE, Big Endian) Encoding.
	EncodingUTF32LE    = Encoding(5)  // UTF-32 (LE, Little Endian) Encoding.
	EncodingUTF7       = Encoding(6)  // UTF-7 Encoding.
	EncodingUTF1       = Encoding(8)  // UTF-1 Encoding.
	EncodingUTF_EBCDIC = Encoding(9)  // UTF-EBCDIC Encoding.
	EncodingSCSU       = Encoding(10) // SCSU Encoding.
	EncodingBOCU1      = Encoding(11) // BOCU-1 Encoding.
	EncodingGB18030    = Encoding(12) // GB18030 Encoding.
)
const (
	ErrArraysHaveDifferentLengths = "arrays have different lengths: %v vs %v"
	ErrNoData                     = "no data"
)

Variables

This section is empty.

Functions

func BOMBOCU1 added in v0.9.0

func BOMBOCU1() []byte

func BOMGB18030 added in v0.9.0

func BOMGB18030() []byte

func BOMSCSU added in v0.9.0

func BOMSCSU() []byte

func BOMUTF1 added in v0.9.0

func BOMUTF1() []byte

func BOMUTF16BE added in v0.9.0

func BOMUTF16BE() []byte

func BOMUTF16LE added in v0.9.0

func BOMUTF16LE() []byte

func BOMUTF32BE added in v0.9.0

func BOMUTF32BE() []byte

func BOMUTF32LE added in v0.9.0

func BOMUTF32LE() []byte

func BOMUTF7 added in v0.9.0

func BOMUTF7() []byte

func BOMUTF8 added in v0.9.0

func BOMUTF8() []byte

func BOMUTF_EBCDIC added in v0.9.0

func BOMUTF_EBCDIC() []byte

func BOMs added in v0.9.0

func BOMs() map[Encoding][]byte

BOMs returns a map from each encoding to its BOM byte sequence.

func ReadBOMOfEncoding added in v0.9.0

func ReadBOMOfEncoding(r io.Reader, enc Encoding) (prefix []byte, err error)

ReadBOMOfEncoding tries to read the BOM of the specified encoding. The prefix that was read from the stream is always returned.

func SkipBOM added in v0.9.0

func SkipBOM(r io.Reader, enc Encoding) (err error)

SkipBOM tries to skip a BOM prefix of the specified encoding in the stream. It reads the BOM from the reader, which advances the reader's position.

Types

type Encoding

type Encoding byte

Encoding identifies a text encoding, usually a Unicode one. Unicode on Wikipedia: https://en.wikipedia.org/wiki/Unicode

func PossibleEncodings

func PossibleEncodings() []Encoding

PossibleEncodings returns a list of possible encodings except the unknown encoding.

func SearchForBOM added in v0.9.0

func SearchForBOM(r io.Reader) (encodings []Encoding, acc []byte, err error)

SearchForBOM searches the stream for a BOM. The returned acc is the slice of bytes that were read from the stream to detect the encoding.

type Probe added in v0.9.0

type Probe struct {
	// Encoding is the specified encoding which is searched in the probes.
	Encoding Encoding

	// Probability is the probability of the encoding to be used in the probes.
	Probability tsb.TSB

	// ReadBytesCount is the number of bytes which were read to get the probe.
	ReadBytesCount int
}

Probe stores the result of probing the text for a specified encoding. In other words, it stores the probability that the probed data is in the specified encoding.

func ProbeForEncoding added in v0.9.0

func ProbeForEncoding(data []byte, enc Encoding) (probe *Probe, err error)

ProbeForEncoding probes the data for the specified encoding.

func (*Probe) IsAccurate added in v0.9.0

func (p *Probe) IsAccurate() bool

IsAccurate reports whether the probe result is accurate. Here, accurate means that the probability is an exact 'yes' or 'no'.

type Report added in v0.9.0

type Report struct {
	// contains filtered or unexported fields
}

Report stores the result of making probes for all possible encodings.

func GetEncodingsReport added in v0.9.0

func GetEncodingsReport(data []byte, encodingsToProbe map[Encoding]bool) (report *Report, err error)

GetEncodingsReport tries to build a report for the specified encodings. Please note that some encodings have similar BOMs, which can make probe results inaccurate.
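
To see why similar BOMs cause ambiguity, consider UTF-16LE (FF FE) and UTF-32LE (FF FE 00 00): the first is a prefix of the second. The sketch below, which uses only the standard library and hypothetical names (marks, candidates), collects every encoding whose BOM starts the data instead of stopping at the first match.

```go
package main

import (
	"bytes"
	"fmt"
)

// marks is a small, hypothetical table of BOMs used only in this sketch.
var marks = []struct {
	name string
	mark []byte
}{
	{"UTF-8", []byte{0xEF, 0xBB, 0xBF}},
	{"UTF-16BE", []byte{0xFE, 0xFF}},
	{"UTF-16LE", []byte{0xFF, 0xFE}},
	{"UTF-32BE", []byte{0x00, 0x00, 0xFE, 0xFF}},
	{"UTF-32LE", []byte{0xFF, 0xFE, 0x00, 0x00}},
}

// candidates returns the names of all encodings whose BOM is a prefix
// of data, in table order.
func candidates(data []byte) []string {
	var out []string
	for _, m := range marks {
		if bytes.HasPrefix(data, m.mark) {
			out = append(out, m.name)
		}
	}
	return out
}

func main() {
	// FF FE 00 00 is either a UTF-32LE BOM or a UTF-16LE BOM followed by U+0000.
	fmt.Println(candidates([]byte{0xFF, 0xFE, 0x00, 0x00})) // prints "[UTF-16LE UTF-32LE]"
}
```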

func (*Report) GetAccurateProbes added in v0.9.0

func (r *Report) GetAccurateProbes() (accurateProbes []*Probe)

GetAccurateProbes returns accurate probes of the report.

func (*Report) IsAccurate added in v0.9.0

func (r *Report) IsAccurate() bool

IsAccurate tells whether all the probes of the report are accurate or not.
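
The notion of accuracy used by Probe and Report can be sketched as follows. This is an illustration under a loud assumption: it models the probability as a three-valued result (no, maybe, yes), which this page does not spell out for tsb.TSB. All types and functions here (triState, probe, accurateOnly, allAccurate) are hypothetical and are not the package's own.

```go
package main

import "fmt"

// triState is a hypothetical three-valued result used only in this sketch.
type triState int

const (
	no triState = iota
	maybe
	yes
)

// probe is a simplified, hypothetical stand-in for a probing result.
type probe struct {
	encoding    string
	probability triState
}

// isAccurate reports whether the result is a definite yes or a definite no.
func (p probe) isAccurate() bool {
	return p.probability == yes || p.probability == no
}

// accurateOnly keeps the probes whose result is definite.
func accurateOnly(ps []probe) []probe {
	var out []probe
	for _, p := range ps {
		if p.isAccurate() {
			out = append(out, p)
		}
	}
	return out
}

// allAccurate reports whether every probe has a definite result.
func allAccurate(ps []probe) bool {
	return len(accurateOnly(ps)) == len(ps)
}

func main() {
	ps := []probe{{"UTF-8", no}, {"UTF-16LE", maybe}, {"UTF-32LE", yes}}
	for _, p := range accurateOnly(ps) {
		fmt.Println(p.encoding)
	}
	fmt.Println(allAccurate(ps)) // prints "false": one result is "maybe"
}
```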
