README
utfbom

Package utfbom detects a BOM (Unicode Byte Order Mark) at the start of the input and removes it when present. It can also return the encoding indicated by the detected BOM.
Installation
go get -u github.com/dimchansky/utfbom
Example
package main

import (
	"bytes"
	"fmt"
	"io/ioutil"

	"github.com/dimchansky/utfbom"
)

func main() {
	trySkip([]byte("\xEF\xBB\xBFhello"))
	trySkip([]byte("hello"))
}

func trySkip(byteData []byte) {
	fmt.Println("Input:", byteData)

	// just skip BOM
	output, err := ioutil.ReadAll(utfbom.SkipOnly(bytes.NewReader(byteData)))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("ReadAll with BOM skipping", output)

	// skip BOM and detect encoding
	sr, enc := utfbom.Skip(bytes.NewReader(byteData))
	fmt.Printf("Detected encoding: %s\n", enc)
	output, err = ioutil.ReadAll(sr)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("ReadAll with BOM detection and skipping", output)
	fmt.Println()
}
Output:
$ go run main.go
Input: [239 187 191 104 101 108 108 111]
ReadAll with BOM skipping [104 101 108 108 111]
Detected encoding: UTF8
ReadAll with BOM detection and skipping [104 101 108 108 111]

Input: [104 101 108 108 111]
ReadAll with BOM skipping [104 101 108 108 111]
Detected encoding: Unknown
ReadAll with BOM detection and skipping [104 101 108 108 111]
Documentation
Overview
Package utfbom detects a BOM (Unicode Byte Order Mark) at the start of a stream and removes it when present. It wraps an io.Reader in another object (Reader) that also implements the io.Reader interface and checks for and strips the BOM automatically.
This package was copied from https://github.com/dimchansky/utfbom. Only minor changes were made to not depend on the io/ioutil package and to make our linters pass.
Types
type Encoding
type Encoding int
Encoding is an integer type that identifies the detected UTF encoding.
const (
	// Unknown encoding, returned when no BOM was detected
	Unknown Encoding = iota

	// UTF8, BOM bytes: EF BB BF
	UTF8

	// UTF-16, big-endian, BOM bytes: FE FF
	UTF16BigEndian

	// UTF-16, little-endian, BOM bytes: FF FE
	UTF16LittleEndian

	// UTF-32, big-endian, BOM bytes: 00 00 FE FF
	UTF32BigEndian

	// UTF-32, little-endian, BOM bytes: FF FE 00 00
	UTF32LittleEndian
)
Constants to identify detected UTF encodings.
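The BOM byte sequences listed with these constants can be matched against the start of a byte slice. The following is a minimal sketch of that idea using only the standard library; it is not the package's actual implementation. The helper detectBOM and the table bomTable are hypothetical names, and Encoding is redeclared locally so the sketch compiles on its own.

```go
package main

import (
	"bytes"
	"fmt"
)

// Encoding and its constants are redeclared here, in the documented order,
// only so that this sketch is self-contained.
type Encoding int

const (
	Unknown Encoding = iota
	UTF8
	UTF16BigEndian
	UTF16LittleEndian
	UTF32BigEndian
	UTF32LittleEndian
)

// bomTable (hypothetical) lists the BOMs with the 4-byte sequences first.
// The order matters: the UTF-32 little-endian BOM (FF FE 00 00) starts with
// the UTF-16 little-endian BOM (FF FE), so it has to be tested earlier.
var bomTable = []struct {
	enc Encoding
	bom []byte
}{
	{UTF32BigEndian, []byte{0x00, 0x00, 0xFE, 0xFF}},
	{UTF32LittleEndian, []byte{0xFF, 0xFE, 0x00, 0x00}},
	{UTF8, []byte{0xEF, 0xBB, 0xBF}},
	{UTF16BigEndian, []byte{0xFE, 0xFF}},
	{UTF16LittleEndian, []byte{0xFF, 0xFE}},
}

// detectBOM (hypothetical helper) returns the encoding indicated by a
// leading BOM and the number of BOM bytes, or Unknown and 0 if none matches.
func detectBOM(data []byte) (Encoding, int) {
	for _, e := range bomTable {
		if bytes.HasPrefix(data, e.bom) {
			return e.enc, len(e.bom)
		}
	}
	return Unknown, 0
}

func main() {
	enc, n := detectBOM([]byte("\xEF\xBB\xBFhello"))
	fmt.Println(enc == UTF8, n) // true 3

	enc, n = detectBOM([]byte("hello"))
	fmt.Println(enc == Unknown, n) // true 0
}
```

One design choice in this sketch: input beginning with FF FE 00 00 is reported as UTF-32 little-endian, although the same bytes could also be a UTF-16 little-endian BOM followed by a NUL character. Whether the package resolves this ambiguity the same way is not stated here.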
type Reader
type Reader struct {
	// contains filtered or unexported fields
}
Reader implements automatic BOM (Unicode Byte Order Mark) checking and removing as necessary for an io.Reader object.
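To illustrate the wrapping idea, here is a minimal standard-library sketch of a reader that drops a leading UTF-8 BOM by peeking at the first bytes without consuming them. It is an assumption-laden illustration, not how Reader is implemented: skipUTF8BOM and readWithoutBOM are hypothetical names, and only the UTF-8 BOM is handled.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// utf8BOM is the UTF-8 byte order mark: EF BB BF.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// skipUTF8BOM (hypothetical helper, not part of utfbom) returns a reader
// that yields the contents of r without a leading UTF-8 BOM.
func skipUTF8BOM(r io.Reader) io.Reader {
	br := bufio.NewReader(r)
	// Peek does not consume input, so nothing is lost when no BOM is found.
	// For inputs shorter than the BOM, Peek returns the available bytes plus
	// an error; the comparison then simply fails.
	if head, _ := br.Peek(len(utf8BOM)); bytes.Equal(head, utf8BOM) {
		// The bytes are already buffered, so Discard cannot fail here.
		_, _ = br.Discard(len(utf8BOM))
	}
	return br
}

// readWithoutBOM (hypothetical helper) reads all of data through
// skipUTF8BOM and returns the result as a string.
func readWithoutBOM(data []byte) (string, error) {
	out, err := io.ReadAll(skipUTF8BOM(bytes.NewReader(data)))
	return string(out), err
}

func main() {
	inputs := [][]byte{
		[]byte("\xEF\xBB\xBFhello"), // with BOM
		[]byte("hello"),             // without BOM
		[]byte("hi"),                // shorter than a BOM
	}
	for _, in := range inputs {
		s, err := readWithoutBOM(in)
		fmt.Printf("%q %v\n", s, err)
	}
}
```

Because the result is itself an io.Reader, it can be passed to anything that accepts one, such as io.ReadAll or bufio.NewScanner, which matches the wrapping behaviour described for Reader.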