utfbom

package

v0.27.1-rc1 Published: Jul 12, 2024 License: Apache-2.0 Imports: 2 Imported by: 0

Repository

github.com/smallstep/cli

README

utfbom

The package utfbom detects a BOM (Unicode Byte Order Mark) at the start of a stream and removes it when present. It can also report the encoding that the BOM indicates.

Installation

go get -u github.com/dimchansky/utfbom

Example

package main

import (
	"bytes"
	"fmt"
	"io"

	"github.com/dimchansky/utfbom"
)

func main() {
	// First input starts with the UTF-8 BOM (EF BB BF); second has no BOM.
	trySkip([]byte("\xEF\xBB\xBFhello"))
	trySkip([]byte("hello"))
}

func trySkip(byteData []byte) {
	fmt.Println("Input:", byteData)

	// Skip the BOM without reporting the encoding.
	// io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+).
	output, err := io.ReadAll(utfbom.SkipOnly(bytes.NewReader(byteData)))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("ReadAll with BOM skipping", output)

	// Skip the BOM and report the encoding it indicates.
	sr, enc := utfbom.Skip(bytes.NewReader(byteData))
	fmt.Printf("Detected encoding: %s\n", enc)
	output, err = io.ReadAll(sr)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("ReadAll with BOM detection and skipping", output)
	fmt.Println()
}

Output:

$ go run main.go
Input: [239 187 191 104 101 108 108 111]
ReadAll with BOM skipping [104 101 108 108 111]
Detected encoding: UTF8
ReadAll with BOM detection and skipping [104 101 108 108 111]

Input: [104 101 108 108 111]
ReadAll with BOM skipping [104 101 108 108 111]
Detected encoding: Unknown
ReadAll with BOM detection and skipping [104 101 108 108 111]

Documentation

Overview

Package utfbom detects a BOM (Unicode Byte Order Mark) at the start of a stream and removes it when present. It wraps an io.Reader in another object (Reader) that also implements the io.Reader interface and automatically checks for a BOM, stripping it if found.

This package was copied from https://github.com/dimchansky/utfbom. Only minor changes were made to not depend on the io/ioutil package and to make our linters pass.

Index

func Skip(rd io.Reader) (*Reader, Encoding)
type Encoding
- func (e Encoding) String() string
type Reader
- func SkipOnly(rd io.Reader) *Reader
- func (r *Reader) Read(p []byte) (n int, err error)

Functions

func Skip

func Skip(rd io.Reader) (*Reader, Encoding)

Skip creates a Reader that automatically detects a BOM (Unicode Byte Order Mark) and removes it when present. It also returns the encoding indicated by the BOM. If the detected encoding is not needed, call SkipOnly instead.

Types

type Encoding

type Encoding int

Encoding is an integer type that identifies the detected UTF encoding.

const (
	// Unknown encoding, returned when no BOM was detected
	Unknown Encoding = iota

	// UTF8, BOM bytes: EF BB BF
	UTF8

	// UTF-16, big-endian, BOM bytes: FE FF
	UTF16BigEndian

	// UTF-16, little-endian, BOM bytes: FF FE
	UTF16LittleEndian

	// UTF-32, big-endian, BOM bytes: 00 00 FE FF
	UTF32BigEndian

	// UTF-32, little-endian, BOM bytes: FF FE 00 00
	UTF32LittleEndian
)

Constants to identify detected UTF encodings.
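
To make the byte sequences listed above concrete, here is a minimal sketch of prefix-based BOM matching that uses only the standard library. The names bomEntry, bomTable, and detectBOM are hypothetical and are not part of this package. The sketch assumes that the longest match wins, because the UTF-32 little-endian BOM (FF FE 00 00) begins with the UTF-16 little-endian BOM (FF FE). The package's own matching logic may differ.

```go
package main

import (
	"bytes"
	"fmt"
)

// bomEntry pairs a BOM byte sequence with a label. The labels mirror the
// names of the Encoding constants; the table itself is illustrative only.
type bomEntry struct {
	name string
	bom  []byte
}

// Longer sequences come first so that FF FE 00 00 (UTF-32 LE) is not
// mistaken for FF FE (UTF-16 LE).
var bomTable = []bomEntry{
	{"UTF32BigEndian", []byte{0x00, 0x00, 0xFE, 0xFF}},
	{"UTF32LittleEndian", []byte{0xFF, 0xFE, 0x00, 0x00}},
	{"UTF8", []byte{0xEF, 0xBB, 0xBF}},
	{"UTF16BigEndian", []byte{0xFE, 0xFF}},
	{"UTF16LittleEndian", []byte{0xFF, 0xFE}},
}

// detectBOM (hypothetical helper) returns the label of the BOM that prefixes
// data and the BOM length in bytes, or "Unknown" and 0 when none matches.
func detectBOM(data []byte) (string, int) {
	for _, e := range bomTable {
		if bytes.HasPrefix(data, e.bom) {
			return e.name, len(e.bom)
		}
	}
	return "Unknown", 0
}

func main() {
	inputs := [][]byte{
		[]byte("\xEF\xBB\xBFhello"),
		{0xFF, 0xFE, 'h', 0x00},
		{0xFF, 0xFE, 0x00, 0x00, 'h', 0x00, 0x00, 0x00},
		[]byte("hello"),
	}
	for _, in := range inputs {
		name, n := detectBOM(in)
		fmt.Printf("%s (%d BOM bytes)\n", name, n)
	}
}
```

Checking the four-byte sequences before the two-byte ones is the design choice that matters here. With the order reversed, a UTF-32 little-endian stream would be reported as UTF-16 little-endian.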

func (Encoding) String

func (e Encoding) String() string

String returns a user-friendly string representation of the encoding. It satisfies the fmt.Stringer interface.

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

Reader wraps an io.Reader and automatically checks for a BOM (Unicode Byte Order Mark), removing it when present.

func SkipOnly

func SkipOnly(rd io.Reader) *Reader

SkipOnly creates a Reader that automatically detects a BOM (Unicode Byte Order Mark) and removes it when present. Use Skip instead if the detected encoding is also needed.

func (*Reader) Read

func (r *Reader) Read(p []byte) (n int, err error)

Read implements the io.Reader interface. Bytes are taken from the underlying reader, and a BOM is checked for and removed when present.

Source Files

utfbom.go