chardet

Version: v0.16.1 (latest)

Warning: This package is not in the latest version of its module.
Published: Dec 20, 2024
License: Apache-2.0, MIT
Imports: 14
Imported by: 0

Documentation

Overview

Package chardet ports character set detection from ICU (International Components for Unicode).
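ICU's detector works by running one recognizer per charset over the raw input bytes; each recognizer reports a confidence, and the highest score wins. The sketch below illustrates that idea for a single charset, UTF-8, using only the standard library. The function `utf8Confidence` is hypothetical, and its score thresholds are illustrative assumptions, not values taken from chardet or ICU.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// utf8BOM is the byte-order mark that may prefix UTF-8 text.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// utf8Confidence is a toy recognizer: it scans raw bytes and returns a
// confidence from 0 to 100 that the input is UTF-8. The thresholds are
// illustrative assumptions, not values from ICU or chardet.
func utf8Confidence(b []byte) int {
	hasBOM := bytes.HasPrefix(b, utf8BOM)
	multiByte, invalid := 0, 0
	for i := 0; i < len(b); {
		r, size := utf8.DecodeRune(b[i:])
		if r == utf8.RuneError && size == 1 {
			invalid++ // byte that is not part of a well-formed sequence
		} else if size > 1 {
			multiByte++ // well-formed non-ASCII character
		}
		i += size
	}
	switch {
	case invalid == 0 && (hasBOM || multiByte > 0):
		return 100 // valid and distinctively UTF-8
	case invalid == 0:
		return 15 // pure ASCII: valid UTF-8, but many charsets match equally
	case multiByte > invalid*10:
		return 25 // mostly valid, with a few stray bytes
	default:
		return 0
	}
}

func main() {
	fmt.Println(utf8Confidence([]byte("héllo")))        // 100
	fmt.Println(utf8Confidence([]byte("hello")))        // 15
	fmt.Println(utf8Confidence([]byte{'h', 0xE9, 'l'})) // 0 (Latin-1 "hél")
}
```

Pure ASCII scores low on purpose: it is valid in many charsets, so it says little about which one the author used.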

Constants

This section is empty.

Variables

var (
	ErrEndOfInputBuffer = errors.New("end of input buffer")
	ErrBadCharDecode    = errors.New("decode a bad char")
)
var (
	ErrNotDetected = errors.New("charset not detected")
)

Functions

func DecodeFromCharset

func DecodeFromCharset(input []byte, charset string) ([]byte, error)

DecodeFromCharset decodes input from the given charset to UTF-8.
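The charset names DecodeFromCharset accepts are not listed here, so rather than guess at its call syntax, the sketch below shows what such a decode does for one simple charset, ISO-8859-1 (Latin-1), using only the standard library. `decodeLatin1` is a hypothetical helper, not part of the package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeLatin1 converts ISO-8859-1 bytes to UTF-8. Every Latin-1 byte value
// 0x00-0xFF is the Unicode code point with the same number, so decoding is
// re-encoding each byte as a rune. It cannot fail, so it returns no error.
func decodeLatin1(input []byte) []byte {
	out := make([]byte, 0, len(input))
	for _, c := range input {
		out = utf8.AppendRune(out, rune(c))
	}
	return out
}

func main() {
	latin1 := []byte{'c', 'a', 'f', 0xE9} // "café" in ISO-8859-1
	fmt.Println(string(decodeLatin1(latin1)))
}
```

Multi-byte and stateful charsets need real decoding tables, which is why the package's function can return an error.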

func EncodeToCharset

func EncodeToCharset(input []byte, charset string) ([]byte, error)

EncodeToCharset encodes UTF-8 input into the given charset.
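Encoding is the lossy direction: a target charset may have no byte sequence for some characters. A minimal sketch for ISO-8859-1, using only the standard library, makes that failure mode explicit. `encodeLatin1` and `errUnrepresentable` are hypothetical names; how EncodeToCharset itself reports unmappable characters is not documented here.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// errUnrepresentable is a hypothetical error for runes outside ISO-8859-1.
var errUnrepresentable = errors.New("rune not representable in ISO-8859-1")

// encodeLatin1 converts UTF-8 input to ISO-8859-1. It fails on malformed
// UTF-8 and on any rune above U+00FF, since Latin-1 has no byte for it.
func encodeLatin1(input []byte) ([]byte, error) {
	out := make([]byte, 0, len(input))
	for i := 0; i < len(input); {
		r, size := utf8.DecodeRune(input[i:])
		if r == utf8.RuneError && size == 1 {
			return nil, fmt.Errorf("invalid UTF-8 at byte %d", i)
		}
		if r > 0xFF {
			return nil, fmt.Errorf("%w: %q at byte %d", errUnrepresentable, r, i)
		}
		out = append(out, byte(r))
		i += size
	}
	return out, nil
}

func main() {
	b, err := encodeLatin1([]byte("café"))
	fmt.Printf("% x %v\n", b, err) // 63 61 66 e9 <nil>
	_, err = encodeLatin1([]byte("5 €"))
	fmt.Println(errors.Is(err, errUnrepresentable)) // true
}
```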

func NewReader

func NewReader(r io.Reader, charset string) io.Reader

NewReader returns a reader that converts text read from r, encoded in charset, to UTF-8.
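A converting reader decodes the source in chunks instead of loading it all into memory. The sketch below shows that streaming pattern for ISO-8859-1 with the standard library only; `latin1Reader`, `newLatin1Reader`, and `decodeAll` are hypothetical, and the 512-byte buffer size is an arbitrary choice.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"unicode/utf8"
)

// latin1Reader wraps an io.Reader of ISO-8859-1 bytes and yields UTF-8.
type latin1Reader struct {
	src     io.Reader
	buf     []byte // raw Latin-1 bytes read from src
	pending []byte // decoded UTF-8 not yet returned to the caller
	err     error  // sticky error (including io.EOF) from src
}

func newLatin1Reader(r io.Reader) io.Reader {
	return &latin1Reader{src: r, buf: make([]byte, 512)}
}

func (l *latin1Reader) Read(p []byte) (int, error) {
	// Refill until there is decoded output or the source is exhausted.
	for len(l.pending) == 0 {
		if l.err != nil {
			return 0, l.err
		}
		n, err := l.src.Read(l.buf)
		for _, c := range l.buf[:n] {
			l.pending = utf8.AppendRune(l.pending, rune(c))
		}
		l.err = err
	}
	n := copy(p, l.pending)
	l.pending = l.pending[n:]
	return n, nil
}

// decodeAll is a usage helper that streams src through the reader.
func decodeAll(src []byte) (string, error) {
	out, err := io.ReadAll(newLatin1Reader(bytes.NewReader(src)))
	return string(out), err
}

func main() {
	s, err := decodeAll([]byte{'c', 'a', 'f', 0xE9})
	fmt.Println(s, err) // café <nil>
}
```

Keeping decoded bytes in `pending` matters because one input byte can expand to two output bytes, so the output may not fit in the caller's buffer in one call.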

func NewWriter

func NewWriter(w io.Writer, charset string) io.Writer

NewWriter returns a writer that converts UTF-8 text into charset and writes the result to w.
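The tricky part of a converting writer is that a caller may split a multi-byte UTF-8 character across two Write calls. The sketch below, for ISO-8859-1 and using only the standard library, buffers such a partial character until it is complete. `latin1Writer`, `newLatin1Writer`, and `encodeChunks` are hypothetical; whether the package's writer handles split characters the same way is not documented here.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"unicode/utf8"
)

// latin1Writer accepts UTF-8 and writes ISO-8859-1 to dst. A rune split
// across two Write calls is buffered in tail until it is complete.
type latin1Writer struct {
	dst  io.Writer
	tail []byte
}

func newLatin1Writer(w io.Writer) io.Writer { return &latin1Writer{dst: w} }

func (l *latin1Writer) Write(p []byte) (int, error) {
	data := append(l.tail, p...) // l.tail is always an owned copy, never p
	l.tail = nil
	out := make([]byte, 0, len(data))
	i := 0
	for i < len(data) && utf8.FullRune(data[i:]) {
		r, size := utf8.DecodeRune(data[i:])
		if r == utf8.RuneError && size == 1 {
			return 0, errors.New("invalid UTF-8 input")
		}
		if r > 0xFF {
			return 0, fmt.Errorf("rune %q not representable in ISO-8859-1", r)
		}
		out = append(out, byte(r))
		i += size
	}
	l.tail = append([]byte(nil), data[i:]...) // incomplete trailing rune, if any
	if _, err := l.dst.Write(out); err != nil {
		return 0, err
	}
	return len(p), nil
}

// encodeChunks is a usage helper that writes each chunk in turn.
func encodeChunks(chunks ...[]byte) (string, error) {
	var buf bytes.Buffer
	w := newLatin1Writer(&buf)
	for _, c := range chunks {
		if _, err := w.Write(c); err != nil {
			return "", err
		}
	}
	return buf.String(), nil
}

func main() {
	// "é" is 0xC3 0xA9 in UTF-8; split it across two writes.
	s, err := encodeChunks([]byte{'c', 'a', 'f', 0xC3}, []byte{0xA9})
	fmt.Printf("%q %v\n", s, err) // "caf\xe9" <nil>
}
```

A partial character left over at the end of the stream is silently dropped here; a fuller implementation would report it from a Close or Flush method.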

Types

type Detector

type Detector struct {
	// contains filtered or unexported fields
}

Detector implements charset detection.

func NewHtmlDetector

func NewHtmlDetector() *Detector

NewHtmlDetector creates a Detector for HTML input.

func NewTextDetector

func NewTextDetector() *Detector

NewTextDetector creates a Detector for plain text.
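The docs do not say how the two constructors differ. ICU's CharsetDetector has an optional input filter (`enableInputFilter`) that strips HTML/XML-style markup before detection, because ASCII tag names and attributes dilute the byte statistics of the actual text. Assuming this port mirrors that feature, the HTML detector probably applies a filter along the lines of the hypothetical `stripMarkup` below.

```go
package main

import "fmt"

// stripMarkup drops every byte from '<' through the next '>' so that ASCII
// tag names and attributes do not dilute the statistics a charset
// recognizer computes over the document text.
func stripMarkup(b []byte) []byte {
	out := make([]byte, 0, len(b))
	inTag := false
	for _, c := range b {
		switch {
		case c == '<':
			inTag = true
		case c == '>' && inTag:
			inTag = false
		case !inTag:
			out = append(out, c)
		}
	}
	return out
}

func main() {
	page := []byte(`<p class="greeting">caf` + "\xe9" + `</p>`)
	fmt.Printf("%q\n", stripMarkup(page)) // "caf\xe9"
}
```

This naive filter also swallows text after a stray `<` in prose, which is one reason a plain-text detector would not apply it.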

func (*Detector) DetectAll

func (d *Detector) DetectAll(b []byte) ([]Result, error)

DetectAll returns all Results which have non-zero Confidence. The Results are sorted by Confidence in descending order.
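The DetectAll contract (drop zero-confidence candidates, then sort the rest from most to least confident) can be sketched with the standard library. `Result` below is a local copy of the documented fields so the example compiles on its own, `rankResults` is a hypothetical name, and the candidate scores in `main` are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Result is a local copy of the documented chardet.Result fields.
type Result struct {
	Charset    string
	Language   string
	Confidence int
}

// rankResults keeps only results with non-zero Confidence and orders them
// from most to least confident.
func rankResults(in []Result) []Result {
	out := make([]Result, 0, len(in))
	for _, r := range in {
		if r.Confidence > 0 {
			out = append(out, r)
		}
	}
	// SliceStable keeps input order among equal scores (a sketch choice).
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Confidence > out[j].Confidence
	})
	return out
}

func main() {
	scores := []Result{ // hypothetical recognizer outputs
		{Charset: "ISO-8859-1", Language: "fr", Confidence: 40},
		{Charset: "UTF-8", Confidence: 0},
		{Charset: "windows-1252", Language: "fr", Confidence: 55},
	}
	for _, r := range rankResults(scores) {
		fmt.Println(r.Charset, r.Confidence)
	}
}
```

Returning the full ranking is useful when the top two candidates are close, for example when a caller wants to prefer windows-1252 over ISO-8859-1 on a near tie.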

func (*Detector) DetectBest

func (d *Detector) DetectBest(b []byte) (r *Result, err error)

DetectBest returns the Result with the highest Confidence.
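The docs do not say what DetectBest returns when no charset matches. Given the exported ErrNotDetected variable, a plausible guess is that error; the sketch below assumes so and uses a local stand-in, `errNotDetected`, together with a local copy of `Result`. `pickBest` is a hypothetical helper, not the package's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// Result is a local copy of the documented chardet.Result fields.
type Result struct {
	Charset    string
	Language   string
	Confidence int
}

// errNotDetected stands in for chardet.ErrNotDetected in this sketch.
var errNotDetected = errors.New("charset not detected")

// pickBest returns the result with the highest non-zero Confidence, or
// errNotDetected when no candidate scored above zero. Ties go to the
// earlier candidate.
func pickBest(results []Result) (*Result, error) {
	var best *Result
	for i := range results {
		r := &results[i]
		if r.Confidence > 0 && (best == nil || r.Confidence > best.Confidence) {
			best = r
		}
	}
	if best == nil {
		return nil, errNotDetected
	}
	return best, nil
}

func main() {
	best, err := pickBest([]Result{{Charset: "UTF-8", Confidence: 15}, {Charset: "windows-1252", Confidence: 30}})
	fmt.Println(best.Charset, err) // windows-1252 <nil>

	if _, err := pickBest(nil); errors.Is(err, errNotDetected) {
		fmt.Println("no charset detected") // fall back to a default encoding here
	}
}
```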

type Result

type Result struct {
	// IANA name of the detected charset.
	Charset string
	// IANA name of the detected language. It may be empty for some charsets.
	Language string
	// Confidence of the Result, on a scale from 1 to 100; higher means more confident.
	Confidence int
}

Result contains all the information the charset detector reports.
