chardet

Version: v0.0.2
Published: Aug 28, 2023 | License: MIT | Imports: 4 | Imported by: 3

README

chardet

chardet is a library that automatically detects the charset of text for the Go programming language. It is based on the algorithm and data in ICU's implementation.

Documentation and Usage

See pkgdoc

Documentation

Overview

Package chardet ports character set detection from ICU.

Constants

This section is empty.

Variables

var (
	UTF8        = newRecognizer_utf8()
	UTF16BE     = newRecognizer_utf16be()
	UTF16LE     = newRecognizer_utf16le()
	UTF32BE     = newRecognizer_utf32be()
	UTF32LE     = newRecognizer_utf32le()
	ISO88591EN  = newRecognizer_8859_1_en()
	ISO88591DA  = newRecognizer_8859_1_da()
	ISO88591DE  = newRecognizer_8859_1_de()
	ISO88591ES  = newRecognizer_8859_1_es()
	ISO88591FR  = newRecognizer_8859_1_fr()
	ISO88591IT  = newRecognizer_8859_1_it()
	ISO88591NL  = newRecognizer_8859_1_nl()
	ISO88591NO  = newRecognizer_8859_1_no()
	ISO88591PT  = newRecognizer_8859_1_pt()
	ISO88591SV  = newRecognizer_8859_1_sv()
	ISO88592CS  = newRecognizer_8859_2_cs()
	ISO88592HU  = newRecognizer_8859_2_hu()
	ISO88592PL  = newRecognizer_8859_2_pl()
	ISO88592RO  = newRecognizer_8859_2_ro()
	ISO88595RU  = newRecognizer_8859_5_ru()
	ISO88596AR  = newRecognizer_8859_6_ar()
	ISO88597EL  = newRecognizer_8859_7_el()
	ISO88598IHE = newRecognizer_8859_8_I_he()
	ISO88598HE  = newRecognizer_8859_8_he()
	WINDOWS1251 = newRecognizer_windows_1251()
	WINDOWS1256 = newRecognizer_windows_1256()
	KOI8R       = newRecognizer_KOI8_R()
	ISO88599TR  = newRecognizer_8859_9_tr()

	SJIS    = newRecognizer_sjis()
	GB18030 = newRecognizer_gb_18030()
	EUCJP   = newRecognizer_euc_jp()
	EUCKR   = newRecognizer_euc_kr()
	BIG5    = newRecognizer_big5()

	ISO2022JP = newRecognizer_2022JP()
	ISO2022KR = newRecognizer_2022KR()
	ISO2022CN = newRecognizer_2022CN()

	IBM424HE_RTL = newRecognizer_IBM424_he_rtl()
	IBM424HE_LTR = newRecognizer_IBM424_he_ltr()
	IBM420AR_RTL = newRecognizer_IBM420_ar_rtl()
	IBM420AR_LTR = newRecognizer_IBM420_ar_ltr()
)
var (
	NotDetectedError = errors.New("Charset not detected.")
)

Functions

This section is empty.

Types

type Detector

type Detector struct {
	// contains filtered or unexported fields
}

Detector implements charset detection.

func NewHtmlDetector

func NewHtmlDetector(r ...recognizer) *Detector

NewHtmlDetector creates a Detector for HTML.

func NewTextDetector

func NewTextDetector(r ...recognizer) *Detector

NewTextDetector creates a Detector for plain text.

func (*Detector) DetectAll

func (d *Detector) DetectAll(b []byte) ([]Result, error)

DetectAll returns all Results which have non-zero Confidence. The Results are sorted by Confidence in descending order.
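
The ordering described above can be illustrated with a small standalone sketch. It does not call the package: the Result type below is a local stand-in that copies the documented fields, and rankResults is a hypothetical helper that only mimics the documented contract (drop zero-confidence candidates, sort the rest by Confidence in descending order). How DetectAll does this internally is not shown in this documentation.

```go
package main

import (
	"fmt"
	"sort"
)

// Result is a local stand-in that mirrors the documented fields of the
// package's Result type, so this sketch compiles on its own.
type Result struct {
	Charset    string
	Language   string
	Confidence int
}

// rankResults is a hypothetical helper (not part of the package). It drops
// candidates with zero Confidence and sorts the rest, highest first.
func rankResults(candidates []Result) []Result {
	ranked := make([]Result, 0, len(candidates))
	for _, c := range candidates {
		if c.Confidence > 0 {
			ranked = append(ranked, c)
		}
	}
	// SliceStable keeps the input order for candidates with equal Confidence.
	sort.SliceStable(ranked, func(i, j int) bool {
		return ranked[i].Confidence > ranked[j].Confidence
	})
	return ranked
}

func main() {
	// Made-up candidates, for illustration only.
	candidates := []Result{
		{Charset: "ISO-8859-1", Language: "en", Confidence: 30},
		{Charset: "UTF-16BE", Confidence: 0},
		{Charset: "UTF-8", Confidence: 80},
	}
	for _, r := range rankResults(candidates) {
		fmt.Printf("%s language=%q confidence=%d\n", r.Charset, r.Language, r.Confidence)
	}
}
```

Callers that receive such a sorted slice can treat the first element as the most likely charset and the remaining ones as fallbacks.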

func (*Detector) DetectBest

func (d *Detector) DetectBest(b []byte) (r *Result, err error)

DetectBest returns the Result with the highest Confidence.
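
A minimal standalone sketch of "pick the highest Confidence" follows. It does not use the package: Result is a local stand-in, bestResult is a hypothetical helper, and errNotDetected stands in for the package's NotDetectedError. That DetectBest returns NotDetectedError when nothing matches is an assumption here; the documentation above does not state it.

```go
package main

import (
	"errors"
	"fmt"
)

// Result is a local stand-in that mirrors the documented fields of the
// package's Result type.
type Result struct {
	Charset    string
	Language   string
	Confidence int
}

// errNotDetected is a local stand-in for the package's NotDetectedError.
var errNotDetected = errors.New("charset not detected")

// bestResult is a hypothetical helper (not part of the package). It returns
// the candidate with the highest positive Confidence, or errNotDetected
// when no candidate has a positive Confidence.
func bestResult(candidates []Result) (*Result, error) {
	var best *Result
	for i := range candidates {
		c := &candidates[i]
		if c.Confidence <= 0 {
			continue
		}
		// Strict comparison: on a tie, the earlier candidate wins.
		if best == nil || c.Confidence > best.Confidence {
			best = c
		}
	}
	if best == nil {
		return nil, errNotDetected
	}
	return best, nil
}

func main() {
	// Made-up candidates, for illustration only.
	r, err := bestResult([]Result{
		{Charset: "ISO-8859-1", Language: "en", Confidence: 30},
		{Charset: "UTF-8", Confidence: 80},
	})
	if errors.Is(err, errNotDetected) {
		fmt.Println("no charset detected")
		return
	}
	fmt.Println(r.Charset, r.Confidence)
}
```

Comparing against a sentinel error with errors.Is lets the caller tell "nothing detected" apart from other failures.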

type Result

type Result struct {
	// IANA name of the detected charset.
	Charset string
	// IANA name of the detected language. It may be empty for some charsets.
	Language string
	// Confidence of the Result. Scale from 1 to 100. The bigger, the more confident.
	Confidence int
}

Result contains all the information that the charset detector returns.
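
One way a caller might consume these fields is sketched below. Result is again a local stand-in with the documented fields. The helpers trusted and describe, and the cutoff value 50, are assumptions made for illustration and are not part of the package.

```go
package main

import "fmt"

// Result is a local stand-in that mirrors the documented fields of the
// package's Result type.
type Result struct {
	Charset    string
	Language   string
	Confidence int
}

// trusted is a hypothetical helper: it reports whether r meets a
// caller-chosen cutoff on the documented 1 to 100 Confidence scale.
func trusted(r Result, minConfidence int) bool {
	return r.Confidence >= minConfidence
}

// describe is a hypothetical helper: it formats r for logging and omits
// the language when it is empty, which the field comment says can happen.
func describe(r Result) string {
	if r.Language == "" {
		return fmt.Sprintf("%s (confidence %d)", r.Charset, r.Confidence)
	}
	return fmt.Sprintf("%s, language %s (confidence %d)", r.Charset, r.Language, r.Confidence)
}

func main() {
	// Made-up result, for illustration only.
	r := Result{Charset: "ISO-8859-1", Language: "fr", Confidence: 42}
	if !trusted(r, 50) { // 50 is an arbitrary example cutoff
		fmt.Println("low confidence:", describe(r))
		return
	}
	fmt.Println("detected:", describe(r))
}
```

A suitable cutoff depends on the input: short or mostly ASCII text tends to give detectors little to work with, so a caller may prefer a default charset over a low-confidence guess.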
