Documentation
¶
Overview ¶
Package charsetx provides functions for detecting charset encoding of an HTML document and UTF-8 conversion.
Index ¶
Examples ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func DetectCharset ¶
DetectCharset returns charset for contents of urlStr by 3 steps:
- Use the result of charset.DetermineEncoding() if certain is true.
- Use the result of chardet.Detector.DetectBest() if Confidence is 100.
- Use charset in `Content-Type` meta tag if exists.
If no charset is detected by 3 steps, it returns error.
Argument body is response body as byte[], and contentType is `Content-Type` response header value.
Example ¶
package main import ( "fmt" "io/ioutil" "net/http" "github.com/philipjkim/charsetx" ) func main() { client := http.DefaultClient resp, err := client.Get("http://www.godoc.org") if err != nil { fmt.Println(err) return } defer resp.Body.Close() byt, err := ioutil.ReadAll(resp.Body) if err != nil { fmt.Println(err) return } cs, err := charsetx.DetectCharset(byt, resp.Header.Get("Content-Type")) if err != nil { fmt.Println(err) return } fmt.Println(cs) }
Output:
func GetUTF8Body ¶
GetUTF8Body returns string of body. If charset for body is detected as non-UTF8, this function converts to UTF-8 string and return it. Illegal byte sequences are silently discarded if ignoreInvalidUTF8Chars is set to true.
Example ¶
package main import ( "fmt" "io/ioutil" "net/http" "github.com/philipjkim/charsetx" ) func main() { client := http.DefaultClient resp, err := client.Get("https://golang.org/") if err != nil { fmt.Println(err) return } defer resp.Body.Close() body, err := ioutil.ReadAll(resp.Body) if err != nil { fmt.Println(err) return } r, err := charsetx.GetUTF8Body(body, resp.Header.Get("Content-Type"), false) if err != nil { fmt.Println(err) return } fmt.Println(r) }
Output:
func GetUTF8BodyFromURL ¶
GetUTF8BodyFromURL returns response body of urlStr as string. It converts charset to UTF-8 if original charset is non UTF-8. Illegal byte sequences are silently discarded if ignoreIBS is set to true.
Example ¶
package main import ( "fmt" "github.com/philipjkim/charsetx" ) func main() { r, err := charsetx.GetUTF8BodyFromURL("http://www.godoc.org", false) if err != nil { fmt.Println(err) return } fmt.Println(r) }
Output:
Types ¶
This section is empty.