utf8reader

v0.5.1
Warning

This package is not in the latest version of its module.

Published: Nov 30, 2024 License: MIT Imports: 6 Imported by: 1

README

utf8reader

A simple Go package that converts an io.Reader into a UTF-8 encoded io.Reader. It automatically detects the encoding of the input and converts it to UTF-8.

Usage


package main

import (
    "bytes"
    "fmt"

    "github.com/kpym/utf8reader"
)

func main() {
    // Create a reader over the KOI8-R encoded text "Това е на български"
    r := bytes.NewReader([]byte{0xF4, 0xCF, 0xD7, 0xC1, 0x20, 0xC5, 0x20, 0xCE, 0xC1, 0x20, 0xC2, 0xDF, 0xCC, 0xC7, 0xC1, 0xD2, 0xD3, 0xCB, 0xC9})
    reader := utf8reader.New(r)

    // Read the content of the reader
    buf := make([]byte, 100)
    n, err := reader.Read(buf)
    if err != nil {
        fmt.Println(err)
    }
    fmt.Println(string(buf[:n]))
    // Output: Това е на български
}

Documentation

You can find the documentation on pkg.go.dev.

License

MIT

Documentation

Overview

Package utf8reader provides a utility to wrap an io.Reader that contains text in an arbitrary encoding and produce an io.Reader that outputs UTF-8 encoded text. The package automatically detects the original encoding and converts the input to UTF-8. Additionally, it can normalize the text to a specified Unicode normalization form (NFC or NFD).

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func WithNormalization added in v0.5.1

func WithNormalization(nor string) option

WithNormalization sets the normalization form, which can be "NFC" or "NFD". By default, no normalization is done. WithNormalization("NFC") is equivalent to WithTransform(norm.NFC), and WithNormalization("NFD") is equivalent to WithTransform(norm.NFD).
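To illustrate why the choice between NFC and NFD can matter, here is a small standard-library sketch. It does not call utf8reader or perform any normalization itself; it only compares two hand-written spellings of the same visible character, and the names composed, decomposed, and describe are made up for the example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Two spellings of "é" that render identically on screen.
const (
	composed   = "\u00e9"  // NFC-style: the single code point U+00E9
	decomposed = "e\u0301" // NFD-style: 'e' followed by combining acute accent U+0301
)

// describe reports the UTF-8 byte length and the rune count of s.
func describe(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	cb, cr := describe(composed)
	db, dr := describe(decomposed)
	fmt.Println(composed == decomposed) // false
	fmt.Println(cb, cr)                 // 2 1
	fmt.Println(db, dr)                 // 3 2
}
```

Because the two forms differ byte for byte, comparing or indexing text from mixed sources is usually only reliable after settling on one form.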

func WithPeekSize added in v0.3.0

func WithPeekSize(size int) option

WithPeekSize sets the number of bytes to peek. By default it peeks 4096 bytes. The peeked bytes are used to detect the encoding.

func WithTransform added in v0.5.1

func WithTransform(transformers ...transform.Transformer) option

WithTransform appends one or more transformers.

Types

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

Reader wraps an io.Reader to convert its input to UTF-8 encoding, if required.

func New

func New(r io.Reader, options ...option) *Reader

New creates a Reader that converts the input to UTF-8. If encoding detection fails, the input stays unchanged and Encoding() returns an empty string.

func (*Reader) Encoding

func (r *Reader) Encoding() string

Encoding returns the encoding detected from the input, or an empty string if detection was unsuccessful or an error occurred during detection.

func (*Reader) Peek added in v0.3.0

func (r *Reader) Peek() ([]byte, error)

Peek returns a UTF-8 encoded snapshot of the first bytes of the reader, primarily for encoding detection. The size of the snapshot is at most the size of the peek buffer, set by the WithPeekSize option. This method must be called before any Read operations.

func (*Reader) Read

func (r *Reader) Read(p []byte) (n int, err error)

Read reads data from the underlying reader, ensuring it is UTF-8 encoded. It returns the number of bytes read into p and any error encountered. If the Reader is nil, it returns 0 and io.EOF.
