utf8internal

package
v0.10.0
Warning

This package is not in the latest version of its module.

Published: Jun 12, 2023 License: BSD-3-Clause Imports: 0 Imported by: 0

Documentation

Overview

Package utf8internal contains low-level utf8-related constants, tables, etc. that are used internally by the text package.

Constants

const (
	LoCB = 0x80 // 1000 0000
	HiCB = 0xBF // 1011 1111
)

LoCB and HiCB are the default lowest and highest continuation bytes.
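
As a rough illustration of how these bounds might be used, the sketch below tests whether a byte falls in the default continuation range. Because utf8internal is an internal package and cannot be imported from outside its module, the constants are copied locally; `isContinuation` is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// Local copies of the documented constants (the package itself is internal).
const (
	LoCB = 0x80 // 1000 0000
	HiCB = 0xBF // 1011 1111
)

// isContinuation is a hypothetical helper (not part of utf8internal) that
// reports whether b lies in the default continuation-byte range.
func isContinuation(b byte) bool {
	return LoCB <= b && b <= HiCB
}

func main() {
	s := "\xc3\xa9" // U+00E9 encoded as two bytes: 0xC3 0xA9
	for i := 0; i < len(s); i++ {
		fmt.Printf("%#x continuation=%v\n", s[i], isContinuation(s[i]))
	}
}
```

Note that this is only the default range; some leading bytes narrow the range allowed for the second byte, which is what AcceptRanges describes.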

const (
	// ASCII identifies a UTF-8 byte as ASCII.
	ASCII = as

	// FirstInvalid indicates a byte is invalid as a first byte of a UTF-8
	// sequence.
	FirstInvalid = xx

	// SizeMask is a mask for the size bits. Use x&SizeMask to get the size.
	SizeMask = 7

	// AcceptShift is the right-shift count for the first byte info byte to get
	// the index into the AcceptRanges table. See AcceptRanges.
	AcceptShift = 4
)

Constants related to getting information about the first bytes of UTF-8 sequences.
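
A minimal sketch of how SizeMask and AcceptShift might be applied to an info byte for a multi-byte leading byte. `splitInfo` is a hypothetical helper, and the info byte `0x13` is an assumed example value (high nibble 1, low nibble 3), not one taken from the documentation above.

```go
package main

import "fmt"

// Local copies of the documented constants (the package itself is internal).
const (
	SizeMask    = 7
	AcceptShift = 4
)

// splitInfo is a hypothetical helper (not part of utf8internal). It splits an
// info byte into the sequence size (low bits) and the index into the
// AcceptRanges table (high nibble).
func splitInfo(x uint8) (size int, acceptIdx int) {
	return int(x & SizeMask), int(x >> AcceptShift)
}

func main() {
	// 0x13 is an assumed example info byte.
	size, idx := splitInfo(0x13)
	fmt.Println(size, idx) // prints "3 1"
}
```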

Variables

var AcceptRanges = [...]AcceptRange{
	0: {LoCB, HiCB},
	1: {0xA0, HiCB},
	2: {LoCB, 0x9F},
	3: {0x90, HiCB},
	4: {LoCB, 0x8F},
}

AcceptRanges is an array of AcceptRange values. For a given byte sequence b,

	AcceptRanges[First[b[0]]>>AcceptShift]

gives the AcceptRange for the multi-byte UTF-8 sequence starting at b[0].
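
The lookup can be sketched as follows. Since the contents of First are not displayed, this example takes the info byte directly instead of reading it from the table; `secondByteOK` is a hypothetical helper, and the value `0x13` used for the leading byte 0xE0 is an assumption based on the layout of the standard library's unicode/utf8 tables.

```go
package main

import "fmt"

// Local copies of the documented declarations (the package itself is internal).
const (
	LoCB        = 0x80
	HiCB        = 0xBF
	AcceptShift = 4
)

type AcceptRange struct {
	Lo uint8 // lowest value for second byte.
	Hi uint8 // highest value for second byte.
}

var AcceptRanges = [...]AcceptRange{
	0: {LoCB, HiCB},
	1: {0xA0, HiCB},
	2: {LoCB, 0x9F},
	3: {0x90, HiCB},
	4: {LoCB, 0x8F},
}

// secondByteOK is a hypothetical helper (not part of utf8internal). info must
// be the info byte of a multi-byte leading byte; for ASCII or FirstInvalid
// the shifted index would fall outside the five-element array and panic.
func secondByteOK(info uint8, second byte) bool {
	r := AcceptRanges[info>>AcceptShift]
	return r.Lo <= second && second <= r.Hi
}

func main() {
	const infoE0 = 0x13 // assumed info byte for leading byte 0xE0 (index 1)
	fmt.Println(secondByteOK(infoE0, 0x80)) // false: would be an overlong encoding
	fmt.Println(secondByteOK(infoE0, 0xA0)) // true
}
```

The narrowed ranges are what exclude overlong encodings, surrogate code points, and values above U+10FFFF at the second byte.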

var First = [256]uint8{}/* 256 elements not displayed */

First is information about the first byte in a UTF-8 sequence.
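
Because the 256 elements are not displayed, the sketch below builds a stand-in table rather than the real one. The info-byte values and the byte ranges they are assigned to are assumptions modeled on the standard library's unicode/utf8 package; `buildFirst` and `classify` are hypothetical names. ASCII and FirstInvalid are handled as special cases rather than through SizeMask.

```go
package main

import "fmt"

// Assumed info-byte values, modeled on unicode/utf8. High nibble: index into
// AcceptRanges (or F for the special one-byte cases). Low bits: size.
const (
	xx = 0xF1 // invalid first byte
	as = 0xF0 // ASCII
	s1 = 0x02 // accept 0, size 2
	s2 = 0x13 // accept 1, size 3
	s3 = 0x03 // accept 0, size 3
	s4 = 0x23 // accept 2, size 3
	s5 = 0x34 // accept 3, size 4
	s6 = 0x04 // accept 0, size 4
	s7 = 0x44 // accept 4, size 4

	ASCII        = as
	FirstInvalid = xx
	SizeMask     = 7
)

// buildFirst is a hypothetical stand-in for the First table.
func buildFirst() [256]uint8 {
	var t [256]uint8
	for i := range t {
		switch {
		case i < 0x80:
			t[i] = as
		case i >= 0xC2 && i <= 0xDF:
			t[i] = s1
		case i == 0xE0:
			t[i] = s2
		case i == 0xED:
			t[i] = s4
		case i >= 0xE1 && i <= 0xEF:
			t[i] = s3
		case i == 0xF0:
			t[i] = s5
		case i >= 0xF1 && i <= 0xF3:
			t[i] = s6
		case i == 0xF4:
			t[i] = s7
		default:
			t[i] = xx
		}
	}
	return t
}

var first = buildFirst()

// classify is a hypothetical helper that reports what kind of first byte b is
// and how many bytes the sequence it starts is expected to occupy.
func classify(b byte) (kind string, size int) {
	switch x := first[b]; x {
	case ASCII:
		return "ascii", 1
	case FirstInvalid:
		return "invalid", 1
	default:
		return "lead", int(x & SizeMask)
	}
}

func main() {
	for _, b := range []byte{'A', 0x80, 0xC3, 0xE2, 0xF0} {
		kind, size := classify(b)
		fmt.Printf("%#x: %s, size %d\n", b, kind, size)
	}
}
```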

Functions

This section is empty.

Types

type AcceptRange

type AcceptRange struct {
	Lo uint8 // lowest value for second byte.
	Hi uint8 // highest value for second byte.
}

AcceptRange gives the range of valid values for the second byte in a UTF-8 sequence, for any entry of First that is not ASCII or FirstInvalid.
