linebreak

package
v0.0.0-...-8b35816 Latest
Warning

This package is not in the latest version of its module.

Published: Aug 20, 2015 License: BSD-3-Clause Imports: 2 Imported by: 15

Documentation

Overview

gorilla/i18n/linebreak implements the Unicode line breaking algorithm.

Line breaking, also known as word wrapping, is the process of breaking a section of text into lines such that it will fit in the available width of a page, window or other display area.

Simple as it sounds, this is not a trivial task when multilingual text must be supported. The algorithm used in this package is based on the best practices defined in Unicode Standard Annex #14 (UAX #14):

http://www.unicode.org/reports/tr14/

A similar package that served as inspiration for this one is Bram Stein's Unicode Tokenizer (for Node.js):

https://github.com/bramstein/unicode-tokenizer


Constants

const Version = "6.2.0"

Version is the Unicode edition from which the tables are derived.

Variables

var (
	AI = _AI // AI is the set of Unicode characters in line breaking class AI.
	AL = _AL // AL is the set of Unicode characters in line breaking class AL.
	B2 = _B2 // B2 is the set of Unicode characters in line breaking class B2.
	BA = _BA // BA is the set of Unicode characters in line breaking class BA.
	BB = _BB // BB is the set of Unicode characters in line breaking class BB.
	BK = _BK // BK is the set of Unicode characters in line breaking class BK.
	CB = _CB // CB is the set of Unicode characters in line breaking class CB.
	CJ = _CJ // CJ is the set of Unicode characters in line breaking class CJ.
	CL = _CL // CL is the set of Unicode characters in line breaking class CL.
	CM = _CM // CM is the set of Unicode characters in line breaking class CM.
	CP = _CP // CP is the set of Unicode characters in line breaking class CP.
	CR = _CR // CR is the set of Unicode characters in line breaking class CR.
	EX = _EX // EX is the set of Unicode characters in line breaking class EX.
	GL = _GL // GL is the set of Unicode characters in line breaking class GL.
	H2 = _H2 // H2 is the set of Unicode characters in line breaking class H2.
	H3 = _H3 // H3 is the set of Unicode characters in line breaking class H3.
	HL = _HL // HL is the set of Unicode characters in line breaking class HL.
	HY = _HY // HY is the set of Unicode characters in line breaking class HY.
	ID = _ID // ID is the set of Unicode characters in line breaking class ID.
	IN = _IN // IN is the set of Unicode characters in line breaking class IN.
	IS = _IS // IS is the set of Unicode characters in line breaking class IS.
	JL = _JL // JL is the set of Unicode characters in line breaking class JL.
	JT = _JT // JT is the set of Unicode characters in line breaking class JT.
	JV = _JV // JV is the set of Unicode characters in line breaking class JV.
	LF = _LF // LF is the set of Unicode characters in line breaking class LF.
	NL = _NL // NL is the set of Unicode characters in line breaking class NL.
	NS = _NS // NS is the set of Unicode characters in line breaking class NS.
	NU = _NU // NU is the set of Unicode characters in line breaking class NU.
	OP = _OP // OP is the set of Unicode characters in line breaking class OP.
	PO = _PO // PO is the set of Unicode characters in line breaking class PO.
	PR = _PR // PR is the set of Unicode characters in line breaking class PR.
	QU = _QU // QU is the set of Unicode characters in line breaking class QU.
	RI = _RI // RI is the set of Unicode characters in line breaking class RI.
	SA = _SA // SA is the set of Unicode characters in line breaking class SA.
	SG = _SG // SG is the set of Unicode characters in line breaking class SG.
	SP = _SP // SP is the set of Unicode characters in line breaking class SP.
	SY = _SY // SY is the set of Unicode characters in line breaking class SY.
	WJ = _WJ // WJ is the set of Unicode characters in line breaking class WJ.
	ZW = _ZW // ZW is the set of Unicode characters in line breaking class ZW.
)

These variables have type *unicode.RangeTable.
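Because the variables are ordinary range tables, membership can be tested with the standard library's unicode.Is. The sketch below is self-contained and does not import this package: spaceTable is a hypothetical stand-in for a table such as SP, and inTable is an illustrative helper, not part of the package API.

```go
package main

import (
	"fmt"
	"unicode"
)

// spaceTable is a hypothetical stand-in for a class table such as SP.
// It contains only U+0020 SPACE.
var spaceTable = &unicode.RangeTable{
	R16:         []unicode.Range16{{Lo: 0x0020, Hi: 0x0020, Stride: 1}},
	LatinOffset: 1, // one R16 entry lies within Latin-1
}

// inTable reports whether r belongs to the given range table.
// (Illustrative helper; not part of the package.)
func inTable(table *unicode.RangeTable, r rune) bool {
	return unicode.Is(table, r)
}

func main() {
	fmt.Println(inTable(spaceTable, ' '))
	fmt.Println(inTable(spaceTable, 'a'))
}
```

With the real package imported, the same unicode.Is call should work on any of the exported tables, since they share the *unicode.RangeTable type.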

Functions

This section is empty.

Types

type BreakAction

type BreakAction int
const (
	// A line break opportunity exists between two adjacent characters of the
	// given line breaking classes.
	BreakDirect BreakAction = iota
	// A line break opportunity exists between two characters of the given
	// line breaking classes only if they are separated by one or more spaces.
	BreakIndirect
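	// An indirect line break opportunity exists before a combining mark that
	// follows one or more spaces (the "#" entry in the UAX #14 pair table).
	BreakCombiningIndirect
	// No line break opportunity exists before a combining mark (the "@"
	// entry in the UAX #14 pair table).
	BreakCombiningProhibited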
	// No line break opportunity exists between two characters of the given
	// line breaking classes, even if they are separated by one or more space
	// characters.
	BreakProhibited
	// A line must break following a character that has the mandatory break
	// property.
	BreakMandatory
)

Line breaking actions.
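One way to read these actions is as a decision about whether a break is allowed at a given position. The sketch below mirrors the documented constants locally so that it compiles on its own. The helper mayBreak and its spacesBetween parameter are assumptions made for illustration, and the handling of the two combining-mark actions follows the UAX #14 pair table legend rather than anything confirmed about this package's scanner.

```go
package main

import "fmt"

// BreakAction mirrors the documented type so the sketch is self-contained.
type BreakAction int

const (
	BreakDirect BreakAction = iota
	BreakIndirect
	BreakCombiningIndirect
	BreakCombiningProhibited
	BreakProhibited
	BreakMandatory
)

// mayBreak is a hypothetical helper: it reports whether a line may (or must)
// break for the given action, where spacesBetween says whether the two
// characters were separated by one or more spaces.
func mayBreak(action BreakAction, spacesBetween bool) bool {
	switch action {
	case BreakDirect, BreakMandatory:
		return true
	case BreakIndirect, BreakCombiningIndirect:
		// Assumption: both indirect actions need intervening spaces.
		return spacesBetween
	default:
		// BreakProhibited and BreakCombiningProhibited never allow a break.
		return false
	}
}

func main() {
	fmt.Println(mayBreak(BreakDirect, false))
	fmt.Println(mayBreak(BreakIndirect, false))
	fmt.Println(mayBreak(BreakIndirect, true))
	fmt.Println(mayBreak(BreakProhibited, true))
}
```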

type BreakClass

type BreakClass int
const (
	ClassOP BreakClass = iota // Open Punctuation
	ClassCL                   // Close Punctuation
	ClassCP                   // Close Parenthesis
	ClassQU                   // Quotation
	ClassGL                   // Non-breaking ("Glue")
	ClassNS                   // Nonstarter
	ClassEX                   // Exclamation/Interrogation
	ClassSY                   // Symbols Allowing Break After
	ClassIS                   // Infix Numeric Separator
	ClassPR                   // Prefix Numeric
	ClassPO                   // Postfix Numeric
	ClassNU                   // Numeric
	ClassAL                   // Alphabetic
	ClassHL                   // Hebrew Letter
	ClassID                   // Ideographic
	ClassIN                   // Inseparable
	ClassHY                   // Hyphen
	ClassBA                   // Break After
	ClassBB                   // Break Before
	ClassB2                   // Break Opportunity Before and After
	ClassZW                   // Zero Width Space
	ClassCM                   // Combining Mark
	ClassWJ                   // Word Joiner
	ClassH2                   // Hangul LV Syllable
	ClassH3                   // Hangul LVT Syllable
	ClassJL                   // Hangul L Jamo
	ClassJV                   // Hangul V Jamo
	ClassJT                   // Hangul T Jamo
	ClassRI                   // Regional Indicator
	// Resolved outside of the pair table (> 28).
	ClassBK // Mandatory Break
	ClassCR // Carriage Return
	ClassLF // Line Feed
	ClassNL // Next Line
	ClassSG // Surrogate
	ClassSP // Space
	ClassCB // Contingent Break Opportunity
	ClassAI // Ambiguous (Alphabetic or Ideographic)
	ClassCJ // Conditional Japanese Starter
	ClassSA // Complex Context Dependent (South East Asian)
	ClassXX // Unknown
)

Line breaking classes.

See: http://www.unicode.org/reports/tr14/#Table1
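The comment "Resolved outside of the pair table (> 28)" implies that only classes ClassOP through ClassRI index into the pair table. A minimal sketch of that check follows. The constants are redeclared locally with the values implied by the iota order above, and inPairTable is a hypothetical helper, not a function of this package.

```go
package main

import "fmt"

// BreakClass mirrors the documented type so the sketch is self-contained.
type BreakClass int

// A subset of the documented constants, with explicit values that follow
// the iota order of the declaration above.
const (
	ClassOP BreakClass = 0
	ClassRI BreakClass = 28
	ClassBK BreakClass = 29
	ClassSP BreakClass = 34
	ClassXX BreakClass = 39
)

// inPairTable reports whether c is one of the classes that index the pair
// table, as opposed to a class that needs to be resolved by other rules.
// (Hypothetical helper for illustration.)
func inPairTable(c BreakClass) bool {
	return c >= ClassOP && c <= ClassRI
}

func main() {
	fmt.Println(inPairTable(ClassRI))
	fmt.Println(inPairTable(ClassSP))
}
```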

func Class

func Class(r rune) BreakClass

Class returns the line breaking class for the given rune.

type ClassResolver

type ClassResolver func(rune) BreakClass

ClassResolver returns a line breaking class for the given rune.
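Since a resolver is a plain function, tailoring can be done by wrapping one resolver in another. The sketch below applies a simplified form of rule LB1 from UAX #14 (AI, SG, XX and SA become AL; CJ becomes NS). The full rule maps some SA characters to CM instead, which is omitted here. Whether this package's own Class function already performs such resolution is not stated in this documentation, so treat the wrapper withDefaults and the demonstration resolver fakeBase as hypothetical.

```go
package main

import "fmt"

// Local mirrors of the documented types and a subset of the constants,
// with values that follow the iota order of the BreakClass declaration.
type BreakClass int

const (
	ClassNS BreakClass = 5
	ClassAL BreakClass = 12
	ClassSG BreakClass = 33
	ClassAI BreakClass = 36
	ClassCJ BreakClass = 37
	ClassSA BreakClass = 38
	ClassXX BreakClass = 39
)

type ClassResolver func(rune) BreakClass

// withDefaults wraps base and replaces context-dependent classes with
// simplified LB1 defaults. (Hypothetical helper; SA is always mapped to AL.)
func withDefaults(base ClassResolver) ClassResolver {
	return func(r rune) BreakClass {
		switch c := base(r); c {
		case ClassAI, ClassSG, ClassXX, ClassSA:
			return ClassAL
		case ClassCJ:
			return ClassNS
		default:
			return c
		}
	}
}

// fakeBase is a stand-in resolver used only to exercise the wrapper.
func fakeBase(r rune) BreakClass {
	switch r {
	case '?':
		return ClassXX
	case '\u3041': // HIRAGANA LETTER SMALL A
		return ClassCJ
	}
	return ClassAL
}

func main() {
	resolve := withDefaults(fakeBase)
	fmt.Println(resolve('?') == ClassAL)
	fmt.Println(resolve('\u3041') == ClassNS)
}
```

A wrapped resolver of this shape could then be assigned to the Resolver field of a Scanner.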

type PairTable

type PairTable [][]BreakAction

PairTable stores the line breaking action for each pair of adjacent line breaking classes.

PairTable[beforeClass][afterClass] = BreakAction

Note: Testing only the two adjacent characters is generally not sufficient to determine a break. Even so, a custom table allows some degree of tailoring of the results.

func (PairTable) Action

func (t PairTable) Action(before, after BreakClass) BreakAction

Action returns the line breaking action for the given class pair.
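The lookup itself amounts to indexing a two-dimensional slice by the "before" and "after" classes. The sketch below uses a deliberately tiny, hypothetical three-class table (classOpen, classClose, classAlpha) rather than the package's real table, and a bounds-checked lookup function. How the real Action method treats out-of-range classes is not documented here, so the second return value is an assumption of this sketch.

```go
package main

import "fmt"

// Local mirrors of the documented types.
type BreakAction int

const (
	BreakDirect BreakAction = iota
	BreakIndirect
	BreakCombiningIndirect
	BreakCombiningProhibited
	BreakProhibited
	BreakMandatory
)

type BreakClass int

type PairTable [][]BreakAction

// A hypothetical three-class alphabet, NOT the package's real classes.
const (
	classOpen BreakClass = iota
	classClose
	classAlpha
)

// tinyTable[before][after] holds the action for each class pair.
var tinyTable = PairTable{
	classOpen:  {classOpen: BreakProhibited, classClose: BreakProhibited, classAlpha: BreakProhibited},
	classClose: {classOpen: BreakIndirect, classClose: BreakProhibited, classAlpha: BreakIndirect},
	classAlpha: {classOpen: BreakIndirect, classClose: BreakProhibited, classAlpha: BreakIndirect},
}

// lookup returns the action for the pair, or false if either class falls
// outside the table. (Hypothetical helper for illustration.)
func lookup(t PairTable, before, after BreakClass) (BreakAction, bool) {
	if before < 0 || int(before) >= len(t) {
		return 0, false
	}
	row := t[before]
	if after < 0 || int(after) >= len(row) {
		return 0, false
	}
	return row[after], true
}

func main() {
	action, ok := lookup(tinyTable, classAlpha, classClose)
	fmt.Println(action == BreakProhibited, ok)
}
```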

type Scanner

type Scanner struct {
	Resolver ClassResolver // returns a line breaking class for a rune
	Table    PairTable     // returns an action for adjacent line breaking classes
	// contains filtered or unexported fields
}

Scanner scans a text looking for line breaking opportunities.

func NewScanner

func NewScanner(r []rune) *Scanner

NewScanner returns a line breaking scanner to scan the given runes.

func (*Scanner) Next

func (s *Scanner) Next() (pos int, action BreakAction, err error)

Next finds the next line breaking action in the input.

It can be called repeatedly to find all actions up to the end of the input, at which point it returns io.EOF as the error.
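The usual consumption pattern is a loop that stops on io.EOF. The sketch below is self-contained: nexter is a hypothetical interface matching the documented Next signature, and fakeScanner replays canned results in place of a real scanner created with NewScanner. It assumes that values returned together with io.EOF carry no information, which this documentation does not confirm.

```go
package main

import (
	"fmt"
	"io"
)

// BreakAction mirrors the documented type so the sketch is self-contained.
type BreakAction int

// nexter is a hypothetical interface matching the documented Next method.
type nexter interface {
	Next() (pos int, action BreakAction, err error)
}

// opportunity records one result reported by Next.
type opportunity struct {
	Pos    int
	Action BreakAction
}

// collect calls Next until io.EOF and gathers every reported result.
// Any other error is returned along with the results seen so far.
func collect(s nexter) ([]opportunity, error) {
	var out []opportunity
	for {
		pos, action, err := s.Next()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		out = append(out, opportunity{Pos: pos, Action: action})
	}
}

// fakeScanner replays canned results; it stands in for a real scanner.
type fakeScanner struct {
	items []opportunity
	i     int
}

func (f *fakeScanner) Next() (int, BreakAction, error) {
	if f.i >= len(f.items) {
		return 0, 0, io.EOF
	}
	it := f.items[f.i]
	f.i++
	return it.Pos, it.Action, nil
}

func main() {
	s := &fakeScanner{items: []opportunity{{Pos: 5, Action: 1}, {Pos: 11, Action: 0}}}
	found, err := collect(s)
	fmt.Println(len(found), err)
}
```

With the real package, the value returned by NewScanner([]rune(text)) has a Next method of this shape, so the same loop should apply.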
