Documentation ¶
Overview ¶
gorilla/i18n/linebreak implements the Unicode line breaking algorithm.
Line breaking, also known as word wrapping, is the process of breaking a section of text into lines such that it will fit in the available width of a page, window or other display area.
As simple as it sounds, this is not a trivial task when support for multilingual texts is required. The particular algorithm used in this package is based on best practices defined in UAX #14:
http://www.unicode.org/reports/tr14/
A similar package that served as inspiration for this one is Bram Stein's Unicode Tokenizer (for Node.js):
https://github.com/bramstein/unicode-tokenizer
Constants ¶
const Version = "6.2.0"
Version is the Unicode edition from which the tables are derived.
Variables ¶
var (
	AI = _AI // AI is the set of Unicode characters in line breaking class AI.
	AL = _AL // AL is the set of Unicode characters in line breaking class AL.
	B2 = _B2 // B2 is the set of Unicode characters in line breaking class B2.
	BA = _BA // BA is the set of Unicode characters in line breaking class BA.
	BB = _BB // BB is the set of Unicode characters in line breaking class BB.
	BK = _BK // BK is the set of Unicode characters in line breaking class BK.
	CB = _CB // CB is the set of Unicode characters in line breaking class CB.
	CJ = _CJ // CJ is the set of Unicode characters in line breaking class CJ.
	CL = _CL // CL is the set of Unicode characters in line breaking class CL.
	CM = _CM // CM is the set of Unicode characters in line breaking class CM.
	CP = _CP // CP is the set of Unicode characters in line breaking class CP.
	CR = _CR // CR is the set of Unicode characters in line breaking class CR.
	EX = _EX // EX is the set of Unicode characters in line breaking class EX.
	GL = _GL // GL is the set of Unicode characters in line breaking class GL.
	H2 = _H2 // H2 is the set of Unicode characters in line breaking class H2.
	H3 = _H3 // H3 is the set of Unicode characters in line breaking class H3.
	HL = _HL // HL is the set of Unicode characters in line breaking class HL.
	HY = _HY // HY is the set of Unicode characters in line breaking class HY.
	ID = _ID // ID is the set of Unicode characters in line breaking class ID.
	IN = _IN // IN is the set of Unicode characters in line breaking class IN.
	IS = _IS // IS is the set of Unicode characters in line breaking class IS.
	JL = _JL // JL is the set of Unicode characters in line breaking class JL.
	JT = _JT // JT is the set of Unicode characters in line breaking class JT.
	JV = _JV // JV is the set of Unicode characters in line breaking class JV.
	LF = _LF // LF is the set of Unicode characters in line breaking class LF.
	NL = _NL // NL is the set of Unicode characters in line breaking class NL.
	NS = _NS // NS is the set of Unicode characters in line breaking class NS.
	NU = _NU // NU is the set of Unicode characters in line breaking class NU.
	OP = _OP // OP is the set of Unicode characters in line breaking class OP.
	PO = _PO // PO is the set of Unicode characters in line breaking class PO.
	PR = _PR // PR is the set of Unicode characters in line breaking class PR.
	QU = _QU // QU is the set of Unicode characters in line breaking class QU.
	RI = _RI // RI is the set of Unicode characters in line breaking class RI.
	SA = _SA // SA is the set of Unicode characters in line breaking class SA.
	SG = _SG // SG is the set of Unicode characters in line breaking class SG.
	SP = _SP // SP is the set of Unicode characters in line breaking class SP.
	SY = _SY // SY is the set of Unicode characters in line breaking class SY.
	WJ = _WJ // WJ is the set of Unicode characters in line breaking class WJ.
	ZW = _ZW // ZW is the set of Unicode characters in line breaking class ZW.
)
These variables have type *unicode.RangeTable.
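Because the tables are *unicode.RangeTable values, membership can be tested with the standard library's unicode.Is. The following is a minimal sketch that does not import this package: spaceLike is a hypothetical stand-in for an exported table such as SP and covers only U+0020, and inTable is an illustrative helper name.

```go
package main

import (
	"fmt"
	"unicode"
)

// spaceLike is a hypothetical stand-in for one of the exported tables
// (for example SP). It covers only U+0020, for illustration.
var spaceLike = &unicode.RangeTable{
	R16:         []unicode.Range16{{Lo: 0x0020, Hi: 0x0020, Stride: 1}},
	LatinOffset: 1, // one R16 entry lies within Latin-1
}

// inTable reports whether r belongs to the given range table.
func inTable(t *unicode.RangeTable, r rune) bool {
	return unicode.Is(t, r)
}

func main() {
	fmt.Println(inTable(spaceLike, ' ')) // true
	fmt.Println(inTable(spaceLike, 'a')) // false
}
```

With the real package imported, the same call shape would presumably apply to any of the exported tables, for example unicode.Is(linebreak.SP, r).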
Functions ¶
This section is empty.
Types ¶
type BreakAction ¶
type BreakAction int
const (
	// A line break opportunity exists between two adjacent characters of the
	// given line breaking classes.
	BreakDirect BreakAction = iota
	// A line break opportunity exists between two characters of the given
	// line breaking classes only if they are separated by one or more spaces.
	BreakIndirect
	BreakCombiningIndirect
	BreakCombiningProhibited
	// No line break opportunity exists between two characters of the given
	// line breaking classes, even if they are separated by one or more space
	// characters.
	BreakProhibited
	// A line must break following a character that has the mandatory break
	// property.
	BreakMandatory
)
Line breaking actions.
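The documented actions can be read as a small decision rule. The sketch below mirrors the constants in a local, unexported type so that it runs on its own; canBreak is a hypothetical helper, not part of the package. The two combining actions are not described above, so the sketch conservatively reports no break for them, which is an assumption.

```go
package main

import "fmt"

// breakAction is a local mirror of the documented BreakAction type.
type breakAction int

const (
	breakDirect breakAction = iota
	breakIndirect
	breakCombiningIndirect
	breakCombiningProhibited
	breakProhibited
	breakMandatory
)

// canBreak reports whether a break may occur for the given action.
// spaces says whether one or more spaces separate the two characters.
// Assumption: combining actions are treated as "no break" in this sketch.
func canBreak(a breakAction, spaces bool) bool {
	switch a {
	case breakDirect, breakMandatory:
		return true
	case breakIndirect:
		return spaces
	default:
		return false
	}
}

func main() {
	fmt.Println(canBreak(breakDirect, false))    // true
	fmt.Println(canBreak(breakIndirect, false))  // false
	fmt.Println(canBreak(breakIndirect, true))   // true
	fmt.Println(canBreak(breakProhibited, true)) // false
}
```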
type BreakClass ¶
type BreakClass int
const (
	ClassOP BreakClass = iota // Open Punctuation
	ClassCL                   // Close Punctuation
	ClassCP                   // Close Parenthesis
	ClassQU                   // Quotation
	ClassGL                   // Non-breaking ("Glue")
	ClassNS                   // Nonstarter
	ClassEX                   // Exclamation/Interrogation
	ClassSY                   // Symbols Allowing Break After
	ClassIS                   // Infix Numeric Separator
	ClassPR                   // Prefix Numeric
	ClassPO                   // Postfix Numeric
	ClassNU                   // Numeric
	ClassAL                   // Alphabetic
	ClassHL                   // Hebrew Letter
	ClassID                   // Ideographic
	ClassIN                   // Inseparable
	ClassHY                   // Hyphen
	ClassBA                   // Break After
	ClassBB                   // Break Before
	ClassB2                   // Break Opportunity Before and After
	ClassZW                   // Zero Width Space
	ClassCM                   // Combining Mark
	ClassWJ                   // Word Joiner
	ClassH2                   // Hangul LV Syllable
	ClassH3                   // Hangul LVT Syllable
	ClassJL                   // Hangul L Jamo
	ClassJV                   // Hangul V Jamo
	ClassJT                   // Hangul T Jamo
	ClassRI                   // Regional Indicator
	// Resolved outside of the pair table (> 28).
	ClassBK // Mandatory Break
	ClassCR // Carriage Return
	ClassLF // Line Feed
	ClassNL // Next Line
	ClassSG // Surrogate
	ClassSP // Space
	ClassCB // Contingent Break Opportunity
	ClassAI // Ambiguous (Alphabetic or Ideographic)
	ClassCJ // Conditional Japanese Starter
	ClassSA // Complex Context Dependent (South East Asian)
	ClassXX // Unknown
)
Line breaking classes.
func Class ¶
func Class(r rune) BreakClass
Class returns the line breaking class for the given rune.
type ClassResolver ¶
type ClassResolver func(rune) BreakClass
ClassResolver returns a line breaking class for the given rune.
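Since a resolver is just a function from rune to class, one resolver can wrap another to tailor the result. The sketch below uses local mirror types and a deliberately tiny toyResolver, both hypothetical, and its constant values do not match the package's. The wrapper applies the default resolution suggested by UAX #14 rule LB1 for a few classes (AI, SG and XX become AL; CJ becomes NS) and omits the SA case.

```go
package main

import "fmt"

// breakClass and classResolver are local mirrors of the documented types.
// Only a handful of classes are defined; the values are illustrative.
type breakClass int

const (
	classNS breakClass = iota
	classNU
	classAL
	classSP
	classAI
	classCJ
	classSG
	classXX
)

type classResolver func(rune) breakClass

// toyResolver is a hypothetical base resolver covering a few ASCII cases.
func toyResolver(r rune) breakClass {
	switch {
	case r == ' ':
		return classSP
	case r >= '0' && r <= '9':
		return classNU
	case (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z'):
		return classAL
	case r == 'ぁ': // small kana, classified as CJ in this toy
		return classCJ
	default:
		return classXX
	}
}

// withDefaults wraps base and resolves context-dependent classes:
// AI, SG, XX -> AL and CJ -> NS. SA is left out of this sketch.
func withDefaults(base classResolver) classResolver {
	return func(r rune) breakClass {
		switch c := base(r); c {
		case classAI, classSG, classXX:
			return classAL
		case classCJ:
			return classNS
		default:
			return c
		}
	}
}

func main() {
	resolve := withDefaults(toyResolver)
	fmt.Println(resolve('!') == classAL)  // true
	fmt.Println(resolve('ぁ') == classNS) // true
	fmt.Println(resolve('7') == classNU)  // true
}
```

A function of the same shape, written against the real BreakClass constants, could presumably be assigned to a Scanner's Resolver field.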
type PairTable ¶
type PairTable [][]BreakAction
PairTable stores line breaking actions for pairs of adjacent line breaking classes.
PairTable[beforeClass][afterClass] = BreakAction
Note: testing only the two adjacent characters is generally not sufficient to determine a break. Even so, a custom table allows some degree of result tailoring.
func (PairTable) Action ¶
func (t PairTable) Action(before, after BreakClass) BreakAction
Action returns the line breaking action for the given class pair.
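To make the indexing order concrete, here is a minimal sketch of a table lookup over three classes, using local mirror types. The table values are illustrative and are not copied from the package's table. The action method is hypothetical: unlike the documented Action, it returns an extra ok flag for out-of-range classes, because the package's behavior in that case is not stated here.

```go
package main

import "fmt"

type breakAction int

const (
	breakDirect breakAction = iota
	breakIndirect
	breakProhibited
)

type breakClass int

const (
	classOP breakClass = iota
	classCL
	classAL
)

// pairTable is indexed as table[before][after].
type pairTable [][]breakAction

// toyTable holds illustrative actions for the three classes above.
var toyTable = pairTable{
	classOP: {breakProhibited, breakProhibited, breakProhibited},
	classCL: {breakDirect, breakProhibited, breakDirect},
	classAL: {breakIndirect, breakProhibited, breakIndirect},
}

// action looks up the action for a class pair. ok is false when either
// class falls outside the table (an assumption made for this sketch).
func (t pairTable) action(before, after breakClass) (a breakAction, ok bool) {
	if before < 0 || int(before) >= len(t) {
		return 0, false
	}
	row := t[before]
	if after < 0 || int(after) >= len(row) {
		return 0, false
	}
	return row[after], true
}

func main() {
	a, ok := toyTable.action(classAL, classCL)
	fmt.Println(a == breakProhibited, ok) // true true
	_, ok = toyTable.action(classAL, breakClass(40))
	fmt.Println(ok) // false
}
```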
type Scanner ¶
type Scanner struct {
	Resolver ClassResolver // returns a line breaking class for a rune
	Table    PairTable     // returns an action for adjacent line breaking classes
	// contains filtered or unexported fields
}
Scanner scans a text looking for line breaking opportunities.
func NewScanner ¶
NewScanner returns a line breaking scanner to scan the given runes.