rbnf

package
v2.10.0
Warning

This package is not in the latest version of its module.

Published: May 17, 2023 License: MIT Imports: 13 Imported by: 0

Documentation

Overview

Package rbnf is a Go implementation of the Unicode Locale Data Markup Language (LDML) [Rule-Based Number Format (RBNF)].

The RBNF can be used for complicated number formatting tasks, such as formatting a number of seconds as hours, minutes and seconds, or spelling out a number like 123 as "one hundred twenty-three", or adding an ordinal suffix to the end of a numeral like "123rd", or formatting numbers in a non-decimal number system such as Roman numerals or traditional Tamil numerals.

This package does not implement any mapping from locale to specific rules. This must be handled at a higher layer.
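As an illustration of what such a higher layer might look like, here is a minimal sketch that maps locale tags to rules strings and falls back from a specific tag (e.g. "en-GB") to a more general one ("en"). The names rulesByLocale and rulesFor are hypothetical and not part of this package, and the rules strings are shortened placeholders rather than real CLDR data.

```go
package main

import (
	"fmt"
	"strings"
)

// rulesByLocale is a hypothetical registry owned by the application.
// The rules strings are shortened placeholders, not real CLDR data.
var rulesByLocale = map[string]string{
	"en": "%spellout-cardinal: 0: zero; 1: one; 2: two;",
	"de": "%spellout-cardinal: 0: null; 1: eins; 2: zwei;",
}

// rulesFor returns the rules string registered for the most specific
// matching locale tag, trimming trailing subtags one at a time
// ("en-GB" is tried first, then "en").
func rulesFor(locale string) (string, bool) {
	for tag := locale; tag != ""; {
		if rules, ok := rulesByLocale[tag]; ok {
			return rules, true
		}
		i := strings.LastIndex(tag, "-")
		if i < 0 {
			break
		}
		tag = tag[:i]
	}
	return "", false
}

func main() {
	for _, locale := range []string{"en-GB", "de", "fr-CA"} {
		rules, ok := rulesFor(locale)
		fmt.Printf("%s: found=%t rules=%q\n", locale, ok, rules)
	}
}
```

The string returned by a lookup like this would then be passed as the rules argument when constructing a group. Real locale matching is more involved than trimming subtags, so treat this only as a starting point.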

This package does not store any rules directly. You will have to obtain these from the Unicode Common Locale Data Repository (CLDR), or other sources, or define your own. Some CLDR rules for non-decimal number systems are implemented at golib/v2/text/number/algorithmic.

Rule-Based Number Format (RBNF): https://unicode.org/reports/tr35/tr35-numbers.html#6-rule-based-number-formatting
golib/v2/text/number/algorithmic: https://github.com/tawesoft/golib/v2/text/number/algorithmic

## Security model

It is assumed that the input rules come from a trusted author (e.g. the CLDR itself, or a trusted provider of localisation rules).

## Note of caution

Quoting from the linked reference:

"Where... CLDR plurals or ordinals can be used, their usage is recommended in preference to the RBNF data. First, the RBNF data is not completely fleshed out over all languages that otherwise have modern coverage. Secondly, the alternate forms are neither complete, nor useful without additional information. For example, for German there is spellout-cardinal-masculine, and spellout-cardinal-feminine. But a complete solution would have all genders (masculine/feminine/neuter), all cases (nominative, accusative, dative, genitive), plus context (with strong or weak determiner or none). Moreover, even for the alternate forms that do exist, CLDR does not supply any data for when to use one vs another (eg, when to use spellout-cardinal-masculine vs spellout-cardinal-feminine). So these data are inappropriate for general purpose software."

Example (Fictional)

Example using custom time factors from the Battlestar Galactica 1978 TV series.

package main

import (
	"fmt"

	"github.com/tawesoft/golib/v2/must"
	"github.com/tawesoft/golib/v2/text/number/rbnf"
)

func main() {
	g := must.Result(rbnf.New(nil, `
        %%s:
            0: s;
            1: ;
            2/1: s;
        %%es:
            0: es;
            1: ;
            2/1: es;
        %%timecomma:
            0: =%time=;
            1: , =%time=;
        %%microns:
            0: =%%spellout-cardinal= microns;
            1: =%%spellout-cardinal= micron;
            2/1: =%%spellout-cardinal= microns;
        %%hyphen-microns:
            0: ' microns;
            1: -=%%spellout-cardinal= micron;
            2/1: -=%%spellout-cardinal= microns;
        %time:
            -x: minus →→;
            0: =%%microns=;
            1: =%%microns=;
            2: =%%microns=;
            20: twenty→%%hyphen-microns→;
            30: thirty→%%hyphen-microns→;
            40: forty→%%hyphen-microns→;
            50: fifty→%%hyphen-microns→;
            60: sixty→%%hyphen-microns→;
            70: seventy→%%hyphen-microns→;
            80: eighty→%%hyphen-microns→;
            90: ninety→%%hyphen-microns→;
            100: ←%%spellout-cardinal← centon←%%s←[→%%timecomma→];
            6000/6000: ←%%spellout-cardinal← centar←%%es←[→%%timecomma→];
            144000/144000: ←%%spellout-cardinal← cycle←%%s←[→%%timecomma→];
            1008000/1008000: ←%%spellout-cardinal← secton←%%s←[→%%timecomma→];
            4032000/4032000: ←%%spellout-cardinal← quatron←%%s←[→%%timecomma→];
            48384000/48384000: ←%%spellout-cardinal-verbose← yahren←%%s←[→%%timecomma→];
        %%spellout-cardinal:
            0: zero;
            1: one;
            2: two;
            3: three;
            4: four;
            5: five;
            6: six;
            7: seven;
            8: eight;
            9: nine;
            10: ten;
            11: eleven;
            12: twelve;
            13: thirteen;
            14: fourteen;
            15: fifteen;
            16: sixteen;
            17: seventeen;
            18: eighteen;
            19: nineteen;
            20: twenty[-→→];
            30: thirty[-→→];
            40: forty[-→→];
            50: fifty[-→→];
            60: sixty[-→→];
            70: seventy[-→→];
            80: eighty[-→→];
            90: ninety[-→→];
            100: ←← hundred[ →→];
            1000: ←← thousand[ →→];
            1000000: ←← million[ →→];
            1000000000: ←← billion[ →→];
            1000000000000: ←← trillion[ →→];
            1000000000000000: ←← quadrillion[ →→];
            1000000000000000000: =#,##0=;
        %%spellout-cardinal-verbose:
            0: =%%spellout-numbering=;
            100: ←← hundred[→%%and→];
            1000: ←← thousand[→%%and→];
            100000/1000: ←← thousand[→%%commas→];
            1000000: ←← million[→%%commas→];
            1000000000: ←← billion[→%%commas→];
            1000000000000: ←← trillion[→%%commas→];
            1000000000000000: ←← quadrillion[→%%commas→];
            1000000000000000000: =#,##0=;
        %%spellout-numbering:
            0: =%%spellout-cardinal=;
        %%and:
            1: ' and =%%spellout-cardinal-verbose=;
            100: ' =%%spellout-cardinal-verbose=;
        %%commas:
            1:' and =%%spellout-cardinal-verbose=;
            100: ' =%%spellout-cardinal-verbose=;
            1000: ' ←%%spellout-cardinal-verbose← thousand[→%%commas→];
            1000000: ' =%%spellout-cardinal-verbose=;
    `))

	type microns int64

	printTime := func(v microns) {
		fmt.Printf("printTime(microns(%d)): %s\n", v,
			must.Result(g.FormatInteger("%time", int64(v))))
	}

	const centon = 100          // in microns, ~= 1 minute.
	const centar = 60 * centon  // ~= 1 hour, plural "centares"
	const cycle = 24 * centar   // ~= 1 day
	const secton = 7 * cycle    // ~= 1 week
	const quatron = 4 * secton  // ~= 1 month
	const yahren = 12 * quatron // ~= 1 year

	printTime(microns(0))
	printTime(microns(1))
	printTime(microns(5))
	printTime(microns(1 * centar))
	printTime(microns(2 * centar))
	printTime(microns((1 * centon) + 95))
	printTime(microns((2 * centar) + (5 * centon) + 1))
	printTime(microns((1 * cycle) + (1 * centar) + 5))
	printTime(microns(1 * secton))
	printTime(microns(1 * quatron))
	printTime(microns((3 * quatron) + (2 * secton)))
	printTime(microns(1 * yahren))
	printTime(microns(2 * yahren))
	printTime(microns(150 * yahren))
	printTime(microns((101 * yahren) + (6 * quatron) + (3 * secton) + (4 * cycle) + (2 * centar) + 50))

}
Output:

printTime(microns(0)): zero microns
printTime(microns(1)): one micron
printTime(microns(5)): five microns
printTime(microns(6000)): one centar
printTime(microns(12000)): two centares
printTime(microns(195)): one centon, ninety-five microns
printTime(microns(12501)): two centares, five centons, one micron
printTime(microns(150005)): one cycle, one centar, five microns
printTime(microns(1008000)): one secton
printTime(microns(4032000)): one quatron
printTime(microns(14112000)): three quatrons, two sectons
printTime(microns(48384000)): one yahren
printTime(microns(96768000)): two yahrens
printTime(microns(7257600000)): one hundred and fifty yahrens
printTime(microns(4914588050)): one hundred and one yahrens, six quatrons, three sectons, four cycles, two centares, fifty microns
Example (SpelloutCardinal)
package main

import (
	"fmt"

	"github.com/tawesoft/golib/v2/must"
	"github.com/tawesoft/golib/v2/text/number/rbnf"
)

func main() {
	g := must.Result(rbnf.New(nil, `
        %spellout-cardinal:
            -x: minus →→;
            x.x: ←← point →→;
            Inf: infinite;
            NaN: not a number;
            0: zero;
            1: one;
            2: two;
            3: three;
            4: four;
            5: five;
            6: six;
            7: seven;
            8: eight;
            9: nine;
            10: ten;
            11: eleven;
            12: twelve;
            13: thirteen;
            14: fourteen;
            15: fifteen;
            16: sixteen;
            17: seventeen;
            18: eighteen;
            19: nineteen;
            20: twenty[-→→];
            30: thirty[-→→];
            40: forty[-→→];
            50: fifty[-→→];
            60: sixty[-→→];
            70: seventy[-→→];
            80: eighty[-→→];
            90: ninety[-→→];
            100: ←← hundred[ →→];
            1000: ←← thousand[ →→];
            1000000: ←← million[ →→];
            1000000000: ←← billion[ →→];
            1000000000000: ←← trillion[ →→];
            1000000000000000: ←← quadrillion[ →→];
            1000000000000000000: =#,##0=;
    `))

	spellout := func(x int64) {
		fmt.Printf("spellout(%d): %s\n", x,
			must.Result(g.FormatInteger("%spellout-cardinal", x)))
	}

	spellout(0)
	spellout(1)
	spellout(2)
	spellout(-5)
	spellout(25)
	spellout(-325)

}
Output:

spellout(0): zero
spellout(1): one
spellout(2): two
spellout(-5): minus five
spellout(25): twenty-five
spellout(-325): minus three hundred twenty-five

Constants

This section is empty.

Variables

var (
	ErrRange          = errors.New("value out of range")
	ErrNoRule         = errors.New("no rule for this input")
	ErrNotImplemented = errors.New("rule logic not implemented for this input")
	ErrInvalidState   = errors.New("invalid rule state")
)

Errors returned by the Format methods.
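Callers will typically want to tell these errors apart. The sketch below shows one way to do that with errors.Is. To stay self-contained it declares local stand-ins for two of the error values and a hypothetical formatStub function in place of a real Format method. Whether this package wraps its errors before returning them is not stated here, which is why errors.Is is used rather than ==, since it handles both cases.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for two of the package's exported error values, so
// that this sketch compiles on its own. Real code would refer to
// rbnf.ErrRange and rbnf.ErrNoRule instead.
var (
	ErrRange  = errors.New("value out of range")
	ErrNoRule = errors.New("no rule for this input")
)

// formatStub is hypothetical. It simulates a Format method that fails
// for negative input and wraps the sentinel error with extra context.
func formatStub(v int64) (string, error) {
	if v < 0 {
		return "", fmt.Errorf("formatting %d: %w", v, ErrNoRule)
	}
	return fmt.Sprint(v), nil
}

// describe classifies an error by comparing it against the sentinels.
func describe(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrRange):
		return "out of range"
	case errors.Is(err, ErrNoRule):
		return "no matching rule"
	default:
		return "unexpected: " + err.Error()
	}
}

func main() {
	for _, v := range []int64{5, -5} {
		s, err := formatStub(v)
		fmt.Printf("%d: %q (%s)\n", v, s, describe(err))
	}
}
```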

Functions

This section is empty.

Types

type Formatter added in v2.8.5

type Formatter struct {
	// contains filtered or unexported fields
}

func (Formatter) FormatInteger added in v2.8.5

func (f Formatter) FormatInteger(v int64) (string, error)

type Group

type Group struct {
	// contains filtered or unexported fields
}

Group defines a group of rule sets. Rule sets may refer to other rule sets in a Group by name, so think of a Group like a lexical scope in a programming language.

func New

func New(p plurals.Rules, rules string) (*Group, error)

New returns a new rule-based number formatter formed from the group of rule sets described by the rules string.

The plurals argument controls formatting of certain plural forms (cardinals and ordinals) used e.g. in spelling out "1st", "2nd", "3rd" or "1 cat", "2 cats", etc. If the ruleset does not contain any rules that use the cardinal syntax ("$(cardinal,plural syntax)$") or ordinal syntax ("$(ordinal,plural syntax)$"), then you may simply pass nil. If specified, the methods implemented by the plurals argument should usually match the same locale that the ruleset applies to.

The rules string contains one or more rule sets in the format described by the International Components for Unicode (ICU) software implementations ([ICU4C RuleBasedNumberFormat]) and ([ICU4J RuleBasedNumberFormat]), e.g.: "%rulesetName: ruleDescriptor: ruleBody; anotherRuleDescriptor: ruleBody;", with some differences:

  • In the ICU implementations, if a formatter only has one rule set, the name may be omitted. In this implementation, the name is always required.
  • In the ICU implementations, a rule descriptor may be left out and have an implicit meaning depending on the previous rule. In this implementation, rule descriptors are always required (implicit descriptors do not appear in the data files in any case).
  • The ICU API documentation does not specify if a rule set name may appear twice. In this implementation, this is treated as an error.
  • Only the following rule descriptors are supported (those not supported do not seem to appear in the data files): "bv", "bv/rad", "-x", "x.x", "0.x", "x.0", "Inf", "NaN".
  • For "x.x", "0.x", "x.0" rules, replacing the dot with a comma is not supported (this does not seem to appear in the data files either). Note that this does not mean numbers cannot be *formatted* using commas, only that they cannot appear this way in a rule descriptor.

Also note that a rule set is an ordered set.

ICU4C RuleBasedNumberFormat: https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1RuleBasedNumberFormat.html
ICU4J RuleBasedNumberFormat: https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/RuleBasedNumberFormat.html
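To make the role of "bv" rules and the ←← and →→ substitutions more concrete, here is a deliberately simplified sketch of how such rules can be evaluated for non-negative integers. It is not this package's implementation. The rule type and the format and divisor functions are hypothetical, only radix 10 is modelled, and only the "←←", "→→" and single "[...]" optional-text forms are understood. The rule with the largest base value not exceeding the number is chosen, the number is split into a quotient and remainder by the highest power of ten not exceeding that base value, and the bracketed text is dropped when the remainder is zero.

```go
package main

import (
	"fmt"
	"strings"
)

// rule is a hypothetical, simplified model of one "bv: body;" rule.
type rule struct {
	base int64  // base value from the rule descriptor
	body string // rule body
}

// englishRules must be sorted by ascending base value.
var englishRules = []rule{
	{0, "zero"}, {1, "one"}, {2, "two"}, {3, "three"}, {4, "four"},
	{5, "five"}, {6, "six"}, {7, "seven"}, {8, "eight"}, {9, "nine"},
	{10, "ten"}, {11, "eleven"}, {12, "twelve"}, {13, "thirteen"},
	{14, "fourteen"}, {15, "fifteen"}, {16, "sixteen"},
	{17, "seventeen"}, {18, "eighteen"}, {19, "nineteen"},
	{20, "twenty[-→→]"}, {30, "thirty[-→→]"}, {40, "forty[-→→]"},
	{50, "fifty[-→→]"}, {60, "sixty[-→→]"}, {70, "seventy[-→→]"},
	{80, "eighty[-→→]"}, {90, "ninety[-→→]"},
	{100, "←← hundred[ →→]"},
	{1000, "←← thousand[ →→]"},
}

// divisor returns the highest power of ten that does not exceed base
// (and 1 for a base value of zero).
func divisor(base int64) int64 {
	d := int64(1)
	for d*10 <= base {
		d *= 10
	}
	return d
}

// format spells out a non-negative n using rules sorted by base value.
func format(rules []rule, n int64) string {
	// Select the last rule whose base value does not exceed n.
	r := rules[0]
	for _, c := range rules {
		if c.base <= n {
			r = c
		}
	}
	d := divisor(r.base)
	quotient, remainder := n/d, n%d

	// Text in square brackets is kept only when the remainder is non-zero.
	body := r.body
	if i := strings.Index(body, "["); i >= 0 {
		j := strings.Index(body, "]")
		optional := body[i+1 : j]
		if remainder == 0 {
			optional = ""
		}
		body = body[:i] + optional + body[j+1:]
	}
	if strings.Contains(body, "←←") {
		body = strings.ReplaceAll(body, "←←", format(rules, quotient))
	}
	if strings.Contains(body, "→→") {
		body = strings.ReplaceAll(body, "→→", format(rules, remainder))
	}
	return body
}

func main() {
	for _, n := range []int64{0, 25, 100, 325, 12345} {
		fmt.Printf("%d: %s\n", n, format(englishRules, n))
	}
}
```

Because rule selection depends on base values, the order and completeness of the rules matter: removing the rule for 100 from this sketch would cause 325 to be matched by the rule for 90 instead.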

func (*Group) FormatInteger

func (g *Group) FormatInteger(rulesetName string, v int64) (string, error)

func (*Group) Formatter added in v2.8.5

func (g *Group) Formatter(name string) (Formatter, bool)

Formatter returns a Formatter that uses a specific named ruleset from a group to format numbers.

func (*Group) RulesetNames added in v2.8.5

func (g *Group) RulesetNames() []string

RulesetNames returns a slice of the names of the public rulesets in a group, excluding the leading "%".
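The examples above declare some rule sets with a single leading "%" (such as "%time") and others with "%%" (such as "%%microns"). In the ICU rule format, a "%%" prefix marks a rule set as private, meaning it is only for use by other rule sets. The sketch below shows that naming convention using hypothetical helper functions. It assumes this package follows the same convention and does not reproduce its actual logic.

```go
package main

import (
	"fmt"
	"strings"
)

// isPublic reports whether a rule set name, written as it appears in a
// rules string, is public: one leading "%" but not "%%".
func isPublic(name string) bool {
	return strings.HasPrefix(name, "%") && !strings.HasPrefix(name, "%%")
}

// publicNames returns the public names with the leading "%" removed,
// preserving their original order.
func publicNames(names []string) []string {
	var out []string
	for _, name := range names {
		if isPublic(name) {
			out = append(out, strings.TrimPrefix(name, "%"))
		}
	}
	return out
}

func main() {
	declared := []string{"%%s", "%%es", "%%timecomma", "%time", "%%spellout-cardinal"}
	fmt.Println(publicNames(declared))
}
```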

Directories

Path Synopsis
internal/body: Package body implements parsing of a rbnf rule body.
internal/descriptor: Package descriptor parses a RBNF rule descriptor.
