Documentation ¶
Index ¶
- Constants
- func ConvertSymbols(s string) string
- func Download() (io.ReadCloser, error)
- func FixSymbolSpaces(s string) string
- func IsHanzi(s string) bool
- func PinyinPlaintext(s string) string
- func PinyinToneNums(s string) string
- func PinyinTones(s string) string
- func StripDigits(s string) string
- func StripTones(s string) string
- type Dict
- func (d *Dict) DefaultFilename() string
- func (d *Dict) Err() error
- func (d *Dict) GetAllByHanzi(s string) []*Entry
- func (d *Dict) GetByHanzi(s string) *Entry
- func (d *Dict) GetByMeaning(s string) []*Entry
- func (d *Dict) GetByPinyin(s string) []*Entry
- func (d *Dict) HanziToPinyin(s string) string
- func (d *Dict) Metadata() Metadata
- func (d *Dict) Save(filename string) error
- type Entry
- type Metadata
Examples ¶
Constants ¶
const (
	// URL of the latest CC-CEDICT data in gzip-compressed (.txt.gz) format.
	URL = "https://www.mdbg.net/chinese/export/cedict/cedict_1_0_ts_utf-8_mdbg.txt.gz"

	// LineEnding used by Save(), defaults to "\r\n" to match original content.
	LineEnding = "\r\n"

	// MaxResults determines the most entries returned for any Dict method.
	MaxResults = 50

	// MaxLD controls the max Levenshtein distance allowed for matches.
	MaxLD = 10
)
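MaxLD is expressed as a Levenshtein distance, i.e. the number of single-character insertions, deletions, and substitutions needed to turn one string into another. The following is a minimal sketch of that metric, not the package's own implementation; the function names levenshtein and min3 are hypothetical, and the comparison is done per rune so that hanzi count as one character each.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein returns the edit distance between a and b, counted in runes.
// Hypothetical helper for illustration only.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting"))
}
```

Under this definition, a candidate would count as a match when its distance to the query is at most the configured maximum.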
Variables ¶
This section is empty.
Functions ¶
func ConvertSymbols ¶
ConvertSymbols replaces common hanzi symbols with latin symbols.
func Download ¶
func Download() (io.ReadCloser, error)
Download returns an io.ReadCloser for the latest CC-CEDICT archive from MDBG. The file is updated regularly but is relatively small, at approximately 4 MB.
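The URL constant points at a gzip-compressed text file, so data fetched from it needs decompressing before it can be parsed. Whether the stream returned by Download is already decompressed is not stated here; the sketch below assumes raw gzip input and uses only the standard library. The helpers decodeGzip and encodeGzip are hypothetical names, and the sample data is compressed in memory so that no network access is needed.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// decodeGzip reads gzip-compressed data from r and returns the plain bytes.
// Hypothetical helper; assumes r yields a raw gzip stream.
func decodeGzip(r io.Reader) ([]byte, error) {
	zr, err := gzip.NewReader(r)
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// encodeGzip compresses text in memory. It exists only to build sample input.
func encodeGzip(text string) (*bytes.Buffer, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(text)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return &buf, nil
}

func main() {
	buf, err := encodeGzip("中文 中文 [Zhong1 wen2] /Chinese language/\r\n")
	if err != nil {
		fmt.Println("compress:", err)
		return
	}
	out, err := decodeGzip(buf)
	if err != nil {
		fmt.Println("decompress:", err)
		return
	}
	fmt.Printf("%q\n", out)
}
```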
func FixSymbolSpaces ¶
FixSymbolSpaces removes spaces added by HanziToPinyin conversion and makes the string look more natural.
func IsHanzi ¶
IsHanzi returns true if the string contains only Han characters. See the Unicode report on Han unification: http://www.unicode.org/reports/tr38/tr38-27.html
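A check like this can be sketched with the standard library's unicode.Han range table. This is an illustration under assumptions, not the package's code: the name isHanzi is hypothetical, and returning false for the empty string is a choice made here rather than documented behaviour.

```go
package main

import (
	"fmt"
	"unicode"
)

// isHanzi reports whether every rune in s belongs to the Han script.
// Hypothetical helper; the empty string is treated as false by assumption.
func isHanzi(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if !unicode.Is(unicode.Han, r) {
			return false
		}
	}
	return true
}

func main() {
	for _, s := range []string{"中文", "中文abc", "你好!"} {
		fmt.Printf("%q: %v\n", s, isHanzi(s))
	}
}
```

Punctuation, including full-width forms, is not in the Han script, so mixed input such as a sentence ending in a question mark fails the check.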
func PinyinPlaintext ¶
PinyinPlaintext returns pinyin string without tones or tone numbers.
func PinyinToneNums ¶
PinyinToneNums returns pinyin string converting tones to tone numbers.
func PinyinTones ¶
PinyinTones returns the pinyin string with tone numbers converted to tone marks. It supports both the CC-CEDICT format, with tone numbers at the end of each syllable (e.g. Zhong1 wen2), and the inline format, with tone numbers directly after the character they apply to (e.g. Zho1ng we2n).
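The conversion from a trailing tone number to a tone mark can be sketched with the conventional placement rule: the mark goes on a or e if present, on the o of ou, and otherwise on the last vowel. The code below is a simplified illustration, not the package's implementation. It assumes a single lowercase syllable in CC-CEDICT style (trailing digit, u: for ü, 5 for the neutral tone) and does not handle the inline format or capital letters; addToneMark and markIndex are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// toneMarks maps each plain vowel to its marked forms for tones 1 to 4.
var toneMarks = map[rune][]rune{
	'a': []rune("āáǎà"),
	'e': []rune("ēéěè"),
	'i': []rune("īíǐì"),
	'o': []rune("ōóǒò"),
	'u': []rune("ūúǔù"),
	'ü': []rune("ǖǘǚǜ"),
}

// markIndex returns the index of the vowel that carries the tone mark,
// or -1 if the syllable has no vowel.
func markIndex(r []rune) int {
	last := -1
	for i, c := range r {
		if c == 'a' || c == 'e' {
			return i
		}
		if c == 'o' && i+1 < len(r) && r[i+1] == 'u' {
			return i
		}
		if _, ok := toneMarks[c]; ok {
			last = i
		}
	}
	return last
}

// addToneMark converts one lowercase syllable such as "zhong1" to "zhōng".
func addToneMark(syllable string) string {
	s := strings.ReplaceAll(syllable, "u:", "ü")
	r := []rune(s)
	if len(r) < 2 {
		return s
	}
	digit := r[len(r)-1]
	if digit < '1' || digit > '5' {
		return s // no tone number to convert
	}
	r = r[:len(r)-1]
	if digit == '5' {
		return string(r) // neutral tone carries no mark
	}
	if i := markIndex(r); i >= 0 {
		r[i] = toneMarks[r[i]][digit-'1']
	}
	return string(r)
}

func main() {
	for _, syl := range strings.Fields("zhong1 wen2 hao3 lu:4 ma5") {
		fmt.Println(syl, "->", addToneMark(syl))
	}
}
```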
func StripDigits ¶
StripDigits returns the string with all unicode digits removed.
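Removing Unicode digits can be sketched with strings.Map and unicode.IsDigit from the standard library. The name stripDigits is hypothetical and this is not necessarily how the package implements it.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripDigits drops every rune in the Unicode decimal digit category (Nd).
// Hypothetical helper for illustration.
func stripDigits(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsDigit(r) {
			return -1 // a negative value tells strings.Map to drop the rune
		}
		return r
	}, s)
}

func main() {
	fmt.Println(stripDigits("Zhong1 wen2"))
}
```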
func StripTones ¶
StripTones returns the string with all nonspacing marks (Unicode category Mn) removed.
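Dropping nonspacing marks can be sketched with the unicode.Mn range table. The sketch below is an illustration with a hypothetical name, stripMarks, and it only affects text in which the tone marks are separate combining characters. A precomposed letter such as ě is a single code point and passes through unchanged, so input would typically be decomposed first (for example with NFD normalization from golang.org/x/text/unicode/norm, which is outside the standard library and not used here).

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripMarks removes every rune in category Mn (Mark, nonspacing).
// Hypothetical helper; it assumes the input is already decomposed.
func stripMarks(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.Is(unicode.Mn, r) {
			return -1
		}
		return r
	}, s)
}

func main() {
	// "e" followed by U+030C (combining caron), "o" followed by U+0301 (combining acute).
	decomposed := "Me\u030Ci guo\u0301"
	fmt.Println(stripMarks(decomposed))
}
```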
Types ¶
type Dict ¶
type Dict struct {
// contains filtered or unexported fields
}
Dict represents an instance of the CC-CEDICT entries. By default, the latest version will be downloaded on creation.
Example (GetByPinyin) ¶
d := New()
elements := d.GetByPinyin("mei guo ren")
for _, e := range elements {
	fmt.Printf("%s - %s\n", e.Traditional, FixSymbolSpaces(PinyinTones(e.Pinyin)))
}
Output:
美國人 - Měi guó rén
Example (HanziToPinyin) ¶
d := New()
hans := "你喜歡學中文嗎?"
fmt.Printf("%s (plaintext) '%s'\n", hans, PinyinPlaintext(d.HanziToPinyin(hans)))
fmt.Printf("%s (tonenums) '%s'\n", hans, d.HanziToPinyin(hans))
fmt.Printf("%s (tones) '%s'\n", hans, FixSymbolSpaces(PinyinTones(d.HanziToPinyin(hans))))
Output:
你喜歡學中文嗎? (plaintext) 'Ni xi huan xue zhong wen ma ?'
你喜歡學中文嗎? (tonenums) 'Ni3 xi3 huan5 xue2 zhong1 wen2 ma2 ?'
你喜歡學中文嗎? (tones) 'Nǐ xǐ huan xué zhōng wén má?'
func Load ¶
Load returns a Dict loaded from a CC-CEDICT formatted file. This is provided for completeness, but I encourage you to use the default behaviour of downloading the latest dict each time.
func New ¶
func New() *Dict
New returns a Dict immediately but downloads the latest CC-CEDICT data in the background. Dict methods can be safely called, but will block until parsing is complete.
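The "return immediately, block on first use" behaviour described here is a common Go pattern: a goroutine does the loading and closes a channel when finished, and every method waits on that channel before touching the data. The sketch below illustrates the pattern with a hypothetical lazyDict type and loader function; it is not the package's actual internals.

```go
package main

import (
	"errors"
	"fmt"
)

// lazyDict is a hypothetical stand-in for a dictionary that loads in the background.
type lazyDict struct {
	ready   chan struct{} // closed once loading has finished
	entries map[string]string
	err     error
}

// newLazyDict returns at once and runs load in a separate goroutine.
func newLazyDict(load func() (map[string]string, error)) *lazyDict {
	d := &lazyDict{ready: make(chan struct{})}
	go func() {
		defer close(d.ready) // closing happens after the fields are set
		d.entries, d.err = load()
	}()
	return d
}

// Err blocks until loading is done and returns any load error.
func (d *lazyDict) Err() error {
	<-d.ready
	return d.err
}

// Get blocks until loading is done, then looks up key.
func (d *lazyDict) Get(key string) (string, bool) {
	<-d.ready
	v, ok := d.entries[key]
	return v, ok
}

func main() {
	ok := newLazyDict(func() (map[string]string, error) {
		return map[string]string{"中文": "Zhong1 wen2"}, nil
	})
	v, found := ok.Get("中文")
	fmt.Println(v, found, ok.Err())

	bad := newLazyDict(func() (map[string]string, error) {
		return nil, errors.New("download failed")
	})
	fmt.Println(bad.Err())
}
```

Receiving from a closed channel never blocks, so calls made after loading has completed return straight away.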
func Parse ¶
Parse creates a Dict instance from an io.Reader. It expects text input in the format described at https://cc-cedict.org/wiki/format:syntax
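In the CC-CEDICT format, each entry is one line of the form "Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/", and lines starting with # are comments. A minimal sketch of parsing a single line follows. The entry type, its Simplified and Meanings fields, and the parseLine function are hypothetical and are not the package's Entry or Parse.

```go
package main

import (
	"fmt"
	"strings"
)

// entry is a hypothetical record for one dictionary line.
type entry struct {
	Traditional string
	Simplified  string
	Pinyin      string
	Meanings    []string
}

// parseLine splits one CC-CEDICT line into its parts.
// Comment lines and blank lines are reported as errors in this sketch.
func parseLine(line string) (entry, error) {
	line = strings.TrimSpace(line)
	if line == "" || strings.HasPrefix(line, "#") {
		return entry{}, fmt.Errorf("not an entry line")
	}
	start := strings.Index(line, "[")
	end := strings.Index(line, "]")
	if start < 0 || end < start {
		return entry{}, fmt.Errorf("missing pinyin brackets: %q", line)
	}
	words := strings.Fields(line[:start])
	if len(words) != 2 {
		return entry{}, fmt.Errorf("expected two headwords: %q", line)
	}
	glosses := strings.Trim(strings.TrimSpace(line[end+1:]), "/")
	if glosses == "" {
		return entry{}, fmt.Errorf("missing definitions: %q", line)
	}
	return entry{
		Traditional: words[0],
		Simplified:  words[1],
		Pinyin:      line[start+1 : end],
		Meanings:    strings.Split(glosses, "/"),
	}, nil
}

func main() {
	e, err := parseLine("美國人 美国人 [Mei3 guo2 ren2] /American/American person/")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s %s [%s] %q\n", e.Traditional, e.Simplified, e.Pinyin, e.Meanings)
}
```

Because the headwords never contain brackets, the first "[" and "]" on the line always delimit the pinyin, even when a gloss contains bracketed pinyin of its own.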
func (*Dict) DefaultFilename ¶
DefaultFilename returns a filename in the CC-CEDICT format, constructed using the Dict's parsed metadata.
func (*Dict) Err ¶
Err blocks until the Dict is finished parsing and then returns any errors encountered during loading/download.
func (*Dict) GetAllByHanzi ¶
GetAllByHanzi returns the Dict entries for the hanzi, if found. Supports input using traditional or simplified characters.
func (*Dict) GetByHanzi ¶
GetByHanzi returns the Dict entry for the hanzi, if found. Supports input using traditional or simplified characters.
func (*Dict) GetByMeaning ¶
GetByMeaning returns entries containing the specified meaning. Matching is case-insensitive and can be exact or non-exact.
func (*Dict) GetByPinyin ¶
GetByPinyin returns the entries whose pinyin matches the given string. It supports pinyin in plaintext or with tones/tone numbers. With plaintext, all tone variations are considered matching.
func (*Dict) HanziToPinyin ¶
HanziToPinyin converts hanzi to their pinyin representation. It implements greedy matching for longest character combos.
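Greedy longest matching means that, at each position, the longest run of characters found in the dictionary is consumed before moving on, so multi-character words take precedence over their individual characters. The sketch below shows the idea with a small in-memory map. The function greedyPinyin, the maxLen parameter, and the sample data are hypothetical; the package's real lookup structures are not shown in this documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// greedyPinyin converts text using the longest dictionary match at each position.
// Runes with no match are passed through unchanged. Hypothetical helper.
func greedyPinyin(text string, dict map[string]string, maxLen int) string {
	runes := []rune(text)
	var parts []string
	for i := 0; i < len(runes); {
		n := maxLen
		if rest := len(runes) - i; rest < n {
			n = rest
		}
		matched := false
		for ; n >= 1; n-- {
			if py, ok := dict[string(runes[i:i+n])]; ok {
				parts = append(parts, py)
				i += n
				matched = true
				break
			}
		}
		if !matched {
			parts = append(parts, string(runes[i]))
			i++
		}
	}
	return strings.Join(parts, " ")
}

func main() {
	dict := map[string]string{
		"學":  "xue2",
		"中":  "zhong1",
		"文":  "wen2",
		"中文": "Zhong1 wen2",
	}
	fmt.Println(greedyPinyin("學中文?", dict, 2))
}
```

Here 中文 is matched as one word even though 中 and 文 are also present individually, which is the effect of trying the longest candidate first.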
type Entry ¶
Entry represents a single entry in the CC-CEDICT dictionary.
func (*Entry) Marshal ¶
Marshal returns the entry, formatted according to https://cc-cedict.org/wiki/format:syntax