opencc

package module
v0.2.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jan 31, 2021 License: Apache-2.0 Imports: 13 Imported by: 2

README

Golang OpenCC

Golang 简体繁体中文互转

Go Report Card GoDoc GitHub release

Introduction 介紹

gocc is a golang port of OpenCC(Open Chinese Convert 開放中文轉換) which is a project for conversion between Traditional and Simplified Chinese developed by BYVoid.

gocc stands for "Golang version OpenCC", it is a total rewrite version of OpenCC in Go. It just borrows the dict files and config files of OpenCC, so it may not produce the same output with the original OpenCC.

参考以下两个仓库源进行优化,并使用Go Module进行管理, 方便新项目引用, 同时进行完善更新

gocc OpenCC-go

Installation 安裝

1. Golang Package
go get github.com/teamlint/opencc
package main

import (
    "fmt"
    "log"
    
    "github.com/teamlint/opencc"
)

func main() {
    // 简体转繁体
    s2t, err := openc.New("s2t")
    if err != nil {
        log.Fatal(err)
    }
    in := `自然语言处理是人工智能领域中的一个重要方向。`
    out, err := s2t.Convert(in)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%s\n%s\n", in, out)
    //自然语言处理是人工智能领域中的一个重要方向。
    //自然語言處理是人工智能領域中的一個重要方向。

    // 繁体转简体
    t2s, err := ccgo.New("t2s")
    if err != nil {
        log.Fatal(err)
    }
    in := "閱坊-閱讀的樂趣"
    out, err := t2s.Convert(str)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%s\n%s\n", in, out)
    //閱坊-閱讀的樂趣
    //阅坊-阅读的乐趣
}
2. Command Line
git clone https://github.com/teamlint/opencc
cd opencc/cmd/gocc
make install
gocc --help
echo "阅坊-阅读的乐趣" | gocc
#閱坊-閱讀的樂趣

Conversions 语言转换

目前支持14种

s2t, t2s, s2tw, tw2s, s2hk, hk2s, s2twp, tw2sp, t2tw, hk2t, t2hk, t2jp, jp2t, tw2t

  1. s2t ==> Simplified Chinese to Traditional Chinese 簡體到繁體
  2. t2s ==> Traditional Chinese to Simplified Chinese 繁體到簡體
  3. s2tw ==> Simplified Chinese to Traditional Chinese (Taiwan Standard) 簡體到臺灣正體
  4. tw2s ==> Traditional Chinese (Taiwan Standard) to Simplified Chinese 臺灣正體到簡體
  5. s2hk ==> Simplified Chinese to Traditional Chinese (Hong Kong variant) 簡體到香港繁體
  6. hk2s ==> Traditional Chinese (Hong Kong variant) to Simplified Chinese 香港繁體到簡體
  7. s2twp ==> Simplified Chinese to Traditional Chinese (Taiwan Standard) with Taiwanese idiom 簡體到繁體(臺灣正體標準)並轉換爲臺灣常用詞彙
  8. tw2sp ==> Traditional Chinese (Taiwan Standard) to Simplified Chinese with Mainland Chinese idiom 繁體(臺灣正體標準)到簡體並轉換爲中國大陸常用詞彙
  9. t2tw ==> Traditional Chinese (OpenCC Standard) to Taiwan Standard 繁體(OpenCC 標準)到臺灣正體
  10. hk2t ==> Traditional Chinese (Hong Kong variant) to Traditional Chinese 香港繁體到繁體(OpenCC 標準)
  11. t2hk ==> Traditional Chinese (OpenCC Standard) to Hong Kong variant 繁體(OpenCC 標準)到香港繁體
  12. t2jp ==> Traditional Chinese Characters (Kyūjitai) to New Japanese Kanji (Shinjitai) 繁體(OpenCC 標準,舊字體)到日文新字體
  13. jp2t ==> New Japanese Kanji (Shinjitai) to Traditional Chinese Characters (Kyūjitai) 日文新字體到繁體(OpenCC 標準,舊字體)
  14. tw2t ==> Traditional Chinese (Taiwan standard) to Traditional Chinese 臺灣正體到繁體(OpenCC 標準)

Updates 使用最新词典

更新词典文件

获取最新OpenCC代码 使用 OpenCC/data/config/*.jsonOpenCC/data/dictionary/*.txt 替换本包的 config/*.jsondictionary/*.txt 相关文件

OpenCC/data/config/*.json文件中 默认匹配的是.ocd2文件("type": "ocd2", "file": "TSPhrases.ocd2"),全部替换为txt即可

生成最新词典文件

以下文件由部分词典文件进一步操作产生, 需要手动处理或使用 OpenCC 脚本处理

  • HKVariantsRev.txtHKVariants.txt 反转列产生
  • JPVariantsRev.txtJPVariants.txt 反转列产生
  • TWPhrases.txtTWPhrasesIT.txt TWPhrasesName.txt TWPhrasesOther.txt 合并产生
  • TWPhrasesRev.txtTWPhrases.txt 反转列产生
  • TWVariantsRev.txtTWVariants.txt 反转列产生
添加新的语言包

如果有新添加的语言, 修改 opencc.go 文件中 supportedConversions conversions 值, 同时增加相关词典文件即可:

	supportedConversions = "s2t, t2s, s2tw, tw2s, s2hk, hk2s, s2twp, tw2sp, t2tw, hk2t, t2hk, t2jp, jp2t, tw2t"
    ......
	conversions          = map[string]struct{}{
		"s2t":   {},
		"t2s":   {},
		"s2tw":  {},
		"tw2s":  {},
		"s2hk":  {},
		"hk2s":  {},
		"s2twp": {},
		"tw2sp": {},
		"t2tw":  {},
		"hk2t":  {},
		"t2hk":  {},
		"t2jp":  {},
		"jp2t":  {},
		"tw2t":  {},
	}

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Dict

type Dict struct {
	Trie   *cedar.Cedar
	Values [][]string
}

Dict contains the Trie and dict values

func Build

func Build(in io.Reader) (*Dict, error)

Build da dict from io.Reader 从io.Reader构建数据

func BuildFromFile

func BuildFromFile(fileName string) (*Dict, error)

BuildFromFile builds the da dict from fileName BuildFromFile从fileName生成数据

func Load

func Load(dir string) (*Dict, error)

Load gob serialized dict from dir 从目录加载gob序列化的dict

func (*Dict) Get

func (d *Dict) Get(str string) ([]string, error)

Get the values of str, like map 获取str的值,例如map

func (*Dict) PrefixMatch

func (d *Dict) PrefixMatch(str string) (map[string][]string, error)

PrefixMatch str by Dict, returns the matched string and its according values Dict的PrefixMatch str,返回匹配的字符串及其相应的值

func (*Dict) Save

func (d *Dict) Save(dir string) error

Save gob serialized dict to dir 保存Gob序列化字典到目录

type Group

type Group struct {
	Files []string
	Dicts []*Dict
}

Group holds a sequence of dicts

func (*Group) String

func (g *Group) String() string

type OpenCC

type OpenCC struct {
	Conversion  string   // 14种转换方式 简写 s2t, t2s, s2tw, tw2s, s2hk, hk2s, s2twp, tw2sp, t2tw, hk2t, t2hk, t2jp, jp2t, tw2t
	Description string   // 选择的config/*.json文件中的方式名称 eg: s2t.json中 "name": "Simplified Chinese to Traditional Chinese",
	DictChains  []*Group // 解析config/*.json文件中的group 将字典文件名称放入切片中 eg: s2t.json中 STPhrases.txt STCharacters.txt
	// contains filtered or unexported fields
}

OpenCC contains the converter

func New

func New(conversion string, opts ...Option) (*OpenCC, error)

New construct an instance of OpenCC. Supported conversions: s2t, t2s, s2tw, tw2s, s2hk, hk2s, s2twp, tw2sp, t2tw, hk2t, t2hk, t2jp, jp2t, tw2t

func (*OpenCC) Convert

func (cc *OpenCC) Convert(in string) (string, error)

Convert string from Simplified Chinese to Traditional Chinese or vice versa 将字符串从简体中文转换为繁体中文,反之亦然 外部调用的文本转换方法

type Option added in v0.2.0

type Option func(*OpenCC)

Option OpenCC 配置项

func WithDir added in v0.2.0

func WithDir(dir string) Option

WithDir 词典数据主目录

Directories

Path Synopsis
cmd

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL