ichiran

package module
v0.0.0-...-96ed9cf

Published: Jan 19, 2025 License: GPL-3.0 Imports: 18 Imported by: 1

README

Status: alpha

A Go library for Japanese text analysis using the Ichiran morphological analyzer running in Docker Compose containers. This client provides easy access to Japanese language parsing, including readings, translations, and grammatical analysis.

Features

  • Morphological analysis of Japanese text
  • Kanji readings and translations
  • Romaji (romanization) support
  • Part-of-speech tagging
  • Conjugation analysis
  • Download and manage the Docker containers directly via the Docker Compose Go API

Installation

go get github.com/tassa-yoniso-manasi-karoto/go-ichiran

tldr

package main

import (
	"fmt"

	"github.com/tassa-yoniso-manasi-karoto/go-ichiran"
)

func main() {
	// Initialize Docker client with default configuration
	docker, err := ichiran.NewDocker()
	if err != nil {
		panic(err)
	}
	defer docker.Close()

	// Initialize the environment (downloads, builds and starts the containers if they are not already running)
	if err := docker.Init(); err != nil {
		panic(err)
	}

	tokens, err := ichiran.Analyze("私は日本語を勉強しています。")
	if err != nil {
		panic(err)
	}

	fmt.Printf("Tokenized: %#v\n", tokens.Tokenized())
	fmt.Printf("TokenizedParts: %#v\n", tokens.TokenizedParts())
	fmt.Printf("Kana: %#v\n", tokens.Kana())
	fmt.Printf("KanaParts: %#v\n", tokens.KanaParts())
	fmt.Printf("Roman: %#v\n", tokens.Roman())
	fmt.Printf("RomanParts: %#v\n", tokens.RomanParts())
	fmt.Printf("GlossParts: %#v\n", tokens.GlossParts())
}

Output

Tokenized: "私 は 日本語 を 勉強しています . "
TokenizedParts: []string{"私", "は", "日本語", "を", "勉強しています", ". "}
Kana: "わたし は にほんご を べんきょう しています . "
KanaParts: []string{"わたし", "は", "にほんご", "を", "べんきょう しています", ". "}
Roman: "watashi wa nihongo wo benkyō shiteimasu . "
RomanParts: []string{"watashi", "wa", "nihongo", "wo", "benkyō shiteimasu", ". "}
GlossParts: []string{"私(I; me)",
	"は (indicates sentence topic; indicates contrast with another option (stated or unstated); adds emphasis)",
	"日本語 (Japanese (language))",
	"を (indicates direct object of action; indicates subject of causative expression; indicates an area traversed; indicates time (period) over which action takes place; indicates point of departure or separation of action; indicates object of desire, like, hate, etc.)",
	"勉強 (study; diligence; working hard; experience; lesson (for the future); discount; price reduction)",
	"して (to do; to carry out; to perform; to cause to become; to make (into); to turn (into); to serve as; to act as; to work as; to wear (clothes, a facial expression, etc.); to judge as being; to view as being; to think of as; to treat as; to use as; to decide on; to choose; to be sensed (of a smell, noise, etc.); to be (in a state, condition, etc.); to be worth; to cost; to pass (of time); to elapse; to place, or raise, person A to a post or status B; to transform A to B; to make A into B; to exchange A for B; to make use of A for B; to view A as B; to handle A as if it were B; to feel A about B; verbalizing suffix (applies to nouns noted in this dictionary with the part of speech \"vs\"); creates a humble verb (after a noun prefixed with \"o\" or \"go\"); to be just about to; to be just starting to; to try to; to attempt to)",
	"います (to be (of animate objects); to exist; to stay; to be ...-ing; to have been ...-ing)",
	". "}

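Each GlossParts element follows the surface (gloss) pattern shown above. As a self-contained sketch (splitGlossPart is not part of this library, just an illustration of the format), such a string can be split back into its surface form and gloss text:

```go
package main

import (
	"fmt"
	"strings"
)

// splitGlossPart splits a "surface (gloss)" string, as produced by
// GlossParts, into its surface form and its gloss text.
func splitGlossPart(part string) (surface, gloss string) {
	i := strings.Index(part, "(")
	if i < 0 {
		return strings.TrimSpace(part), ""
	}
	surface = strings.TrimSpace(part[:i])
	gloss = strings.TrimSuffix(part[i+1:], ")")
	return surface, gloss
}

func main() {
	s, g := splitGlossPart("日本語 (Japanese (language))")
	fmt.Printf("surface=%q gloss=%q\n", s, g)
	// Output: surface="日本語" gloss="Japanese (language)"
}
```

Note that only the trailing parenthesis is stripped, so nested parentheses inside a gloss (as in "Japanese (language)") survive intact.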
[!TIP] If you get 'exec: "ichiran-cli": executable file not found' errors, remove the directory ./docker/pgdata (as recommended by the README of the ichiran repo) at the location below, and use docker.InitForce() to bypass the cache and force a rebuild from scratch.

Docker Compose containers' location

  • Linux: ~/.config/ichiran
  • macOS: ~/Library/Application Support/ichiran
  • Windows: %LOCALAPPDATA%\ichiran
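On Linux, the tip above amounts to deleting the pgdata directory under the config location. A sketch assuming the Linux path listed above (adjust the base directory for macOS/Windows):

```shell
# Remove ichiran's Postgres data directory (Linux path shown; see the
# locations above for macOS/Windows), then call docker.InitForce() so the
# containers are rebuilt from scratch.
ICHIRAN_DIR="$HOME/.config/ichiran"
rm -rf "$ICHIRAN_DIR/docker/pgdata"
```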

Requirements

[!IMPORTANT] The Docker library in Go is not standalone: it requires a running Docker daemon. Docker Desktop (Windows/macOS) or Docker Engine (Linux) must be installed and running for this library to work.

Windows

  1. Docker Desktop for Windows

    • Download and install from Docker Hub
    • Requires Windows 10/11 Pro, Enterprise, or Education (64-bit)
    • WSL 2 backend is recommended
    • Hardware requirements:
      • 64-bit processor with Second Level Address Translation (SLAT)
      • 4GB system RAM
      • BIOS-level hardware virtualization support must be enabled
  2. WSL 2 (Windows Subsystem for Linux)

    • Required for best performance
    • Install via PowerShell (as administrator):
      wsl --install
      
    • Restart your computer after installation
  3. System Requirements

    • Go 1.19 or later
    • Internet connection (for initial setup)

macOS

  1. Docker Desktop for Mac

    • Download and install from Docker Hub
    • Compatible with macOS 10.15 or newer
  2. System Requirements

    • Go 1.19 or later
    • Internet connection (for initial setup)

Linux

  1. Docker Engine

    • Install using your distribution's package manager
    • Docker Compose V2 (included with recent Docker Engine installations)
    # Ubuntu/Debian (requires Docker's official apt repository to be configured)
    sudo apt-get update
    sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
    
  2. System Requirements

    • Go 1.19 or later
    • Internet connection (for initial setup)

Post-Installation

  1. Verify Docker installation:

    docker --version
    docker compose version
    
  2. Start Docker service (if not started):

    # Windows/Mac: Start Docker Desktop
    # Linux:
    sudo systemctl start docker
    
  3. (Optional) Configure non-root access on Linux:

    sudo usermod -aG docker $USER
    # Log out and back in for changes to take effect
    

Alternatives

License

GPL-3.0

Documentation

Index

Constants

const (
	ContainerName = "ichiran-main-1"
)

Variables

var (
	QueryTO = 1 * time.Hour
)

Functions

This section is empty.

Types

type Conj

type Conj struct {
	Prop    []Prop  `json:"prop"`    // Conjugation properties
	Reading string  `json:"reading"` // Base form reading
	Gloss   []Gloss `json:"gloss"`   // Base form meanings
	ReadOk  bool    `json:"readok"`  // Reading validity flag
}

Conj represents conjugation information
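The Prop entries (defined below) describe how a conjugated form was derived. A self-contained sketch of rendering a Conj as a readable summary, using local copies of the types documented on this page (describeConj is not a library function, and the sample values are illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// Local copies of the library's documented types.
type Prop struct {
	Pos  string // Part of speech
	Type string // Type of conjugation
	Neg  bool   // Negation flag
}

type Gloss struct {
	Pos   string
	Gloss string
	Info  string
}

type Conj struct {
	Prop    []Prop
	Reading string
	Gloss   []Gloss
	ReadOk  bool
}

// describeConj renders a Conj as "reading: pos type[ (negative)]; ...".
func describeConj(c Conj) string {
	var props []string
	for _, p := range c.Prop {
		s := p.Pos + " " + p.Type
		if p.Neg {
			s += " (negative)"
		}
		props = append(props, s)
	}
	return c.Reading + ": " + strings.Join(props, "; ")
}

func main() {
	c := Conj{
		Reading: "勉強する",
		Prop:    []Prop{{Pos: "vs-i", Type: "Conjunctive (~te)"}},
	}
	fmt.Println(describeConj(c))
	// Output: 勉強する: vs-i Conjunctive (~te)
}
```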

type Docker

type Docker struct {
	// contains filtered or unexported fields
}

func NewDocker

func NewDocker() (*Docker, error)

NewDocker creates or returns an existing Docker instance

func (*Docker) Close

func (i *Docker) Close() error

Close implements io.Closer

func (*Docker) Init

func (i *Docker) Init() error

Init initializes the ichiran service

func (*Docker) InitForce

func (i *Docker) InitForce() error

InitForce initializes the ichiran service with forced rebuild

func (*Docker) InitQuiet

func (i *Docker) InitQuiet() error

InitQuiet initializes the ichiran service with reduced logging

func (*Docker) SetLogLevel

func (i *Docker) SetLogLevel(level zerolog.Level)

SetLogLevel updates the logging level

func (*Docker) Status

func (i *Docker) Status() (string, error)

Status returns the current status of the ichiran service

func (*Docker) Stop

func (i *Docker) Stop() error

Stop stops the ichiran service

type Gloss

type Gloss struct {
	Pos   string `json:"pos"`   // Part of speech
	Gloss string `json:"gloss"` // English meaning
	Info  string `json:"info"`  // Additional information
}

Gloss represents the English glosses and part of speech

type JSONToken

type JSONToken struct {
	Surface     string      `json:"text"` // Original text
	IsToken     bool        // Whether this is a Japanese token or non-Japanese text
	Reading     string      `json:"reading"` // Reading with kanji and kana
	Kana        string      `json:"kana"`    // Kana reading
	Romaji      string      // Romanized form from ichiran
	Score       int         `json:"score"`          // Analysis score
	Seq         int         `json:"seq"`            // Sequence number
	Gloss       []Gloss     `json:"gloss"`          // English meanings
	Conj        []Conj      `json:"conj,omitempty"` // Conjugation information
	Alternative []JSONToken `json:"alternative"`    // Alternative interpretations
	Compound    []string    `json:"compound"`       // Delineable elements of compound expressions
	Components  []JSONToken `json:"components"`     // Details of delineable elements of compound expressions
	Raw         []byte      `json:"-"`              // Raw JSON for future processing
}

JSONToken represents a single token with all its analysis information

type JSONTokens

type JSONTokens []*JSONToken

JSONTokens is a slice of token pointers representing a complete analysis result.

func Analyze

func Analyze(text string) (*JSONTokens, error)

Analyze performs morphological analysis on the input Japanese text using ichiran. It returns parsed tokens or an error if analysis fails.

func (JSONTokens) Gloss

func (tokens JSONTokens) Gloss() string

Gloss returns a formatted string containing tokens and their English glosses including morphemes and alternative interpretations.

func (JSONTokens) GlossParts

func (tokens JSONTokens) GlossParts() (parts []string)

GlossParts returns a slice of strings containing tokens and their English glosses, including morphemes and alternative interpretations.

func (JSONTokens) Kana

func (tokens JSONTokens) Kana() string

Kana returns a string of all tokens in kana form where available.

func (JSONTokens) KanaParts

func (tokens JSONTokens) KanaParts() (parts []string)

KanaParts returns a slice of all tokens in kana form where available.

func (JSONTokens) Roman

func (tokens JSONTokens) Roman() string

Roman returns a string of all tokens in romanized form.

func (JSONTokens) RomanParts

func (tokens JSONTokens) RomanParts() (parts []string)

RomanParts returns a slice of all tokens in romanized form.

func (JSONTokens) ToMorphemes

func (tokens JSONTokens) ToMorphemes() JSONTokens

ToMorphemes returns a new slice of tokens where compound tokens are replaced by their constituent morphemes
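The idea behind this is that compound tokens carry their constituent morphemes in Components. A self-contained sketch of such a flattening pass, using a minimal stand-in type rather than the library's JSONToken (this illustrates the concept, not the actual implementation):

```go
package main

import "fmt"

// Token is a minimal stand-in for JSONToken.
type Token struct {
	Surface    string
	Components []Token // constituent morphemes of a compound, if any
}

// flatten replaces each compound token by its components, recursively,
// mirroring what a ToMorphemes-style pass does.
func flatten(tokens []Token) []Token {
	var out []Token
	for _, t := range tokens {
		if len(t.Components) > 0 {
			out = append(out, flatten(t.Components)...)
		} else {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	tokens := []Token{
		{Surface: "勉強しています", Components: []Token{
			{Surface: "勉強"}, {Surface: "して"}, {Surface: "います"},
		}},
	}
	for _, t := range flatten(tokens) {
		fmt.Println(t.Surface)
	}
	// Prints 勉強, して, います on separate lines.
}
```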

func (JSONTokens) Tokenized

func (tokens JSONTokens) Tokenized() string

Tokenized returns a string of all tokens separated by spaces or commas.

func (JSONTokens) TokenizedParts

func (tokens JSONTokens) TokenizedParts() (parts []string)

TokenizedParts returns a slice of all token surfaces.

type Prop

type Prop struct {
	Pos  string `json:"pos"`  // Part of speech
	Type string `json:"type"` // Type of conjugation
	Neg  bool   `json:"neg"`  // Negation flag
}

Prop represents grammatical properties
