ichiran

package module
v0.0.0-...-96ed9cf

Published: Jan 19, 2025 License: GPL-3.0 Imports: 18 Imported by: 1

README

Status: alpha

A Go library for Japanese text analysis using the Ichiran morphological analyzer running in Docker Compose containers. This client provides easy access to Japanese language parsing, including readings, translations, and grammatical analysis.

Features

  • Morphological analysis of Japanese text
  • Kanji readings and translations
  • Romaji (romanization) support
  • Part-of-speech tagging
  • Conjugation analysis
  • Download and manage the Docker containers directly via the Docker Compose Go API

Installation

go get github.com/tassa-yoniso-manasi-karoto/go-ichiran

tldr

package main

import (
	"fmt"

	"github.com/tassa-yoniso-manasi-karoto/go-ichiran"
)

func main() {
	// Initialize Docker client with default configuration
	docker, err := ichiran.NewDocker()
	if err != nil {
		panic(err)
	}
	defer docker.Close()

	// Initialize the environment (downloads, builds and starts the containers if they are not already running)
	if err := docker.Init(); err != nil {
		panic(err)
	}

	tokens, err := ichiran.Analyze("私は日本語を勉強しています。")
	if err != nil {
		panic(err)
	}

	fmt.Printf("Tokenized: %#v\n", tokens.Tokenized())
	fmt.Printf("TokenizedParts: %#v\n", tokens.TokenizedParts())
	fmt.Printf("Kana: %#v\n", tokens.Kana())
	fmt.Printf("KanaParts: %#v\n", tokens.KanaParts())
	fmt.Printf("Roman: %#v\n", tokens.Roman())
	fmt.Printf("RomanParts: %#v\n", tokens.RomanParts())
	fmt.Printf("GlossParts: %#v\n", tokens.GlossParts())
}

Output

Tokenized: "私 は 日本語 を 勉強しています . "
TokenizedParts: []string{"私", "は", "日本語", "を", "勉強しています", ". "}
Kana: "わたし は にほんご を べんきょう しています . "
KanaParts: []string{"わたし", "は", "にほんご", "を", "べんきょう しています", ". "}
Roman: "watashi wa nihongo wo benkyō shiteimasu . "
RomanParts: []string{"watashi", "wa", "nihongo", "wo", "benkyō shiteimasu", ". "}
GlossParts: []string{"私(I; me)",
	"は (indicates sentence topic; indicates contrast with another option (stated or unstated); adds emphasis)",
	"日本語 (Japanese (language))",
	"を (indicates direct object of action; indicates subject of causative expression; indicates an area traversed; indicates time (period) over which action takes place; indicates point of departure or separation of action; indicates object of desire, like, hate, etc.)",
	"勉強 (study; diligence; working hard; experience; lesson (for the future); discount; price reduction)",
	"して (to do; to carry out; to perform; to cause to become; to make (into); to turn (into); to serve as; to act as; to work as; to wear (clothes, a facial expression, etc.); to judge as being; to view as being; to think of as; to treat as; to use as; to decide on; to choose; to be sensed (of a smell, noise, etc.); to be (in a state, condition, etc.); to be worth; to cost; to pass (of time); to elapse; to place, or raise, person A to a post or status B; to transform A to B; to make A into B; to exchange A for B; to make use of A for B; to view A as B; to handle A as if it were B; to feel A about B; verbalizing suffix (applies to nouns noted in this dictionary with the part of speech \"vs\"); creates a humble verb (after a noun prefixed with \"o\" or \"go\"); to be just about to; to be just starting to; to try to; to attempt to)",
	"います (to be (of animate objects); to exist; to stay; to be ...-ing; to have been ...-ing)",
	". "}

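Each GlossParts element follows the surface (gloss) pattern shown above. As a self-contained sketch (splitGlossPart is not part of this library, just an illustration of the format), such a string can be split back into its surface form and gloss text:

```go
package main

import (
	"fmt"
	"strings"
)

// splitGlossPart splits a "surface (gloss)" string, as produced by
// GlossParts, into its surface form and its gloss text.
func splitGlossPart(part string) (surface, gloss string) {
	i := strings.Index(part, "(")
	if i < 0 {
		return strings.TrimSpace(part), ""
	}
	surface = strings.TrimSpace(part[:i])
	gloss = strings.TrimSuffix(part[i+1:], ")")
	return surface, gloss
}

func main() {
	s, g := splitGlossPart("日本語 (Japanese (language))")
	fmt.Printf("surface=%q gloss=%q\n", s, g)
	// Output: surface="日本語" gloss="Japanese (language)"
}
```

Note that only the trailing parenthesis is stripped, so nested parentheses inside a gloss (as in "Japanese (language)") survive intact.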
[!TIP] If you get 'exec: "ichiran-cli": executable file not found' errors, remove the directory ./docker/pgdata (as recommended by the README of the ichiran repo) at the location below, and use docker.InitForce() to bypass the cache and force a rebuild from scratch.

Docker Compose containers' location

  • Linux: ~/.config/ichiran
  • macOS: ~/Library/Application Support/ichiran
  • Windows: %LOCALAPPDATA%\ichiran
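On Linux, the tip above amounts to deleting the pgdata directory under the config location. A sketch assuming the Linux path listed above (adjust the base directory for macOS/Windows):

```shell
# Remove ichiran's Postgres data directory (Linux path shown; see the
# locations above for macOS/Windows), then call docker.InitForce() so the
# containers are rebuilt from scratch.
ICHIRAN_DIR="$HOME/.config/ichiran"
rm -rf "$ICHIRAN_DIR/docker/pgdata"
```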

Requirements

[!IMPORTANT] The Docker library in Go is not standalone: it requires a running Docker daemon. Docker Desktop (Windows/macOS) or Docker Engine (Linux) must be installed and running for this library to work.

Windows

  1. Docker Desktop for Windows

    • Download and install from Docker Hub
    • Requires Windows 10/11 Pro, Enterprise, or Education (64-bit)
    • WSL 2 backend is recommended
    • Hardware requirements:
      • 64-bit processor with Second Level Address Translation (SLAT)
      • 4GB system RAM
      • BIOS-level hardware virtualization support must be enabled
  2. WSL 2 (Windows Subsystem for Linux)

    • Required for best performance
    • Install via PowerShell (as administrator):
      wsl --install
      
    • Restart your computer after installation
  3. System Requirements

    • Go 1.19 or later
    • Internet connection (for initial setup)

macOS

  1. Docker Desktop for Mac

    • Download and install from Docker Hub
    • Compatible with macOS 10.15 or newer
  2. System Requirements

    • Go 1.19 or later
    • Internet connection (for initial setup)

Linux

  1. Docker Engine

    • Install using your distribution's package manager
    • Docker Compose V2 (included with recent Docker Engine installations)
    # Ubuntu/Debian (requires Docker's official apt repository to be configured)
    sudo apt-get update
    sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
    
  2. System Requirements

    • Go 1.19 or later
    • Internet connection (for initial setup)

Post-Installation

  1. Verify Docker installation:

    docker --version
    docker compose version
    
  2. Start Docker service (if not started):

    # Windows/Mac: Start Docker Desktop
    # Linux:
    sudo systemctl start docker
    
  3. (Optional) Configure non-root access on Linux:

    sudo usermod -aG docker $USER
    # Log out and back in for changes to take effect
    

Alternatives

License

GPL-3.0

Documentation

Index

Constants

const (
	ContainerName = "ichiran-main-1"
)

Variables

var (
	QueryTO = 1 * time.Hour
)

Functions

This section is empty.

Types

type Conj

type Conj struct {
	Prop    []Prop  `json:"prop"`    // Conjugation properties
	Reading string  `json:"reading"` // Base form reading
	Gloss   []Gloss `json:"gloss"`   // Base form meanings
	ReadOk  bool    `json:"readok"`  // Reading validity flag
}

Conj represents conjugation information
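The Prop entries (defined below) describe how a conjugated form was derived. A self-contained sketch of rendering a Conj as a readable summary, using local copies of the types documented on this page (describeConj is not a library function, and the sample values are illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// Local copies of the library's documented types.
type Prop struct {
	Pos  string // Part of speech
	Type string // Type of conjugation
	Neg  bool   // Negation flag
}

type Gloss struct {
	Pos   string
	Gloss string
	Info  string
}

type Conj struct {
	Prop    []Prop
	Reading string
	Gloss   []Gloss
	ReadOk  bool
}

// describeConj renders a Conj as "reading: pos type[ (negative)]; ...".
func describeConj(c Conj) string {
	var props []string
	for _, p := range c.Prop {
		s := p.Pos + " " + p.Type
		if p.Neg {
			s += " (negative)"
		}
		props = append(props, s)
	}
	return c.Reading + ": " + strings.Join(props, "; ")
}

func main() {
	c := Conj{
		Reading: "勉強する",
		Prop:    []Prop{{Pos: "vs-i", Type: "Conjunctive (~te)"}},
	}
	fmt.Println(describeConj(c))
	// Output: 勉強する: vs-i Conjunctive (~te)
}
```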

type Docker

type Docker struct {
	// contains filtered or unexported fields
}

func NewDocker

func NewDocker() (*Docker, error)

NewDocker creates or returns an existing Docker instance

func (*Docker) Close

func (i *Docker) Close() error

Close implements io.Closer

func (*Docker) Init

func (i *Docker) Init() error

Init initializes the ichiran service

func (*Docker) InitForce

func (i *Docker) InitForce() error

InitForce initializes the ichiran service with forced rebuild

func (*Docker) InitQuiet

func (i *Docker) InitQuiet() error

InitQuiet initializes the ichiran service with reduced logging

func (*Docker) SetLogLevel

func (i *Docker) SetLogLevel(level zerolog.Level)

SetLogLevel updates the logging level

func (*Docker) Status

func (i *Docker) Status() (string, error)

Status returns the current status of the ichiran service

func (*Docker) Stop

func (i *Docker) Stop() error

Stop stops the ichiran service

type Gloss

type Gloss struct {
	Pos   string `json:"pos"`   // Part of speech
	Gloss string `json:"gloss"` // English meaning
	Info  string `json:"info"`  // Additional information
}

Gloss represents the English glosses and part of speech

type JSONToken

type JSONToken struct {
	Surface     string      `json:"text"` // Original text
	IsToken     bool        // Whether this is a Japanese token or non-Japanese text
	Reading     string      `json:"reading"` // Reading with kanji and kana
	Kana        string      `json:"kana"`    // Kana reading
	Romaji      string      // Romanized form from ichiran
	Score       int         `json:"score"`          // Analysis score
	Seq         int         `json:"seq"`            // Sequence number
	Gloss       []Gloss     `json:"gloss"`          // English meanings
	Conj        []Conj      `json:"conj,omitempty"` // Conjugation information
	Alternative []JSONToken `json:"alternative"`    // Alternative interpretations
	Compound    []string    `json:"compound"`       // Delineable elements of compound expressions
	Components  []JSONToken `json:"components"`     // Details of delineable elements of compound expressions
	Raw         []byte      `json:"-"`              // Raw JSON for future processing
}

JSONToken represents a single token with all its analysis information

type JSONTokens

type JSONTokens []*JSONToken

JSONTokens is a slice of token pointers representing a complete analysis result.

func Analyze

func Analyze(text string) (*JSONTokens, error)

Analyze performs morphological analysis on the input Japanese text using ichiran. It returns parsed tokens or an error if analysis fails.

func (JSONTokens) Gloss

func (tokens JSONTokens) Gloss() string

Gloss returns a formatted string containing tokens and their English glosses including morphemes and alternative interpretations.

func (JSONTokens) GlossParts

func (tokens JSONTokens) GlossParts() (parts []string)

GlossParts returns a slice of strings containing tokens and their English glosses, including morphemes and alternative interpretations.

func (JSONTokens) Kana

func (tokens JSONTokens) Kana() string

Kana returns a string of all tokens in kana form where available.

func (JSONTokens) KanaParts

func (tokens JSONTokens) KanaParts() (parts []string)

KanaParts returns a slice of all tokens in kana form where available.

func (JSONTokens) Roman

func (tokens JSONTokens) Roman() string

Roman returns a string of all tokens in romanized form.

func (JSONTokens) RomanParts

func (tokens JSONTokens) RomanParts() (parts []string)

RomanParts returns a slice of all tokens in romanized form.

func (JSONTokens) ToMorphemes

func (tokens JSONTokens) ToMorphemes() JSONTokens

ToMorphemes returns a new slice of tokens where compound tokens are replaced by their constituent morphemes
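The idea behind this is that compound tokens carry their constituent morphemes in Components. A self-contained sketch of such a flattening pass, using a minimal stand-in type rather than the library's JSONToken (this illustrates the concept, not the actual implementation):

```go
package main

import "fmt"

// Token is a minimal stand-in for JSONToken.
type Token struct {
	Surface    string
	Components []Token // constituent morphemes of a compound, if any
}

// flatten replaces each compound token by its components, recursively,
// mirroring what a ToMorphemes-style pass does.
func flatten(tokens []Token) []Token {
	var out []Token
	for _, t := range tokens {
		if len(t.Components) > 0 {
			out = append(out, flatten(t.Components)...)
		} else {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	tokens := []Token{
		{Surface: "勉強しています", Components: []Token{
			{Surface: "勉強"}, {Surface: "して"}, {Surface: "います"},
		}},
	}
	for _, t := range flatten(tokens) {
		fmt.Println(t.Surface)
	}
	// Prints 勉強, して, います on separate lines.
}
```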

func (JSONTokens) Tokenized

func (tokens JSONTokens) Tokenized() string

Tokenized returns a string of all tokens separated by spaces or commas.

func (JSONTokens) TokenizedParts

func (tokens JSONTokens) TokenizedParts() (parts []string)

TokenizedParts returns a slice of all token surfaces.

type Prop

type Prop struct {
	Pos  string `json:"pos"`  // Part of speech
	Type string `json:"type"` // Type of conjugation
	Neg  bool   `json:"neg"`  // Negation flag
}

Prop represents grammatical properties
