dragoman

v0.9.2
Published: Apr 26, 2024 License: MIT Imports: 8 Imported by: 0

README

Dragoman - Translator for Structured Documents

Dragoman is an AI-powered tool for translating structured documents such as JSON, XML, and YAML. Its key feature is that it preserves the document's structure during translation, keeping elements such as JSON keys and placeholders intact.

Dragoman is available as both a CLI tool and a Go library. This means you can use it directly from your terminal for one-off tasks, or integrate it into your Go applications for more complex use cases.

If you're looking for a version of Dragoman that leverages conventional translation services like Google Translate or DeepL, check out the freeze branch of this repository. The previous implementation manually extracted texts from the input files, translated them using DeepL or Google Translate, and reinserted the translated pieces back into the original documents.

Installation

Dragoman can be installed directly using Go's built-in package manager:

go install github.com/modernice/dragoman/cmd/dragoman@latest

To add Dragoman to your Go project, install using go get:

go get github.com/modernice/dragoman

Usage

The basic usage of Dragoman is as follows:

dragoman translate source.json

This command translates the content of source.json to English and prints the translated document to stdout. The source language is detected automatically by default; use the --from and --to options to set the source and target languages explicitly.

Full list of available options

-f or --from

The source language of the document. It can be specified in any format that a human would understand (like 'English', 'German', 'French', etc.). If not provided, it defaults to 'auto', meaning the language is automatically detected.

dragoman translate source.json --from English

-t or --to

The target language to which the document will be translated. It can be specified in any format that a human would understand (like 'English', 'German', 'French', etc.). If not provided, it defaults to 'English'.

dragoman translate source.json --to French

-o or --out

The path to the output file where the translated content will be saved. If this option is not provided, the translated content will be printed to stdout.

dragoman translate source.json --out target.json

--split-chunks

Split the source document into chunks before translating. This can help fit large documents into the context window of OpenAI's models. Each line that starts with one of the provided prefixes starts a new chunk.

Example: Split a Markdown file into chunks at H2 and H3 headings:

dragoman translate source.md --split-chunks "## " --split-chunks "### "
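The chunking rule described above (a new chunk starts at every line that begins with one of the prefixes) can be sketched in Go as follows. This is a minimal illustration, not Dragoman's actual implementation; splitChunks and hasAnyPrefix are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strings"
)

// splitChunks is a hypothetical helper (not part of Dragoman's API). It starts
// a new chunk at every line that begins with one of the given prefixes.
func splitChunks(doc string, prefixes []string) []string {
	var chunks []string
	var current []string

	for _, line := range strings.Split(doc, "\n") {
		// Close the running chunk when a prefixed line is reached.
		if len(current) > 0 && hasAnyPrefix(line, prefixes) {
			chunks = append(chunks, strings.Join(current, "\n"))
			current = nil
		}
		current = append(current, line)
	}
	if len(current) > 0 {
		chunks = append(chunks, strings.Join(current, "\n"))
	}
	return chunks
}

// hasAnyPrefix reports whether line starts with any of the prefixes.
func hasAnyPrefix(line string, prefixes []string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(line, p) {
			return true
		}
	}
	return false
}

func main() {
	doc := "# Title\nIntro\n## First\nText\n### Nested\nMore"
	for i, chunk := range splitChunks(doc, []string{"## ", "### "}) {
		fmt.Printf("chunk %d: %q\n", i, chunk)
	}
}
```

Joining the chunks with newlines reproduces the input, so nothing is lost at chunk boundaries in this sketch.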

-u or --update

Enable this option to translate only the fields of the source file that are missing from the output file. This option requires both the source and output files to be JSON.

dragoman translate source.json --out target.json --update
Example

When you add new translations to your JSON source file, you can use the --update option to only translate the newly added fields and merge them into the output file.

// en.json
{
	"hello": "Hello, world!",
	"contact": {
		"email": "hello@example.com",
		"response": "Thank you for your message."
	}
}
// de.json
{
	"hello": "Hallo, Welt!",
	"contact": {
		"email": "hallo@example.com"
	}
}
dragoman translate en.json --out de.json --update

Result:

// de.json
{
	"hello": "Hallo, Welt!",
	"contact": {
		"email": "hallo@example.com",
		"response": "Vielen Dank für deine Nachricht."
	}
}
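The core idea behind --update, finding the fields that exist in the source but not in the output, can be sketched with the standard library alone. This is an illustration under assumptions: missingPaths is a hypothetical helper, and Dragoman's own JSONDiff may behave differently in edge cases.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// missingPaths is a hypothetical helper (not Dragoman's API). It returns the
// key paths that exist in source but not in target, descending into nested
// objects that are present on both sides.
func missingPaths(source, target map[string]any, prefix []string) [][]string {
	var paths [][]string
	for key, srcVal := range source {
		// Copy the prefix so sibling paths never share a backing array.
		path := append(append([]string{}, prefix...), key)

		tgtVal, ok := target[key]
		if !ok {
			paths = append(paths, path)
			continue
		}

		srcMap, srcIsMap := srcVal.(map[string]any)
		tgtMap, tgtIsMap := tgtVal.(map[string]any)
		if srcIsMap && tgtIsMap {
			paths = append(paths, missingPaths(srcMap, tgtMap, path)...)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Slice(paths, func(i, j int) bool {
		return strings.Join(paths[i], ".") < strings.Join(paths[j], ".")
	})
	return paths
}

func main() {
	enJSON := `{"hello":"Hello, world!","contact":{"email":"hello@example.com","response":"Thank you for your message."}}`
	deJSON := `{"hello":"Hallo, Welt!","contact":{"email":"hallo@example.com"}}`

	var en, de map[string]any
	if err := json.Unmarshal([]byte(enJSON), &en); err != nil {
		panic(err)
	}
	if err := json.Unmarshal([]byte(deJSON), &de); err != nil {
		panic(err)
	}

	for _, p := range missingPaths(en, de, nil) {
		fmt.Println(strings.Join(p, ".")) // prints "contact.response"
	}
}
```

Only the missing paths would then need to be sent for translation, which is why existing translations in de.json stay untouched.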

-p or --preserve

This option takes a comma-separated list of words or phrases that should remain unchanged during translation. It is useful for terms that carry meaning in their original form, such as code, trademarks, or names. The terms are preserved whether they appear on their own or as part of larger strings, for example inside HTML tags. For instance, --preserve Dragoman keeps the term Dragoman in its original form after translation. Note that the effectiveness of this feature may vary with the language model used; it is optimized for OpenAI's GPT models.

dragoman translate source.json --preserve Dragoman

-v or --verbose

Prints more detailed output about the process and result of the translation.

dragoman translate source.json --verbose

-h or --help

Displays a help message that describes the command and its options.

dragoman --help

Use as Library

Besides the CLI tool, Dragoman can also be used as a Go library in your own applications. This allows you to build the Dragoman translation capabilities directly into your own Go programs.

Example: Basic Translation

In this example, we load a JSON file and translate its content using the default source and target languages (automatic detection and English, respectively).

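package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/modernice/dragoman"
	"github.com/modernice/dragoman/openai"
)

func main() {
	content, err := os.ReadFile("source.json")
	if err != nil {
		log.Fatal(err)
	}

	// See the openai package documentation for the constructor's arguments.
	service := openai.New()
	translator := dragoman.NewTranslator(service)

	// Source and Target are left unset in this example.
	translated, err := translator.Translate(context.TODO(), dragoman.TranslateParams{
		Document: string(content),
	})
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(translated)
}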
Example: Translation with Preserved Words

In this example, we translate a JSON file, specifying some preserved words that should not be translated.

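package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/modernice/dragoman"
	"github.com/modernice/dragoman/openai"
)

func main() {
	content, err := os.ReadFile("source.json")
	if err != nil {
		log.Fatal(err)
	}

	// See the openai package documentation for the constructor's arguments.
	service := openai.New()
	translator := dragoman.NewTranslator(service)

	translated, err := translator.Translate(context.TODO(), dragoman.TranslateParams{
		Document: string(content),
		// Terms listed here should be left untranslated.
		Preserve: []string{"Dragoman", "OpenAI"},
	})
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(translated)
}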
Example: Translation with Specific Source and Target Languages

In this example, we translate a JSON file from English to French, specifying the source and target languages.

package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/modernice/dragoman"
	"github.com/modernice/dragoman/openai"
)

func main() {
	content, err := os.ReadFile("source.json")
	if err != nil {
		log.Fatal(err)
	}

	// See the openai package documentation for the constructor's arguments.
	service := openai.New()
	translator := dragoman.NewTranslator(service)

	translated, err := translator.Translate(context.TODO(), dragoman.TranslateParams{
		Document: string(content),
		Source:   "English",
		Target:   "French",
	})
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(translated)
}

License

MIT

Documentation

Functions

func JSONExtract added in v0.7.0

func JSONExtract[TData []byte | map[string]any](data TData, paths []JSONPath) (map[string]any, error)

JSONExtract extracts values from a JSON document according to specified paths and returns them as a map. It supports both raw JSON bytes and already-parsed maps as input. If any path does not exist or leads to an unexpected type, an error is returned alongside the partial output.

func JSONMerge added in v0.7.0

func JSONMerge(into map[string]any, from map[string]any)

JSONMerge combines the contents of two JSON object maps, where 'from' is merged into 'into'. If there are matching keys, the values from 'from' will overwrite those in 'into'. For nested maps, merging is performed recursively. This function modifies the 'into' map directly and does not return a new map.
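The merge semantics described above can be illustrated with a small standalone sketch. mergeInto is a hypothetical stand-in written for this example, not the library's source; it only mirrors the documented behavior (values from 'from' win on matching keys, nested maps are merged recursively, and 'into' is modified in place).

```go
package main

import "fmt"

// mergeInto is a hypothetical illustration of the documented JSONMerge
// behavior. It mutates into and returns nothing.
func mergeInto(into, from map[string]any) {
	for key, fromVal := range from {
		fromMap, fromIsMap := fromVal.(map[string]any)
		intoMap, intoIsMap := into[key].(map[string]any)

		// Recurse only when both sides hold nested objects.
		if fromIsMap && intoIsMap {
			mergeInto(intoMap, fromMap)
			continue
		}

		// Otherwise the value from 'from' overwrites (or adds) the key.
		into[key] = fromVal
	}
}

func main() {
	de := map[string]any{
		"hello":   "Hallo, Welt!",
		"contact": map[string]any{"email": "hallo@example.com"},
	}
	translated := map[string]any{
		"contact": map[string]any{"response": "Vielen Dank für deine Nachricht."},
	}

	mergeInto(de, translated)
	fmt.Println(de)
}
```

After the call, the nested contact object in de holds both the existing email and the newly added response.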

func Version added in v0.5.10

func Version() string

Version returns the current version of the dragoman CLI.

Types

type Formality added in v0.9.0

type Formality string

Formality represents the level of formality in language, ranging from formal to informal, providing contextual cues for language translation or usage. It supports checking if a specific formality has been set and converts to its string representation. Formality also guides language adjustments based on the desired tone and social context.

const (
	// FormalityUnspecified indicates the absence of a specified formality level in
	// translation or language settings.
	FormalityUnspecified Formality = ""

	// FormalityFormal represents the use of formal language and address forms,
	// applicable across all languages where such distinctions exist.
	FormalityFormal Formality = "formal"

	// FormalityInformal specifies the use of informal language and address forms,
	// applicable across various languages where distinctions between formality
	// levels exist.
	FormalityInformal Formality = "informal"
)

func (Formality) IsSpecified added in v0.9.0

func (f Formality) IsSpecified() bool

IsSpecified reports whether a Formality instance has a specified value other than the default unspecified state.

func (Formality) String added in v0.9.0

func (f Formality) String() string

String returns the string representation of the formal language setting encapsulated by the Formality type.
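As a rough sketch of how IsSpecified and String might be used together, the example below re-declares Formality locally so that it runs without the dragoman module. The method bodies are assumptions inferred from the descriptions above, and formalityInstruction is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// Formality mirrors the documented type; it is re-declared here only to keep
// the example self-contained.
type Formality string

const (
	FormalityUnspecified Formality = ""
	FormalityFormal      Formality = "formal"
	FormalityInformal    Formality = "informal"
)

// IsSpecified is assumed to compare against the unspecified value.
func (f Formality) IsSpecified() bool { return f != FormalityUnspecified }

// String is assumed to return the underlying string.
func (f Formality) String() string { return string(f) }

// formalityInstruction is a hypothetical helper that turns a formality into a
// prompt instruction, or returns "" when no formality was requested.
func formalityInstruction(f Formality) string {
	if !f.IsSpecified() {
		return ""
	}
	return "Use " + f.String() + " address."
}

func main() {
	for _, f := range []Formality{FormalityUnspecified, FormalityFormal, FormalityInformal} {
		fmt.Printf("%q -> %q\n", f, formalityInstruction(f))
	}
}
```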

type ImproveParams added in v0.9.0

type ImproveParams struct {
	Document string

	// SplitChunks is a list of strings that should be used to split the document
	// into chunks. If the document is split into chunks, each chunk will be
	// improved separately, allowing to fit large documents into the model's
	// context window.
	SplitChunks []string

	// Formality specifies the formality (formal address) to use in the improved document.
	Formality Formality

	// Keywords are SEO keywords that should be used in the improved document.
	Keywords []string

	// Instructions are raw instructions that should be included in the prompt.
	Instructions []string

	// Language is the language the improved document should be written in.
	Language string
}

ImproveParams configures the enhancement of a document by specifying its content, how to split it for processing, the desired formality tone, SEO keywords to incorporate, specific instructions for adjustment, and the language in which improvements should be made. It is used by an Improver to adjust a document's appeal, readability, and search engine optimization.

type Improver added in v0.9.0

type Improver struct {
	// contains filtered or unexported fields
}

Improver enhances the content of a document by making it more engaging, informative, and optimized for search engine visibility while preserving its structural integrity. It takes into account various parameters such as formality, language, and specific keywords to ensure the output is tailored to specific needs. The enhanced content is achieved by processing each segment of the document separately when necessary, allowing for large documents to be handled effectively.

func NewImprover added in v0.9.0

func NewImprover(svc Model) *Improver

NewImprover creates a new instance of Improver using the provided Model.

func (*Improver) Improve added in v0.9.0

func (imp *Improver) Improve(ctx context.Context, params ImproveParams) (string, error)

Improve enhances the content of a document based on specified parameters to increase engagement, clarity, and search engine optimization. It splits the document into manageable chunks if necessary, processes each chunk independently according to the improvement criteria including language, formality, keywords, and additional instructions, and then reassembles the improved chunks into a cohesive output.
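The chunk-wise flow described above (process each chunk independently, then reassemble) can be outlined as follows. This is a sketch under assumptions, not the Improver's implementation: processChunks, demo, and the chat callback are hypothetical, the chunks are assumed to be already split, and prompt construction is omitted.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// processChunks is a hypothetical helper. It sends each chunk to chat in
// order and joins the results with newlines. The first error aborts the run
// and reports which chunk failed.
func processChunks(
	ctx context.Context,
	chunks []string,
	chat func(context.Context, string) (string, error),
) (string, error) {
	results := make([]string, 0, len(chunks))
	for i, chunk := range chunks {
		out, err := chat(ctx, chunk)
		if err != nil {
			return "", fmt.Errorf("chunk %d: %w", i, err)
		}
		results = append(results, out)
	}
	return strings.Join(results, "\n"), nil
}

// demo runs processChunks with a stub that upper-cases each chunk instead of
// calling a real model.
func demo() (string, error) {
	stub := func(_ context.Context, chunk string) (string, error) {
		return strings.ToUpper(chunk), nil
	}
	return processChunks(context.Background(), []string{"## a\ntext", "## b\nmore"}, stub)
}

func main() {
	out, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Processing chunks sequentially keeps the output order identical to the input order; a concurrent variant would need to track chunk indices to reassemble the document correctly.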

type JSONPath added in v0.7.0

type JSONPath []string

JSONPath represents a sequence of keys that specify a unique path through a JSON object hierarchy, similar to an address for locating a specific value within a nested JSON structure. It is used to traverse and extract data from complex JSON documents.

func JSONDiff added in v0.7.0

func JSONDiff[TInput []byte | map[string]any](source, target TInput) ([]JSONPath, error)

JSONDiff identifies the differences between two JSON objects or two raw JSON byte representations. It returns a slice of JSONPaths that represent the hierarchical structure of keys where differences exist, and an error if any occur during the process. The function is generic and can accept either raw bytes or maps as inputs for comparison.

type Model

type Model interface {
	// Chat function takes a context and a prompt as input and returns a string and
	// an error. It uses the provided context and prompt to initiate a chat session
	// and retrieve a response.
	Chat(context.Context, string) (string, error)
}

Model is an interface that represents a chat-based translation model. It provides a method called Chat, which takes a context and a prompt string as input and returns the translated text and any error that occurred during translation.

type ModelFunc

type ModelFunc func(context.Context, string) (string, error)

ModelFunc is a type that represents a function that can be used as a model for chat translation. It implements the Model interface and allows for chat translation by calling the function with a context and prompt string.

func (ModelFunc) Chat

func (chat ModelFunc) Chat(ctx context.Context, prompt string) (string, error)

Chat is a function that initiates a conversation with the model to translate a document. It takes a context and a prompt as input parameters, and returns the translated document as a string along with any errors encountered.
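Because ModelFunc turns any function with the matching signature into a Model, a stub model for tests can be written without a dedicated type. The sketch below re-declares Model and ModelFunc locally, following the signatures above, so that it runs without the dragoman module; the body of Chat is an assumption, and upperModel and runStub are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Model and ModelFunc mirror the documented declarations; they are
// re-declared here only to keep the example self-contained.
type Model interface {
	Chat(context.Context, string) (string, error)
}

type ModelFunc func(context.Context, string) (string, error)

// Chat is assumed to simply call the underlying function.
func (chat ModelFunc) Chat(ctx context.Context, prompt string) (string, error) {
	return chat(ctx, prompt)
}

// upperModel is a hypothetical stub that upper-cases the prompt instead of
// calling a real language model.
func upperModel() Model {
	return ModelFunc(func(ctx context.Context, prompt string) (string, error) {
		// Respect cancellation like a real client would.
		if err := ctx.Err(); err != nil {
			return "", err
		}
		return strings.ToUpper(prompt), nil
	})
}

// runStub sends a prompt to the stub model with a background context.
func runStub(prompt string) (string, error) {
	return upperModel().Chat(context.Background(), prompt)
}

func main() {
	out, err := runStub("hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints "HELLO"
}
```

A stub like this could be passed wherever a Model is expected, which makes it possible to exercise calling code without network access.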

type TranslateParams added in v0.9.0

type TranslateParams struct {
	Document string

	// Source is the language of the document to translate.
	Source string

	// Target is the language to translate the document to.
	Target string

	// Preserve is a list of terms that should not be translated. Useful for
	// preserving brand names.
	Preserve []string

	// Instructions are raw instructions that should be included in the prompt.
	Instructions []string

	SplitChunks []string
}

TranslateParams specifies the parameters for translating text from one language to another, including instructions on how text should be handled during translation and any terms that should be preserved unchanged. It also defines how to segment the text for translation if necessary.

type Translator

type Translator struct {
	// contains filtered or unexported fields
}

Translator provides facilities for converting text from one language to another while optionally preserving specific terms and adhering to additional translation instructions. It supports translating large documents by splitting them into manageable chunks based on specified delimiters. The process respects the contextual nuances of the source and target languages, ensuring that the structural integrity and formatting of the original document are maintained. Errors during the translation process are handled gracefully, providing detailed error messages that facilitate troubleshooting.

func NewTranslator added in v0.9.0

func NewTranslator(svc Model) *Translator

NewTranslator creates a new instance of a translator, initializing it with a provided model for language translation tasks. It returns a *Translator.

func (*Translator) Translate

func (t *Translator) Translate(ctx context.Context, params TranslateParams) (string, error)

Translate converts the content of a document from one language to another according to specified parameters. It processes the document in potentially multiple segments, preserving specified terms and formatting instructions. The function returns the translated text or an error if the translation fails. Input parameters and context are provided by a TranslateParams and context.Context, respectively.

Directories

Path Synopsis
cmd
internal
cli
