Documentation ¶
Constants ¶
This section is empty.
Variables ¶
var ErrorLanguageNotFound = errors.New("laguage configuration not found")
ErrorLanguageNotFound is the error that is returned if a language configuration cannot be found.
Functions ¶
This section is empty.
Types ¶
type Candidate ¶
type Candidate struct {
Suggestion string // Correction suggestion
Modern string // Modern variant
Dict string // Name of the used dictionary
HistPatterns []Pattern // List of historical patterns
OCRPatterns []Pattern // List of OCR error patterns
Distance int // Levenshtein distance
Weight float32 // The vote weight of the candidate
}
Candidate represents a correction candidate for an OCR token.
func MakeCandidate ¶ added in v0.8.0
theyl@theil:{teil+[(t:th,0)]}+ocr[(i:y,3)],voteWeight=0.749764,levDistance=1,dict=dict_modern_hypothetic_error
type Interpretation ¶
Interpretation holds the list of candidates for an OCR token. For lexicon entries, an interpretation holds only one candidate, with empty historical and OCR pattern lists.
type LanguageConfiguration ¶
type LanguageConfiguration struct {
Language, Path string
}
LanguageConfiguration pairs a language name with the corresponding configuration path in the backend directory.
func FindLanguage ¶
func FindLanguage(backend, language string) (LanguageConfiguration, error)
FindLanguage searches the backend directory for a language configuration. It returns ErrorLanguageNotFound if the language configuration cannot be found.
func ListLanguages ¶
func ListLanguages(backend string) ([]LanguageConfiguration, error)
ListLanguages returns a list of language configurations in the given backend directory.
type Logger ¶
type Logger interface {
Log(string)
}
Logger defines a simple interface for logging the profiler's stderr output.
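Any type with a `Log(string)` method satisfies this interface. As a rough illustration, the sketch below adapts a plain function to it, in the style of `http.HandlerFunc`. The interface is redeclared locally so the block is self-contained; `LogFunc` is a hypothetical helper and not part of the package.

```go
package main

import (
	"fmt"
	"os"
)

// Logger is a local copy of the documented interface.
type Logger interface {
	Log(string)
}

// LogFunc is a hypothetical adapter that lets an ordinary function
// be used wherever a Logger is expected.
type LogFunc func(string)

// Log calls the wrapped function with the message.
func (f LogFunc) Log(msg string) { f(msg) }

func main() {
	// Forward every message to stderr, one message per line.
	var logger Logger = LogFunc(func(msg string) {
		fmt.Fprintln(os.Stderr, msg)
	})
	logger.Log("profiler: reading tokens")
}
```

A value like `logger` could then be assigned to the `Log` field of a `Profiler`.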
type Pattern ¶
type Pattern struct {
Left string // Left part of the pattern
Right string // Right part of the pattern
Prob float64 // Global probability of the pattern
Pos int // Position
}
Pattern represents an error pattern in a string. Left is the `true` pattern (either the error correction or the modern form) and Right is the actual pattern found in the string at position Pos.
func MakePattern ¶ added in v0.8.0
MakePattern creates a pattern from a pattern expression `(left:right,pos)`.
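To make the expression format concrete, here is a minimal sketch of how an expression such as `(t:th,0)` could be split into its parts. It is not the package's implementation: `parsePattern` is a hypothetical helper, the local `Pattern` type omits the `Prob` field, and the splitting rules (first `:` and last `,`) are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Pattern is a reduced local copy of the documented type (Prob omitted).
type Pattern struct {
	Left, Right string
	Pos         int
}

// parsePattern is a hypothetical parser for expressions of the form
// `(left:right,pos)`. It assumes left contains no ':' and that the
// position follows the last ','.
func parsePattern(expr string) (Pattern, error) {
	if len(expr) < 2 || !strings.HasPrefix(expr, "(") || !strings.HasSuffix(expr, ")") {
		return Pattern{}, fmt.Errorf("invalid pattern expression: %q", expr)
	}
	inner := expr[1 : len(expr)-1] // strip the surrounding parentheses
	colon := strings.Index(inner, ":")
	comma := strings.LastIndex(inner, ",")
	if colon < 0 || comma < colon {
		return Pattern{}, fmt.Errorf("invalid pattern expression: %q", expr)
	}
	pos, err := strconv.Atoi(inner[comma+1:])
	if err != nil {
		return Pattern{}, fmt.Errorf("invalid position in %q: %w", expr, err)
	}
	return Pattern{
		Left:  inner[:colon],
		Right: inner[colon+1 : comma],
		Pos:   pos,
	}, nil
}

func main() {
	p, err := parsePattern("(t:th,0)")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("left=%s right=%s pos=%d\n", p.Left, p.Right, p.Pos)
}
```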
type Profile ¶
type Profile map[string]Interpretation
Profile maps the unknown OCR tokens of a profiled document to the profiler's corresponding interpretations.
func (Profile) GlobalHistPatterns ¶ added in v0.5.0
GlobalHistPatterns returns all global historical patterns with their corresponding probabilities.
func (Profile) GlobalOCRPatterns ¶ added in v0.5.0
GlobalOCRPatterns returns all global OCR error patterns with their corresponding probabilities.
type Profiler ¶ added in v0.2.0
type Profiler struct {
Exe, Config string
Log Logger
Types, Adaptive bool
// contains filtered or unexported fields
}
Profiler is a profiler executable with an optional logger and some minor options.
func (*Profiler) Run ¶ added in v0.2.0
Run profiles a list of tokens. It uses the given executable with the given language configuration. The optional logger is used to write the process's stderr.
func (*Profiler) RunFunc ¶ added in v0.8.0
func (p *Profiler) RunFunc(ctx context.Context, tokens []Token, f func(string, Candidate) error) error
RunFunc profiles a list of tokens. It uses the given language configuration. The optional logger is used to write the process's stderr. The callback function is called for every Profiler suggestion.
type Token ¶
type Token struct {
LE, OCR, COR string
}
Token represents an input token for the profiling. A token either contains an entry for the extended lexicon (LE) or a text token (OCR) with an optional manual correction (COR).
Tokens must never contain any whitespace in any of the strings.
func (Token) String ¶
String implements the fmt.Stringer interface. The output is suitable as direct input for the profiler, i.e. each lexicon entry starts with `#`; all other tokens contain exactly one `:` to separate the OCR token from the correction token. Tokens with no correction must still end with `:` (they contain an empty correction string).
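The line format described above can be sketched as follows. This is a hypothetical re-implementation for illustration, not the package's code: `formatToken` is an invented name, and the sketch assumes that a lexicon entry is written as `#` followed directly by the entry and that a non-empty `LE` field marks a token as a lexicon entry.

```go
package main

import "fmt"

// Token is a local copy of the documented type.
type Token struct {
	LE, OCR, COR string
}

// formatToken is a hypothetical helper that renders a token as one
// profiler input line, following the rules described above.
func formatToken(t Token) string {
	if t.LE != "" {
		// Lexicon entries start with '#'.
		return "#" + t.LE
	}
	// Text tokens always contain exactly one ':'; the correction may be empty.
	return t.OCR + ":" + t.COR
}

func main() {
	tokens := []Token{
		{LE: "teil"},
		{OCR: "theyl"},
		{OCR: "theyl", COR: "theil"},
	}
	for _, t := range tokens {
		fmt.Println(formatToken(t))
	}
}
```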