Documentation ¶
Index ¶
- type Config
- type LoadImageOptions
- type ParseImageOptions
- type Pool
- type PoolConfig
- type Tesseract
- func (t *Tesseract) ClearImage(ctx context.Context) error
- func (t *Tesseract) Close(ctx context.Context) error
- func (t *Tesseract) GetHOCR(ctx context.Context, progressCB func(int32)) (string, error)
- func (t *Tesseract) GetText(ctx context.Context, progressCB func(int32)) (string, error)
- func (t *Tesseract) LoadImage(ctx context.Context, img io.Reader, opts LoadImageOptions) error
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Config ¶
type Config struct {
	wasm.CompileConfig

	// Languages Tesseract scans for. Defaults to "eng".
	Language string

	// Training Data Tesseract uses. Required. Must support the provided language.
	// See https://github.com/tesseract-ocr/tessdata_fast for more details.
	TrainingData io.Reader

	// Variables are optionally passed into Tesseract as variable config options.
	// Some options are listed at http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version
	Variables map[string]string

	// WASMCache is an optional wazero.CompilationCache used for running multiple Tesseract instances more efficiently.
	WASMCache wazero.CompilationCache
}
type LoadImageOptions ¶
type LoadImageOptions struct {
	// RemoveUnderlines uses Leptonica (C img lib) to remove the underlines from the given image. Copies a lot.
	RemoveUnderlines bool
}
type ParseImageOptions ¶
type ParseImageOptions struct {
	LoadImageOptions

	// IsHOCR makes a GetHOCR request instead of the default GetText
	IsHOCR bool

	// Called whenever Tesseract's parsing progresses, gives a percentage.
	ProgressCB func(int32)
}
type Pool ¶
type Pool struct {
// contains filtered or unexported fields
}
func (*Pool) Close ¶
func (p *Pool) Close()
Close shuts down the Pool, closes the Tesseract workers, and waits for their goroutines to end.
func (*Pool) ParseImage ¶
func (p *Pool) ParseImage(ctx context.Context, img io.Reader, opts ParseImageOptions) (string, error)
ParseImage loads an image into a Tesseract worker and returns the text recognized from it. Both actions are executed on an available worker. To handle the case where all workers are busy, pass a context with a timeout created by context.WithTimeout.
type PoolConfig ¶
type PoolConfig struct {
	Config

	// TrainingDataBytes is Config.TrainingData, but as a []byte for concurrency's sake.
	// Multiple Tesseract workers can't read from a single io.Reader, so they can't benefit from streaming the data.
	// For convenience you only need to set either Config.TrainingData or TrainingDataBytes.
	TrainingDataBytes []byte
}
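The reason a []byte works where a single io.Reader does not can be shown with the standard library alone. The sketch below is illustrative and does not use the package. `sharedReaderCounts` and `perWorkerCounts` are hypothetical helpers, and the "training data" is a placeholder string.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
	"sync"
)

// readAll drains r and reports how many bytes it yielded.
func readAll(r io.Reader) int {
	b, _ := io.ReadAll(r)
	return len(b)
}

// sharedReaderCounts lets two consumers read the same io.Reader in turn.
// The first drains it, so the second gets nothing.
func sharedReaderCounts(data string) (first, second int) {
	r := strings.NewReader(data)
	first = readAll(r)
	second = readAll(r)
	return first, second
}

// perWorkerCounts gives every worker its own bytes.Reader over one shared,
// read-only slice, so each worker sees the complete data.
func perWorkerCounts(data []byte, workers int) []int {
	counts := make([]int, workers)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			counts[i] = readAll(bytes.NewReader(data))
		}(i)
	}
	wg.Wait()
	return counts
}

func main() {
	first, second := sharedReaderCounts("placeholder training data")
	fmt.Println("shared reader:", first, second)
	fmt.Println("per-worker readers:", perWorkerCounts([]byte("placeholder training data"), 3))
}
```

The trade-off is that the whole training data file must be held in memory instead of being streamed, which matches the comment on TrainingDataBytes.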
type Tesseract ¶
type Tesseract struct {
// contains filtered or unexported fields
}
func New ¶
New creates a new Tesseract instance that is ready for use. The Tesseract WASM is initialized with the given training data, language, and variable options. A Tesseract instance is NOT safe for concurrent use.
func (*Tesseract) ClearImage ¶
func (t *Tesseract) ClearImage(ctx context.Context) error
ClearImage clears the currently loaded image from Tesseract. LoadImage calls this for you.
func (*Tesseract) GetHOCR ¶
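func (t *Tesseract) GetHOCR(ctx context.Context, progressCB func(int32)) (string, error)
GetHOCR parses a previously loaded image and returns the result as hOCR text. progressCB is called with Tesseract's recognition progress as a percentage.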
func (*Tesseract) GetText ¶
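func (t *Tesseract) GetText(ctx context.Context, progressCB func(int32)) (string, error)
GetText parses a previously loaded image and returns the result as plain text. progressCB is called with Tesseract's recognition progress as a percentage.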
func (*Tesseract) LoadImage ¶
func (t *Tesseract) LoadImage(ctx context.Context, img io.Reader, opts LoadImageOptions) error
LoadImage clears any previously loaded image and loads the provided img into Tesseract WASM for parsing. Note that the image is fully copied in memory a few times: Leptonica parses it into a Pix object, and Tesseract copies that Pix object internally. Keep that in mind when working with large images.