asticoqui

Version: v0.2.0 (latest) | Published: Aug 19, 2022 | License: MIT | Imports: 4 | Imported by: 6

Repository

github.com/asticode/go-asticoqui

README

Golang bindings for Coqui's 🐸STT speech-to-text library.

asticoqui is compatible with versions v1.0.0, v1.1.0, and v1.2.0 of 🐸STT.

Installation

Install tflite

Run the following command:

    $ pip3 install --extra-index-url https://google-coral.github.io/py-repo/ tflite_runtime

If you're interested in running against your CUDA-enabled GPU (optional), then set the environment variable STT_TFLITE_DELEGATE=gpu.

Install Coqui STT

Fetch an up-to-date native_client.*.tar.xz matching your system from the 🐸STT releases. For example, on macOS:

    $ wget https://github.com/coqui-ai/STT/releases/download/v1.2.0/native_client.tflite.macOS.tar.xz

Extract its contents to $HOME/.coqui/. For example, on macOS:

    $ mkdir -p $HOME/.coqui/
    $ tar -xvJf native_client.tflite.macOS.tar.xz -C $HOME/.coqui/

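Set environment variables so that cgo can find the native client's header and library at build time, and the dynamic linker can find the library at run time: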

    $ export CGO_LDFLAGS="-L$HOME/.coqui/"
    $ export CGO_CXXFLAGS="-I$HOME/.coqui/"
    $ export LD_LIBRARY_PATH="$HOME/.coqui/:$LD_LIBRARY_PATH"

Install asticoqui

Install dependencies

Run the following command:

    $ go get -u github.com/asticode/go-asticoqui/...

Install executables

Run the following command:

    $ go install github.com/asticode/go-asticoqui/cmd

Example Usage

Get the pre-trained model and scorer

Go to this page and click Enter Email to Download at the bottom of the page. Download model.tflite and huge_vocabulary.scorer.

Get the audio files

Run the following commands:

    $ cd $HOME/.coqui
    $ wget https://github.com/coqui-ai/STT/releases/download/v1.2.0/audio-1.2.0.tar.gz
    $ tar -xvzf audio-1.2.0.tar.gz

Use this client

Run the following commands:

    $ go run coqui/main.go -model model.tflite -scorer huge_vocabulary.scorer -audio audio/2830-3980-0043.wav
    
        Text: experience proves this
    
    $ go run coqui/main.go -model model.tflite -scorer huge_vocabulary.scorer -audio audio/4507-16021-0012.wav
    
        Text: why should one hall on the way
        
    $ go run coqui/main.go -model model.tflite -scorer huge_vocabulary.scorer -audio audio/8455-210777-0068.wav
    
        Text: your power is sufficient i said

Documentation

Index

func Version() string
type CandidateTranscript
type Metadata
type Model
- func New(modelPath string) (*Model, error)
type Stream
type TokenMetadata

Functions

func Version

func Version() string

Version returns the version of the C library. The returned version is a semantic version (SemVer 2.0.0).
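The README states that asticoqui is compatible with 🐸STT v1.0.0, v1.1.0 and v1.2.0, so a program may want to check the loaded library at startup. The following is a minimal sketch, not part of the package: isSupportedSTTVersion is a hypothetical helper, and it assumes Version() returns a plain SemVer string such as 1.2.0, tolerating an optional leading "v" and "+build" metadata.

```go
package main

import "strings"

// supportedSTTVersions lists the 🐸STT releases the README says asticoqui
// is compatible with.
var supportedSTTVersions = []string{"1.0.0", "1.1.0", "1.2.0"}

// isSupportedSTTVersion is a hypothetical helper reporting whether v (for
// example the value returned by asticoqui.Version()) is a supported release.
// Assumption: v is a SemVer string, possibly prefixed with "v". Build metadata
// ("+...") is ignored; pre-releases such as "1.2.0-alpha.1" are rejected.
func isSupportedSTTVersion(v string) bool {
	v = strings.TrimPrefix(strings.TrimSpace(v), "v")
	if i := strings.IndexByte(v, '+'); i >= 0 {
		v = v[:i] // SemVer 2.0.0: build metadata does not affect precedence
	}
	for _, s := range supportedSTTVersions {
		if v == s {
			return true
		}
	}
	return false
}
```

For example, call isSupportedSTTVersion(asticoqui.Version()) once at startup and log a warning when it returns false.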

Types

type CandidateTranscript

type CandidateTranscript C.struct_CandidateTranscript

CandidateTranscript is a single transcript computed by the model, including a confidence value and the metadata for its constituent tokens.

func (*CandidateTranscript) Confidence

func (ct *CandidateTranscript) Confidence() float64

Confidence returns the approximated confidence value for this transcript. This is roughly the sum of the acoustic model logit values for each timestep/character that contributed to the creation of this transcript.
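When several candidates are requested, one way to pick a transcript is by highest Confidence. A minimal sketch; bestIndex is a hypothetical generic helper (Go 1.18+) and makes no assumption about the order in which the library returns candidates.

```go
package main

// bestIndex is a hypothetical helper that returns the index of the element
// with the highest score, or -1 if items is empty. score takes a pointer so
// that pointer-receiver methods such as
// (*asticoqui.CandidateTranscript).Confidence can be passed directly as a
// method expression.
func bestIndex[T any](items []T, score func(*T) float64) int {
	best := -1
	var bestScore float64
	for i := range items {
		s := score(&items[i])
		if best == -1 || s > bestScore {
			best, bestScore = i, s
		}
	}
	return best
}
```

For example, bestIndex(md.Transcripts(), (*asticoqui.CandidateTranscript).Confidence), where md is the *Metadata returned by SpeechToTextWithMetadata; read the chosen transcript before calling md.Close().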

func (*CandidateTranscript) NumTokens

func (ct *CandidateTranscript) NumTokens() uint

func (*CandidateTranscript) Tokens

func (ct *CandidateTranscript) Tokens() []TokenMetadata

type Metadata

type Metadata C.struct_Metadata

Metadata holds an array of CandidateTranscript objects computed by the model.

func (*Metadata) Close

func (m *Metadata) Close()

Close frees the Metadata structure properly.

func (*Metadata) NumTranscripts

func (m *Metadata) NumTranscripts() uint

func (*Metadata) Transcripts

func (m *Metadata) Transcripts() []CandidateTranscript

type Model

type Model struct {
	// contains filtered or unexported fields
}

Model provides an interface to a trained model.

func New

func New(modelPath string) (*Model, error)

New creates a new Model. modelPath is the path to the frozen model graph.

func (*Model) BeamWidth

func (m *Model) BeamWidth() uint

BeamWidth returns the beam width value used by the model. If SetBeamWidth has not been called, it returns the default value loaded from the model file.

func (*Model) Close

func (m *Model) Close()

Close frees associated resources and destroys the model object.

func (*Model) DisableExternalScorer

func (m *Model) DisableExternalScorer() error

DisableExternalScorer disables decoding using an external scorer.

func (*Model) EnableExternalScorer

func (m *Model) EnableExternalScorer(scorerPath string) error

EnableExternalScorer enables decoding using an external scorer. scorerPath is the path to the external scorer file.

func (*Model) NewStream

func (m *Model) NewStream() (*Stream, error)

NewStream creates a new streaming inference state. If an error is not returned, exactly one of the returned stream's Finish, FinishWithMetadata, or Discard methods must be called later to free resources.

func (*Model) SampleRate

func (m *Model) SampleRate() int

SampleRate returns the sample rate that was used to produce the model file.
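Input audio has to match this sample rate and, per the SpeechToText and FeedAudioContent descriptions, be 16-bit mono. A minimal sketch of checking a WAV file before transcription: parseWAV and checkForModel are hypothetical helpers, and the parser assumes a plain RIFF/WAVE file with a PCM "fmt " chunk (no RF64 or WAVE_FORMAT_EXTENSIBLE support).

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// wavFormat holds the "fmt " chunk fields relevant to 🐸STT input.
type wavFormat struct {
	AudioFormat   uint16 // 1 = integer PCM
	Channels      uint16
	SampleRate    uint32
	BitsPerSample uint16
}

// parseWAV (hypothetical) walks the RIFF chunks of an in-memory WAV file and
// returns its format and the raw bytes of its "data" chunk.
func parseWAV(b []byte) (wavFormat, []byte, error) {
	var f wavFormat
	if len(b) < 12 || string(b[0:4]) != "RIFF" || string(b[8:12]) != "WAVE" {
		return f, nil, errors.New("not a RIFF/WAVE file")
	}
	haveFmt := false
	for off := 12; off+8 <= len(b); {
		id := string(b[off : off+4])
		size := int(binary.LittleEndian.Uint32(b[off+4 : off+8]))
		body := off + 8
		if size < 0 || body+size > len(b) {
			return f, nil, fmt.Errorf("chunk %q overruns file", id)
		}
		switch id {
		case "fmt ":
			if size < 16 {
				return f, nil, errors.New("fmt chunk too short")
			}
			f.AudioFormat = binary.LittleEndian.Uint16(b[body:])
			f.Channels = binary.LittleEndian.Uint16(b[body+2:])
			f.SampleRate = binary.LittleEndian.Uint32(b[body+4:])
			f.BitsPerSample = binary.LittleEndian.Uint16(b[body+14:])
			haveFmt = true
		case "data":
			if !haveFmt {
				return f, nil, errors.New("data chunk before fmt chunk")
			}
			return f, b[body : body+size], nil
		}
		off = body + size + size%2 // RIFF chunks are padded to an even length
	}
	return f, nil, errors.New("no data chunk")
}

// checkForModel (hypothetical) reports whether f is 16-bit mono PCM at
// modelRate, for example the value of model.SampleRate().
func checkForModel(f wavFormat, modelRate int) error {
	if f.AudioFormat != 1 || f.BitsPerSample != 16 || f.Channels != 1 {
		return fmt.Errorf("want 16-bit mono PCM, got format=%d bits=%d channels=%d",
			f.AudioFormat, f.BitsPerSample, f.Channels)
	}
	if int(f.SampleRate) != modelRate {
		return fmt.Errorf("sample rate %d Hz does not match model rate %d Hz", f.SampleRate, modelRate)
	}
	return nil
}
```

Files that fail the check need resampling or downmixing before being passed to the model; that step is outside the scope of this sketch.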

func (*Model) SetBeamWidth

func (m *Model) SetBeamWidth(width uint) error

SetBeamWidth sets the beam width value used by the model. A larger beam width value generates better results at the cost of decoding time.

func (*Model) SetScorerAlphaBeta

func (m *Model) SetScorerAlphaBeta(alpha, beta float32) error

SetScorerAlphaBeta sets hyperparameters alpha and beta of the external scorer. alpha is the language model weight. beta is the word insertion weight.

func (*Model) SpeechToText

func (m *Model) SpeechToText(buffer []int16) (string, error)

SpeechToText uses the model to convert speech to text. buffer is a 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
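SpeechToText takes samples rather than bytes, so audio read from disk has to be decoded first. A minimal sketch, assuming the raw data is little-endian 16-bit PCM as stored in WAV files; pcm16LE is a hypothetical helper, not part of asticoqui.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// pcm16LE (hypothetical) converts raw little-endian 16-bit PCM bytes, such as
// the "data" chunk of a WAV file, into the []int16 slice taken by
// SpeechToText and FeedAudioContent.
func pcm16LE(b []byte) ([]int16, error) {
	if len(b)%2 != 0 {
		return nil, fmt.Errorf("odd byte count %d: not 16-bit PCM", len(b))
	}
	out := make([]int16, len(b)/2)
	for i := range out {
		// Reinterpret each unsigned 16-bit word as a signed sample.
		out[i] = int16(binary.LittleEndian.Uint16(b[2*i:]))
	}
	return out, nil
}
```

The resulting slice can be passed to model.SpeechToText as-is, provided the audio is mono and already at model.SampleRate().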

func (*Model) SpeechToTextWithMetadata

func (m *Model) SpeechToTextWithMetadata(buffer []int16, numResults uint) (*Metadata, error)

SpeechToTextWithMetadata uses the model to convert speech to text and output results including metadata.

buffer is a 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on). numResults is the maximum number of CandidateTranscript structs to return; fewer may be returned. If an error is not returned, the returned metadata's Close method must be called later to free resources.

type Stream

type Stream struct {
	// contains filtered or unexported fields
}

Stream represents a streaming inference state.

func (*Stream) Discard

func (s *Stream) Discard()

Discard destroys a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don't want to perform a costly decode operation.

func (*Stream) FeedAudioContent

func (s *Stream) FeedAudioContent(buffer []int16)

FeedAudioContent feeds audio samples to an ongoing streaming inference. buffer is an array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).
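Streaming usually means feeding audio in small pieces as it arrives. A minimal sketch of slicing a buffer into fixed-duration chunks; chunkSamples is a hypothetical helper, and nothing in the description above requires a particular chunk size, so the duration is the caller's choice.

```go
package main

// chunkSamples (hypothetical) splits samples into consecutive slices of about
// ms milliseconds each at sampleRate (for example model.SampleRate()), for
// successive FeedAudioContent calls. The last chunk may be shorter. Chunks
// alias samples; their capacity is capped so appending to one cannot
// overwrite the next.
func chunkSamples(samples []int16, sampleRate, ms int) [][]int16 {
	n := sampleRate * ms / 1000
	if n <= 0 {
		n = len(samples)
	}
	var chunks [][]int16
	for len(samples) > 0 {
		k := n
		if k > len(samples) {
			k = len(samples)
		}
		chunks = append(chunks, samples[:k:k])
		samples = samples[k:]
	}
	return chunks
}
```

Each returned chunk can then be passed to s.FeedAudioContent in order.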

func (*Stream) Finish

func (s *Stream) Finish() (string, error)

Finish computes the final decoding of an ongoing streaming inference and returns the result. This signals the end of an ongoing streaming inference.

func (*Stream) FinishWithMetadata

func (s *Stream) FinishWithMetadata(numResults uint) (*Metadata, error)

FinishWithMetadata computes the final decoding of an ongoing streaming inference and returns results including metadata. This signals the end of an ongoing streaming inference. If an error is not returned, the metadata's Close method must be called.

func (*Stream) IntermediateDecode

func (s *Stream) IntermediateDecode() (string, error)

IntermediateDecode computes the intermediate decoding of an ongoing streaming inference. This is an expensive process as the decoder implementation isn't currently capable of streaming, so it always starts from the beginning of the audio.

func (*Stream) IntermediateDecodeWithMetadata

func (s *Stream) IntermediateDecodeWithMetadata(numResults uint) (*Metadata, error)

IntermediateDecodeWithMetadata computes the intermediate decoding of an ongoing streaming inference, returning results including metadata. numResults is the number of candidate transcripts to return. If an error is not returned, the metadata's Close method must be called.

type TokenMetadata

type TokenMetadata C.struct_TokenMetadata

TokenMetadata stores text of an individual token, along with its timing information.

func (*TokenMetadata) StartTime

func (tm *TokenMetadata) StartTime() float32

StartTime returns the position of the token in seconds.

func (*TokenMetadata) Text

func (tm *TokenMetadata) Text() string

Text returns the text corresponding to this token.

func (*TokenMetadata) Timestep

func (tm *TokenMetadata) Timestep() uint

Timestep returns the position of the token in units of 20ms.
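TokenMetadata describes individual tokens, so word-level timings have to be assembled by the caller. A minimal sketch, assuming an alphabet in which words are separated by a token whose text is a space; token, word and groupWords are hypothetical names, and the token slice would be filled from Tokens() via Text() and StartTime() before the Metadata is closed.

```go
package main

import "strings"

// token is a hypothetical copy of the TokenMetadata fields used here.
type token struct {
	Text  string  // from TokenMetadata.Text()
	Start float32 // from TokenMetadata.StartTime(), in seconds
}

// word is a hypothetical word-level result assembled from tokens.
type word struct {
	Text  string
	Start float32 // start time of the word's first token, in seconds
}

// groupWords joins consecutive tokens into words, treating any token whose
// text is blank (such as a space) as a separator.
func groupWords(tokens []token) []word {
	var words []word
	var b strings.Builder
	var start float32
	flush := func() {
		if b.Len() > 0 {
			words = append(words, word{Text: b.String(), Start: start})
			b.Reset()
		}
	}
	for _, t := range tokens {
		if strings.TrimSpace(t.Text) == "" {
			flush()
			continue
		}
		if b.Len() == 0 {
			start = t.Start
		}
		b.WriteString(t.Text)
	}
	flush()
	return words
}
```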

Source Files

coqui.go

Directories

Path	Synopsis
cmd/asticoqui
