Documentation ¶
Overview ¶
This is the higher-level speech-to-text whisper.cpp API for go
Index ¶
Constants ¶
View Source
const SampleBits = whisper.SampleBits
SampleBits is the number of bytes per sample.
View Source
const SampleRate = whisper.SampleRate
SampleRate is the sample rate of the audio data.
Variables ¶
View Source
var ( ErrUnableToLoadModel = errors.New("unable to load model") ErrInternalAppError = errors.New("internal application error") ErrProcessingFailed = errors.New("processing failed") ErrUnsupportedLanguage = errors.New("unsupported language") ErrModelNotMultilingual = errors.New("model is not multilingual") )
Functions ¶
This section is empty.
Types ¶
type Context ¶
type Context interface { SetLanguage(string) error // Set the language to use for speech recognition, use "auto" for auto detect language. SetTranslate(bool) // Set translate flag IsMultilingual() bool // Return true if the model is multilingual. Language() string // Get language SetOffset(time.Duration) // Set offset SetDuration(time.Duration) // Set duration SetThreads(uint) // Set number of threads to use SetSplitOnWord(bool) // Set split on word flag SetTokenThreshold(float32) // Set timestamp token probability threshold SetTokenSumThreshold(float32) // Set timestamp token sum probability threshold SetMaxSegmentLength(uint) // Set max segment length in characters SetTokenTimestamps(bool) // Set token timestamps flag SetMaxTokensPerSegment(uint) // Set max tokens per segment (0 = no limit) SetAudioCtx(uint) // Set audio encoder context SetMaxContext(n int) // Set maximum number of text context tokens to store SetBeamSize(n int) // Set Beam Size SetEntropyThold(t float32) // Set Entropy threshold SetInitialPrompt(prompt string) // Set initial prompt SetTemperature(t float32) // Set temperature SetTemperatureFallback(t float32) // Set temperature incrementation // Process mono audio data and return any errors. // If defined, newly generated segments are passed to the // callback function during processing. Process([]float32, SegmentCallback, ProgressCallback) error // After process is called, return segments until the end of the stream // is reached, when io.EOF is returned. NextSegment() (Segment, error) IsBEG(Token) bool // Test for "begin" token IsSOT(Token) bool // Test for "start of transcription" token IsEOT(Token) bool // Test for "end of transcription" token IsPREV(Token) bool // Test for "start of prev" token IsSOLM(Token) bool // Test for "start of lm" token IsNOT(Token) bool // Test for "No timestamps" token IsLANG(Token, string) bool // Test for token associated with a specific language IsText(Token) bool // Test for text token // Timings PrintTimings() ResetTimings() SystemInfo() string }
Context is the speach recognition context.
type Model ¶
type Model interface { io.Closer // Return a new speech-to-text context. NewContext() (Context, error) // Return true if the model is multilingual. IsMultilingual() bool // Return all languages supported. Languages() []string }
Model is the interface to a whisper model. Create a new model with the function whisper.New(string)
type ProgressCallback ¶
type ProgressCallback func(int)
ProgressCallback is the callback function for reporting progress during processing. It is called during the Process function
type Segment ¶
type Segment struct { // Segment Number Num int // Time beginning and end timestamps for the segment. Start, End time.Duration // The text of the segment. Text string // The tokens of the segment. Tokens []Token }
Segment is the text result of a speech recognition.
type SegmentCallback ¶
type SegmentCallback func(Segment)
SegmentCallback is the callback function for processing segments in real time. It is called during the Process function
Click to show internal directories.
Click to hide internal directories.