whisper

package module

Version: v0.0.0-...-ece3ff8 (Latest) | Published: Jan 4, 2025 | License: MIT | Imports: 4 | Imported by: 7

Repository

github.com/ggerganov/whisper.cpp

README ¶

Go bindings for Whisper

This package provides Go bindings for whisper.cpp. They have been tested on:

Darwin (OS X) 12.6 on x86_64
Debian Linux on arm64
Fedora Linux on x86_64

The "low level" bindings are in the bindings/go directory, and there is a more Go-style package in the bindings/go/pkg/whisper directory. The simplest usage is as follows:

package main

import (
	"fmt"

	"github.com/ggerganov/whisper.cpp/bindings/go/pkg/whisper"
)

func main() {
	var modelpath string  // Path to the model
	var samples []float32 // Samples to process

	// Load the model
	model, err := whisper.New(modelpath)
	if err != nil {
		panic(err)
	}
	defer model.Close()

	// Process samples
	context, err := model.NewContext()
	if err != nil {
		panic(err)
	}
	// main has no return value, so report failures with panic
	if err := context.Process(samples, nil, nil); err != nil {
		panic(err)
	}

	// Print out the results; NextSegment returns an error once no segments remain
	for {
		segment, err := context.NextSegment()
		if err != nil {
			break
		}
		fmt.Printf("[%6s->%6s] %s\n", segment.Start, segment.End, segment.Text)
	}
}

Building & Testing

To build, you need the Go compiler installed; it is available from the official Go website. Run the tests with:

git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp/bindings/go
make test

This will compile a static libwhisper.a in a build folder, download a model file, then run the tests. To build the examples:

make examples

To build with CUDA support, set GGML_CUDA=1:

GGML_CUDA=1 make examples

The examples are placed in the build directory. Once built, you can download all the models with the following command:

./build/go-model-download -out models

And you can then test a model against samples with the following command:

./build/go-whisper -model models/ggml-tiny.en.bin samples/jfk.wav

Using the bindings

To use the bindings in your own software:

Import github.com/ggerganov/whisper.cpp/bindings/go/pkg/whisper (or github.com/ggerganov/whisper.cpp/bindings/go) into your package;
Compile libwhisper.a (you can use make whisper in the bindings/go directory);
Link your Go binary against whisper by setting the environment variables C_INCLUDE_PATH and LIBRARY_PATH to point to the directory containing whisper.h and the directory containing libwhisper.a, respectively.

Look at the Makefile in the bindings/go directory for an example.

The API Documentation:

Getting help:

Follow the discussion for the go bindings here

License

The license for the Go bindings is the same as the license for the rest of the whisper.cpp project, which is the MIT License. See the LICENSE file for more details.

Documentation ¶

Overview ¶

github.com/ggerganov/whisper.cpp/bindings/go provides speech-to-text service bindings for the Go programming language.

Index ¶

Constants
Variables
func Whisper_lang_max_id() int
func Whisper_lang_str(id int) string
func Whisper_print_system_info() string
type Context
- func Whisper_init(path string) *Context
type Params
type SamplingStrategy
type Token
type TokenData

Constants ¶

const (
	SampleRate = C.WHISPER_SAMPLE_RATE                 // Expected sample rate, samples per second
	SampleBits = uint16(unsafe.Sizeof(C.float(0))) * 8 // Sample size in bits
	NumFFT     = C.WHISPER_N_FFT
	HopLength  = C.WHISPER_HOP_LENGTH
	ChunkSize  = C.WHISPER_CHUNK_SIZE
)
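
The constants above describe the audio format the library expects. As a minimal sketch of preparing input, the following converts signed 16-bit PCM to float32 samples and computes how much audio a sample count represents. The value 16000 for the sample rate is an assumption taken from WHISPER_SAMPLE_RATE in whisper.h, and the helper names pcm16ToFloat32 and duration are hypothetical, not part of the bindings. The sketch also assumes mono audio that is already at the expected rate; it does not resample.

```go
package main

import (
	"fmt"
	"time"
)

// sampleRate mirrors the documented SampleRate constant. The value 16000 is
// an assumption taken from WHISPER_SAMPLE_RATE in whisper.h.
const sampleRate = 16000

// pcm16ToFloat32 scales signed 16-bit PCM samples into the range [-1, 1).
// Hypothetical helper, not part of the bindings.
func pcm16ToFloat32(pcm []int16) []float32 {
	out := make([]float32, len(pcm))
	for i, s := range pcm {
		out[i] = float32(s) / 32768.0
	}
	return out
}

// duration reports how much audio n mono samples represent at sampleRate.
func duration(n int) time.Duration {
	return time.Duration(n) * time.Second / sampleRate
}

func main() {
	samples := pcm16ToFloat32([]int16{0, 16384, -32768})
	fmt.Println(samples)
	fmt.Println(duration(sampleRate / 2))
}
```

The resulting slice has the []float32 type that Whisper_full and Whisper_pcm_to_mel accept.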

Variables ¶

var (
	ErrTokenizerFailed  = errors.New("whisper_tokenize failed")
	ErrAutoDetectFailed = errors.New("whisper_lang_auto_detect failed")
	ErrConversionFailed = errors.New("whisper_convert failed")
	ErrInvalidLanguage  = errors.New("invalid language")
)

Functions ¶

func Whisper_lang_max_id ¶

func Whisper_lang_max_id() int

Largest language id (i.e. number of available languages - 1)

func Whisper_lang_str ¶

func Whisper_lang_str(id int) string

Return the short string of the specified language id (e.g. 2 -> "de"), returns empty string if not found
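
Together, the two language functions above can be used to list every supported language code. The sketch below takes the maximum id and the lookup function as parameters so that it runs without linking libwhisper.a; with the real bindings one would presumably pass Whisper_lang_max_id() and Whisper_lang_str. The helper name languages and the stand-in table are hypothetical; only the mapping 2 -> "de" comes from the documentation above.

```go
package main

import "fmt"

// languages collects the short code of every language id from 0 to maxID,
// skipping ids for which the lookup returns an empty string.
// Hypothetical helper, not part of the bindings.
func languages(maxID int, langStr func(int) string) []string {
	codes := make([]string, 0, maxID+1)
	for id := 0; id <= maxID; id++ {
		if code := langStr(id); code != "" {
			codes = append(codes, code)
		}
	}
	return codes
}

func main() {
	// Stand-in table for illustration only; the real values come from
	// Whisper_lang_str.
	table := []string{"en", "zh", "de"}
	lookup := func(id int) string {
		if id < 0 || id >= len(table) {
			return ""
		}
		return table[id]
	}
	fmt.Println(languages(len(table)-1, lookup))
}
```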

func Whisper_print_system_info ¶

func Whisper_print_system_info() string

Return system information as a string.

Types ¶

type Context ¶

type Context C.struct_whisper_context

func Whisper_init ¶

func Whisper_init(path string) *Context

Allocates all memory needed for the model and loads the model from the given file. Returns nil on failure.

func (*Context) Whisper_decode ¶

func (ctx *Context) Whisper_decode(tokens []Token, past, threads int) error

Run the Whisper decoder to obtain the logits and probabilities for the next token. Make sure to call whisper_encode() first. tokens is the provided context for the decoder. past is the number of tokens to use from previous decoder calls.

func (*Context) Whisper_encode ¶

func (ctx *Context) Whisper_encode(offset, threads int) error

Run the Whisper encoder on the log mel spectrogram stored inside the provided whisper context. Make sure to call whisper_pcm_to_mel() or whisper_set_mel() first. offset can be used to specify the offset of the first frame in the spectrogram.

func (*Context) Whisper_free ¶

func (ctx *Context) Whisper_free()

Frees all memory allocated by the model.

func (*Context) Whisper_full ¶

func (ctx *Context) Whisper_full(
	params Params,
	samples []float32,
	encoderBeginCallback func() bool,
	newSegmentCallback func(int),
	progressCallback func(int),
) error

Run the entire model: PCM -> log mel spectrogram -> encoder -> decoder -> text. Uses the specified decoding strategy to obtain the text.

func (*Context) Whisper_full_default_params ¶

func (ctx *Context) Whisper_full_default_params(strategy SamplingStrategy) Params

Return default parameters for a strategy

func (*Context) Whisper_full_get_segment_t0 ¶

func (ctx *Context) Whisper_full_get_segment_t0(segment int) int64

Get the start time of the specified segment.

func (*Context) Whisper_full_get_segment_t1 ¶

func (ctx *Context) Whisper_full_get_segment_t1(segment int) int64

Get the end time of the specified segment.
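
The segment time accessors return raw int64 values rather than time.Duration. The sketch below converts such a value for display. It assumes one unit equals 10 ms, which is how whisper.cpp is generally understood to report timestamps; this page does not state the unit, so verify it against whisper.h. The helper name ticksToDuration is hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// ticksToDuration converts a raw segment timestamp to a time.Duration.
// ASSUMPTION: one unit is 10 ms; this page does not document the unit.
func ticksToDuration(t int64) time.Duration {
	return time.Duration(t) * 10 * time.Millisecond
}

func main() {
	// Illustrative values standing in for the results of
	// Whisper_full_get_segment_t0 and Whisper_full_get_segment_t1.
	t0, t1 := int64(0), int64(1100)
	fmt.Printf("[%6s->%6s]\n", ticksToDuration(t0), ticksToDuration(t1))
}
```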

func (*Context) Whisper_full_get_segment_text ¶

func (ctx *Context) Whisper_full_get_segment_text(segment int) string

Get the text of the specified segment.

func (*Context) Whisper_full_get_token_data ¶

func (ctx *Context) Whisper_full_get_token_data(segment int, token int) TokenData

Get token data for the specified token in the specified segment. This contains probabilities, timestamps, etc.

func (*Context) Whisper_full_get_token_id ¶

func (ctx *Context) Whisper_full_get_token_id(segment int, token int) Token

Get the token of the specified token index in the specified segment.

func (*Context) Whisper_full_get_token_p ¶

func (ctx *Context) Whisper_full_get_token_p(segment int, token int) float32

Get the probability of the specified token in the specified segment.

func (*Context) Whisper_full_get_token_text ¶

func (ctx *Context) Whisper_full_get_token_text(segment int, token int) string

Get the token text of the specified token index in the specified segment.

func (*Context) Whisper_full_lang_id ¶

func (ctx *Context) Whisper_full_lang_id() int

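Return the id of the autodetected language, or -1 if not found. Added to whisper.cpp in commit a1c1583cc7cd8b75222857afc936f0638c5683d6 (https://github.com/ggerganov/whisper.cpp/commit/a1c1583cc7cd8b75222857afc936f0638c5683d6).

The id uses the same numbering as Whisper_lang_id and Whisper_lang_str. For example, a return value of 2 corresponds to "de" (German).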

func (*Context) Whisper_full_n_segments ¶

func (ctx *Context) Whisper_full_n_segments() int

Number of generated text segments. A segment can be a few words, a sentence, or even a paragraph.
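
The segment accessors are typically used in a loop once Whisper_full has returned. The following sketch shows that loop against a small interface whose method signatures are copied from the index above. The interface segmentReader, the helper transcript, and the type fakeContext are hypothetical names introduced here so the example runs without linking libwhisper.a; a *Context from the bindings should satisfy the same interface.

```go
package main

import (
	"fmt"
	"strings"
)

// segmentReader lists the subset of *Context methods used below.
// Hypothetical interface; signatures are copied from the package index.
type segmentReader interface {
	Whisper_full_n_segments() int
	Whisper_full_get_segment_t0(segment int) int64
	Whisper_full_get_segment_t1(segment int) int64
	Whisper_full_get_segment_text(segment int) string
}

// transcript joins the trimmed text of every segment with single spaces.
func transcript(r segmentReader) string {
	n := r.Whisper_full_n_segments()
	parts := make([]string, 0, n)
	for i := 0; i < n; i++ {
		parts = append(parts, strings.TrimSpace(r.Whisper_full_get_segment_text(i)))
	}
	return strings.Join(parts, " ")
}

// fakeContext is a stand-in for a processed *Context, for illustration only.
type fakeContext struct{ texts []string }

func (f fakeContext) Whisper_full_n_segments() int                 { return len(f.texts) }
func (f fakeContext) Whisper_full_get_segment_t0(i int) int64      { return int64(i) * 100 }
func (f fakeContext) Whisper_full_get_segment_t1(i int) int64      { return int64(i+1) * 100 }
func (f fakeContext) Whisper_full_get_segment_text(i int) string   { return f.texts[i] }

func main() {
	var ctx segmentReader = fakeContext{texts: []string{" Hello there.", " How are you?"}}
	for i := 0; i < ctx.Whisper_full_n_segments(); i++ {
		fmt.Printf("%d..%d %q\n",
			ctx.Whisper_full_get_segment_t0(i),
			ctx.Whisper_full_get_segment_t1(i),
			ctx.Whisper_full_get_segment_text(i))
	}
	fmt.Println(transcript(ctx))
}
```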

func (*Context) Whisper_full_n_tokens ¶

func (ctx *Context) Whisper_full_n_tokens(segment int) int

Get number of tokens in the specified segment.
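
The per-token accessors can be combined with the token count to derive a rough confidence figure for a segment. The sketch below averages the token probabilities of one segment. The interface tokenReader, the helper meanTokenProb, and the type fakeTokens are hypothetical names introduced for illustration; the two method signatures are copied from the index above. Averaging probabilities is only one possible heuristic and is not something the bindings prescribe.

```go
package main

import "fmt"

// tokenReader lists the subset of *Context methods used below.
// Hypothetical interface; signatures are copied from the package index.
type tokenReader interface {
	Whisper_full_n_tokens(segment int) int
	Whisper_full_get_token_p(segment int, token int) float32
}

// meanTokenProb returns the arithmetic mean of the token probabilities in
// the given segment, or 0 if the segment has no tokens.
func meanTokenProb(r tokenReader, segment int) float32 {
	n := r.Whisper_full_n_tokens(segment)
	if n == 0 {
		return 0
	}
	var sum float32
	for t := 0; t < n; t++ {
		sum += r.Whisper_full_get_token_p(segment, t)
	}
	return sum / float32(n)
}

// fakeTokens is a stand-in for a processed *Context: one slice of token
// probabilities per segment. For illustration only.
type fakeTokens [][]float32

func (f fakeTokens) Whisper_full_n_tokens(segment int) int { return len(f[segment]) }
func (f fakeTokens) Whisper_full_get_token_p(segment int, token int) float32 {
	return f[segment][token]
}

func main() {
	ctx := fakeTokens{{0.5, 1.0}, {0.25}}
	for s := range ctx {
		fmt.Printf("segment %d: mean p = %.2f\n", s, meanTokenProb(ctx, s))
	}
}
```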

func (*Context) Whisper_full_parallel ¶

func (ctx *Context) Whisper_full_parallel(params Params, samples []float32, processors int, encoderBeginCallback func() bool, newSegmentCallback func(int)) error

Split the input audio into chunks and process each chunk separately using whisper_full(). This approach can offer some speedup in some cases. However, the transcription accuracy can be worse at the beginning and end of each chunk.

func (*Context) Whisper_is_multilingual ¶

func (ctx *Context) Whisper_is_multilingual() int

func (*Context) Whisper_lang_auto_detect ¶

func (ctx *Context) Whisper_lang_auto_detect(offset_ms, n_threads int) ([]float32, error)

Use mel data at offset_ms to try to auto-detect the spoken language. Make sure to call whisper_pcm_to_mel() or whisper_set_mel() first. Returns the probabilities of all languages. Reference: https://github.com/openai/whisper/blob/main/whisper/decoding.py#L18-L69
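
Since the function returns one probability per language, a caller usually wants the most likely entry. The sketch below picks it with a linear scan. It assumes the returned slice is indexed by language id, which this page does not state explicitly; the helper name mostLikely and the sample probabilities are hypothetical.

```go
package main

import "fmt"

// mostLikely returns the index and value of the largest probability,
// or (-1, 0) for an empty slice.
// ASSUMPTION: the slice index is the language id.
func mostLikely(probs []float32) (int, float32) {
	best := -1
	var bestP float32
	for id, p := range probs {
		if best == -1 || p > bestP {
			best, bestP = id, p
		}
	}
	return best, bestP
}

func main() {
	// Illustrative values standing in for the result of
	// Whisper_lang_auto_detect.
	probs := []float32{0.10, 0.05, 0.80, 0.05}
	id, p := mostLikely(probs)
	fmt.Printf("language id %d (p=%.2f)\n", id, p)
}
```

The resulting id could then be passed to Whisper_lang_str to obtain the short language code.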

func (*Context) Whisper_lang_id ¶

func (ctx *Context) Whisper_lang_id(lang string) int

Return the id of the specified language, or -1 if not found. Examples:

"de" -> 2
"german" -> 2

func (*Context) Whisper_n_audio_ctx ¶

func (ctx *Context) Whisper_n_audio_ctx() int

func (*Context) Whisper_n_len ¶

func (ctx *Context) Whisper_n_len() int

func (*Context) Whisper_n_text_ctx ¶

func (ctx *Context) Whisper_n_text_ctx() int

func (*Context) Whisper_n_vocab ¶

func (ctx *Context) Whisper_n_vocab() int

func (*Context) Whisper_pcm_to_mel ¶

func (ctx *Context) Whisper_pcm_to_mel(data []float32, threads int) error

Convert RAW PCM audio to log mel spectrogram. The resulting spectrogram is stored inside the provided whisper context.

func (*Context) Whisper_print_timings ¶

func (ctx *Context) Whisper_print_timings()

Print performance timing information.

func (*Context) Whisper_reset_timings ¶

func (ctx *Context) Whisper_reset_timings()

Reset performance timing information.

func (*Context) Whisper_set_mel ¶

func (ctx *Context) Whisper_set_mel(data []float32, n_mel int) error

This can be used to set a custom log mel spectrogram inside the provided whisper context. Use this instead of whisper_pcm_to_mel() if you want to provide your own log mel spectrogram. n_mel must be 80.

func (*Context) Whisper_token_beg ¶

func (ctx *Context) Whisper_token_beg() Token

Special tokens

func (*Context) Whisper_token_eot ¶

func (ctx *Context) Whisper_token_eot() Token

Special tokens

func (*Context) Whisper_token_lang ¶

func (ctx *Context) Whisper_token_lang(lang_id int) Token

Special tokens

func (*Context) Whisper_token_not ¶

func (ctx *Context) Whisper_token_not() Token

Special tokens

func (*Context) Whisper_token_prev ¶

func (ctx *Context) Whisper_token_prev() Token

Special tokens

func (*Context) Whisper_token_solm ¶

func (ctx *Context) Whisper_token_solm() Token

Special tokens

func (*Context) Whisper_token_sot ¶

func (ctx *Context) Whisper_token_sot() Token

Special tokens

func (*Context) Whisper_token_to_str ¶

func (ctx *Context) Whisper_token_to_str(token Token) string

Token Id -> String. Uses the vocabulary in the provided context

func (*Context) Whisper_token_transcribe ¶

func (ctx *Context) Whisper_token_transcribe() Token

Task tokens

func (*Context) Whisper_token_translate ¶

func (ctx *Context) Whisper_token_translate() Token

Task tokens

func (*Context) Whisper_tokenize ¶

func (ctx *Context) Whisper_tokenize(text string, tokens []Token) (int, error)

Convert the provided text into tokens. The tokens slice must be large enough to hold the resulting tokens. Returns the number of tokens on success.

type Params ¶

type Params C.struct_whisper_full_params

func (*Params) Language ¶

func (p *Params) Language() int

Get language id

func (*Params) SetAudioCtx ¶

func (p *Params) SetAudioCtx(n int)

Set audio encoder context

func (*Params) SetBeamSize ¶

func (p *Params) SetBeamSize(n int)

func (*Params) SetDuration ¶

func (p *Params) SetDuration(duration_ms int)

Set audio duration to process in ms

func (*Params) SetEntropyThold ¶

func (p *Params) SetEntropyThold(t float32)

func (*Params) SetInitialPrompt ¶

func (p *Params) SetInitialPrompt(prompt string)

Set initial prompt

func (*Params) SetLanguage ¶

func (p *Params) SetLanguage(lang int) error

Set language id

func (*Params) SetMaxContext ¶

func (p *Params) SetMaxContext(n int)

func (*Params) SetMaxSegmentLength ¶

func (p *Params) SetMaxSegmentLength(n int)

Set max segment length in characters

func (*Params) SetMaxTokensPerSegment ¶

func (p *Params) SetMaxTokensPerSegment(n int)

Set max tokens per segment (0 = no limit)

func (*Params) SetNoContext ¶

func (p *Params) SetNoContext(v bool)

func (*Params) SetOffset ¶

func (p *Params) SetOffset(offset_ms int)

Set start offset in ms

func (*Params) SetPrintProgress ¶

func (p *Params) SetPrintProgress(v bool)

func (*Params) SetPrintRealtime ¶

func (p *Params) SetPrintRealtime(v bool)

func (*Params) SetPrintSpecial ¶

func (p *Params) SetPrintSpecial(v bool)

func (*Params) SetPrintTimestamps ¶

func (p *Params) SetPrintTimestamps(v bool)

func (*Params) SetSingleSegment ¶

func (p *Params) SetSingleSegment(v bool)

func (*Params) SetSplitOnWord ¶

func (p *Params) SetSplitOnWord(v bool)

func (*Params) SetTemperature ¶

func (p *Params) SetTemperature(t float32)

func (*Params) SetTemperatureFallback ¶

func (p *Params) SetTemperatureFallback(t float32)

Set the fallback temperature increment. Pass -1.0 to disable this feature.

func (*Params) SetThreads ¶

func (p *Params) SetThreads(threads int)

Set number of threads to use

func (*Params) SetTokenSumThreshold ¶

func (p *Params) SetTokenSumThreshold(t float32)

Set timestamp token sum probability threshold (~0.01)

func (*Params) SetTokenThreshold ¶

func (p *Params) SetTokenThreshold(t float32)

Set timestamp token probability threshold (~0.01)

func (*Params) SetTokenTimestamps ¶

func (p *Params) SetTokenTimestamps(b bool)

func (*Params) SetTranslate ¶

func (p *Params) SetTranslate(v bool)

func (*Params) String ¶

func (p *Params) String() string

func (*Params) Threads ¶

func (p *Params) Threads() int

Threads available

type SamplingStrategy ¶

type SamplingStrategy C.enum_whisper_sampling_strategy

const (
	SAMPLING_GREEDY      SamplingStrategy = C.WHISPER_SAMPLING_GREEDY
	SAMPLING_BEAM_SEARCH SamplingStrategy = C.WHISPER_SAMPLING_BEAM_SEARCH
)

type Token ¶

type Token C.whisper_token

type TokenData ¶

type TokenData C.struct_whisper_token_data

func (TokenData) Id ¶

func (t TokenData) Id() Token

func (TokenData) T0 ¶

func (t TokenData) T0() int64

func (TokenData) T1 ¶

func (t TokenData) T1() int64

Directories ¶

Path	Synopsis
examples/go-model-download
examples/go-whisper
pkg/whisper	This is the higher-level speech-to-text whisper.cpp API for go
