whisper

Version: v0.0.0-...-ece3ff8 (Latest)

Warning: This package is not in the latest version of its module.

Published: Jan 4, 2025 License: MIT Imports: 4 Imported by: 7

README

Go bindings for Whisper

This package provides Go bindings for whisper.cpp. They have been tested on:

  • Darwin (OS X) 12.6 on x86_64
  • Debian Linux on arm64
  • Fedora Linux on x86_64

The "low-level" bindings are in the bindings/go directory, and a more Go-style package is in the bindings/go/pkg/whisper directory. The simplest usage is as follows:

package main

import (
	"fmt"

	"github.com/ggerganov/whisper.cpp/bindings/go/pkg/whisper"
)

func main() {
	var modelpath string  // Path to the model, e.g. models/ggml-tiny.en.bin
	var samples []float32 // 16 kHz mono samples to process

	// Load the model
	model, err := whisper.New(modelpath)
	if err != nil {
		panic(err)
	}
	defer model.Close()

	// Create a context and process the samples
	context, err := model.NewContext()
	if err != nil {
		panic(err)
	}
	if err := context.Process(samples, nil, nil); err != nil {
		panic(err) // main has no return value, so report the error here
	}

	// Print out the results; NextSegment returns an error once all segments have been read
	for {
		segment, err := context.NextSegment()
		if err != nil {
			break
		}
		fmt.Printf("[%6s->%6s] %s\n", segment.Start, segment.End, segment.Text)
	}
}
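The example above leaves `samples` empty. The bindings expect mono `float32` PCM at `SampleRate` (16 kHz), scaled to the range [-1, 1]. As a minimal sketch, assuming you already have raw little-endian signed 16-bit mono PCM recorded at 16 kHz (for example, the data chunk of a WAV file whose header you parse elsewhere), a hypothetical helper `pcm16ToFloat32` could produce those samples. It is not part of the bindings and does no resampling or channel mixing.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// pcm16ToFloat32 is a hypothetical helper, not part of the bindings.
// It converts little-endian signed 16-bit mono PCM into float32 samples
// in the range [-1.0, 1.0). It assumes the input is already 16 kHz mono;
// it performs no resampling and no channel mixing.
func pcm16ToFloat32(raw []byte) ([]float32, error) {
	if len(raw)%2 != 0 {
		return nil, errors.New("pcm16ToFloat32: input has an odd number of bytes")
	}
	samples := make([]float32, len(raw)/2)
	for i := range samples {
		// Reinterpret each 2-byte little-endian word as a signed sample.
		v := int16(binary.LittleEndian.Uint16(raw[2*i:]))
		// Divide by 2^15 so that -32768 maps exactly to -1.0.
		samples[i] = float32(v) / 32768
	}
	return samples, nil
}
```

With such a helper, `samples` in the example would be the result of `pcm16ToFloat32(raw)` for your audio bytes.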

Building & Testing

To build, you need the Go compiler installed (available from the Go website). Run the tests with:

git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp/bindings/go
make test

This will compile a static libwhisper.a in a build folder, download a model file, then run the tests. To build the examples:

make examples

To build with CUDA support, add GGML_CUDA=1:

GGML_CUDA=1 make examples

The examples are placed in the build directory. Once built, you can download all the models with the following command:

./build/go-model-download -out models

And you can then test a model against samples with the following command:

./build/go-whisper -model models/ggml-tiny.en.bin samples/jfk.wav

Using the bindings

To use the bindings in your own software:

  1. Import github.com/ggerganov/whisper.cpp/bindings/go/pkg/whisper (or github.com/ggerganov/whisper.cpp/bindings/go) into your package;
  2. Compile libwhisper.a (you can use make whisper in the bindings/go directory);
  3. Link your Go binary against whisper by setting the environment variables C_INCLUDE_PATH and LIBRARY_PATH to the directories containing whisper.h and libwhisper.a, respectively.

Look at the Makefile in the bindings/go directory for an example.

The API Documentation:

Getting help:

  • Follow the discussion for the Go bindings here

License

The license for the Go bindings is the same as the license for the rest of the whisper.cpp project, which is the MIT License. See the LICENSE file for more details.

Documentation

Overview

github.com/ggerganov/whisper.cpp/bindings/go provides speech-to-text service bindings for the Go programming language.

Constants

const (
	SampleRate = C.WHISPER_SAMPLE_RATE                 // Expected sample rate, samples per second
	SampleBits = uint16(unsafe.Sizeof(C.float(0))) * 8 // Sample size in bits
	NumFFT     = C.WHISPER_N_FFT
	HopLength  = C.WHISPER_HOP_LENGTH
	ChunkSize  = C.WHISPER_CHUNK_SIZE
)
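These constants are read from whisper.h through cgo. The sketch below mirrors three of them as plain Go constants, assuming the usual whisper.cpp values (16000 Hz sample rate, a hop length of 160 samples, and a 30-second chunk; check whisper.h for your version), to show how a sample count relates to audio duration and to the number of mel frames in one chunk. The lowercase names and both helper functions are illustrative, not part of the bindings.

```go
package main

import "time"

// Assumed mirrors of the bindings' SampleRate, HopLength and ChunkSize.
// The real bindings take these values from whisper.h via cgo.
const (
	sampleRate = 16000 // samples per second
	hopLength  = 160   // samples between successive mel frames (10 ms)
	chunkSize  = 30    // seconds of audio per encoder window
)

// samplesToDuration converts a sample count at sampleRate to a time.Duration.
func samplesToDuration(n int) time.Duration {
	return time.Duration(n) * time.Second / sampleRate
}

// melFrames returns how many mel frames n samples produce at hopLength.
func melFrames(n int) int {
	return n / hopLength
}
```

Under these assumptions, one 30-second chunk (480000 samples) corresponds to 3000 mel frames.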

Variables

var (
	ErrTokenizerFailed  = errors.New("whisper_tokenize failed")
	ErrAutoDetectFailed = errors.New("whisper_lang_auto_detect failed")
	ErrConversionFailed = errors.New("whisper_convert failed")
	ErrInvalidLanguage  = errors.New("invalid language")
)

Functions

func Whisper_lang_max_id

func Whisper_lang_max_id() int

Largest language id (i.e. number of available languages - 1)

func Whisper_lang_str

func Whisper_lang_str(id int) string

Returns the short string of the specified language id (e.g. 2 -> "de"), or an empty string if not found.
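Whisper_lang_max_id and Whisper_lang_str can be combined to enumerate the supported languages. A minimal sketch, assuming you want a lookup from short code to id: the hypothetical `languageTable` takes the lookup function as a parameter, so it can be exercised without cgo. With the low-level bindings imported as `whisper`, you would call `languageTable(whisper.Whisper_lang_max_id(), whisper.Whisper_lang_str)`.

```go
package main

// languageTable is a hypothetical helper that maps each short language code
// (e.g. "de") to its numeric id by walking ids 0..maxID inclusive.
// langStr has the same signature as Whisper_lang_str; ids that map to an
// empty string ("not found") are skipped.
func languageTable(maxID int, langStr func(id int) string) map[string]int {
	table := make(map[string]int, maxID+1)
	for id := 0; id <= maxID; id++ {
		if code := langStr(id); code != "" {
			table[code] = id
		}
	}
	return table
}
```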

func Whisper_print_system_info

func Whisper_print_system_info() string

Returns system information (such as the CPU features whisper.cpp was built with) as a string.

Types

type Context

type Context C.struct_whisper_context

func Whisper_init

func Whisper_init(path string) *Context

Allocates all memory needed for the model and loads the model from the given file. Returns nil on failure.

func (*Context) Whisper_decode

func (ctx *Context) Whisper_decode(tokens []Token, past, threads int) error

Runs the Whisper decoder to obtain the logits and probabilities for the next token. Make sure to call Whisper_encode first. tokens is the provided context for the decoder, past is the number of tokens to use from previous decoder calls, and threads is the number of threads to use.

func (*Context) Whisper_encode

func (ctx *Context) Whisper_encode(offset, threads int) error

Runs the Whisper encoder on the log mel spectrogram stored inside the provided whisper context. Make sure to call Whisper_pcm_to_mel or Whisper_set_mel first. offset specifies the offset of the first frame in the spectrogram, and threads is the number of threads to use.

func (*Context) Whisper_free

func (ctx *Context) Whisper_free()

Frees all memory allocated by the model.

func (*Context) Whisper_full

func (ctx *Context) Whisper_full(
	params Params,
	samples []float32,
	encoderBeginCallback func() bool,
	newSegmentCallback func(int),
	progressCallback func(int),
) error

Runs the entire model: PCM -> log mel spectrogram -> encoder -> decoder -> text. Uses the specified decoding strategy to obtain the text.

func (*Context) Whisper_full_default_params

func (ctx *Context) Whisper_full_default_params(strategy SamplingStrategy) Params

Returns the default parameters for the given sampling strategy.

func (*Context) Whisper_full_get_segment_t0

func (ctx *Context) Whisper_full_get_segment_t0(segment int) int64

Gets the start time of the specified segment, in units of 10 ms.

func (*Context) Whisper_full_get_segment_t1

func (ctx *Context) Whisper_full_get_segment_t1(segment int) int64

Gets the end time of the specified segment, in units of 10 ms.

func (*Context) Whisper_full_get_segment_text

func (ctx *Context) Whisper_full_get_segment_text(segment int) string

Get the text of the specified segment.
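After Whisper_full succeeds, segments are read back by index. The sketch below collects them into Go values, assuming segment timestamps are in units of 10 ms as in whisper.cpp (verify for your version). It defines a small interface listing only the methods it uses; *Context from these bindings has all of them, so it satisfies the interface, and a fake can stand in for testing. `segmentReader`, `segment`, `collectSegments` and `fakeSegments` are hypothetical names.

```go
package main

import "time"

// segmentReader lists the subset of *Context methods used below.
type segmentReader interface {
	Whisper_full_n_segments() int
	Whisper_full_get_segment_t0(segment int) int64
	Whisper_full_get_segment_t1(segment int) int64
	Whisper_full_get_segment_text(segment int) string
}

// segment is a hypothetical, Go-friendly view of one transcribed segment.
type segment struct {
	Start, End time.Duration
	Text       string
}

// collectSegments reads every segment produced by the last Whisper_full call,
// converting the assumed 10 ms timestamp units into time.Duration.
func collectSegments(r segmentReader) []segment {
	n := r.Whisper_full_n_segments()
	out := make([]segment, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, segment{
			Start: time.Duration(r.Whisper_full_get_segment_t0(i)) * 10 * time.Millisecond,
			End:   time.Duration(r.Whisper_full_get_segment_t1(i)) * 10 * time.Millisecond,
			Text:  r.Whisper_full_get_segment_text(i),
		})
	}
	return out
}

// fakeSegments stands in for *Context when testing without cgo.
type fakeSegments struct {
	t0, t1 []int64
	text   []string
}

func (f fakeSegments) Whisper_full_n_segments() int               { return len(f.text) }
func (f fakeSegments) Whisper_full_get_segment_t0(i int) int64    { return f.t0[i] }
func (f fakeSegments) Whisper_full_get_segment_t1(i int) int64    { return f.t1[i] }
func (f fakeSegments) Whisper_full_get_segment_text(i int) string { return f.text[i] }
```

With the real bindings, `collectSegments(ctx)` accepts the `*Context` you passed to Whisper_full.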

func (*Context) Whisper_full_get_token_data

func (ctx *Context) Whisper_full_get_token_data(segment int, token int) TokenData

Get token data for the specified token in the specified segment. This contains probabilities, timestamps, etc.

func (*Context) Whisper_full_get_token_id

func (ctx *Context) Whisper_full_get_token_id(segment int, token int) Token

Get the token of the specified token index in the specified segment.

func (*Context) Whisper_full_get_token_p

func (ctx *Context) Whisper_full_get_token_p(segment int, token int) float32

Get the probability of the specified token in the specified segment.

func (*Context) Whisper_full_get_token_text

func (ctx *Context) Whisper_full_get_token_text(segment int, token int) string

Get the token text of the specified token index in the specified segment.

func (*Context) Whisper_full_lang_id

func (ctx *Context) Whisper_full_lang_id() int

Returns the id of the autodetected language, or -1 if not found. Added to whisper.cpp in https://github.com/ggerganov/whisper.cpp/commit/a1c1583cc7cd8b75222857afc936f0638c5683d6

func (*Context) Whisper_full_n_segments

func (ctx *Context) Whisper_full_n_segments() int

Number of generated text segments. A segment can be a few words, a sentence, or even a paragraph.

func (*Context) Whisper_full_n_tokens

func (ctx *Context) Whisper_full_n_tokens(segment int) int

Gets the number of tokens in the specified segment.
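Per-token data can be used, for example, to flag tokens the model was unsure about. A minimal sketch, assuming a probability threshold of your choosing: the hypothetical `lowConfidenceTokens` depends only on Whisper_full_n_tokens, Whisper_full_get_token_text and Whisper_full_get_token_p, which *Context provides, so a fake can replace it in tests. Note that special tokens (timestamps and the like) are included in the count and would be checked too.

```go
package main

// tokenReader is the subset of *Context methods this helper needs.
type tokenReader interface {
	Whisper_full_n_tokens(segment int) int
	Whisper_full_get_token_text(segment int, token int) string
	Whisper_full_get_token_p(segment int, token int) float32
}

// lowConfidenceTokens returns the text of every token in the given segment
// whose probability is below minP.
func lowConfidenceTokens(r tokenReader, segment int, minP float32) []string {
	var out []string
	for t := 0; t < r.Whisper_full_n_tokens(segment); t++ {
		if r.Whisper_full_get_token_p(segment, t) < minP {
			out = append(out, r.Whisper_full_get_token_text(segment, t))
		}
	}
	return out
}

// fakeTokens stands in for *Context when testing without cgo.
type fakeTokens struct {
	text []string
	p    []float32
}

func (f fakeTokens) Whisper_full_n_tokens(int) int                   { return len(f.text) }
func (f fakeTokens) Whisper_full_get_token_text(_ int, t int) string { return f.text[t] }
func (f fakeTokens) Whisper_full_get_token_p(_ int, t int) float32   { return f.p[t] }
```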

func (*Context) Whisper_full_parallel

func (ctx *Context) Whisper_full_parallel(params Params, samples []float32, processors int, encoderBeginCallback func() bool, newSegmentCallback func(int)) error

Splits the input audio into chunks and processes each chunk separately using Whisper_full. This approach can offer a speedup in some cases; however, transcription accuracy can be worse at the beginning and end of each chunk.
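To illustrate the kind of splitting described for Whisper_full_parallel, here is a sketch that divides a sample slice into at most `n` contiguous chunks. This is an assumption about the general approach, not a copy of whisper.cpp's internal logic, and `splitChunks` is a hypothetical helper. Every boundary between two chunks is a place where, as noted above, accuracy may suffer.

```go
package main

// splitChunks divides samples into at most n contiguous chunks of roughly
// equal length; the final chunk may be shorter. The chunks share the
// backing array of samples (no copying).
func splitChunks(samples []float32, n int) [][]float32 {
	if n < 1 {
		n = 1
	}
	size := (len(samples) + n - 1) / n // ceiling division; >= 1 when samples is non-empty
	var chunks [][]float32
	for start := 0; start < len(samples); start += size {
		end := start + size
		if end > len(samples) {
			end = len(samples)
		}
		chunks = append(chunks, samples[start:end])
	}
	return chunks
}
```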

func (*Context) Whisper_is_multilingual

func (ctx *Context) Whisper_is_multilingual() int

func (*Context) Whisper_lang_auto_detect

func (ctx *Context) Whisper_lang_auto_detect(offset_ms, n_threads int) ([]float32, error)

Uses mel data at offset_ms to try to auto-detect the spoken language. Make sure to call Whisper_pcm_to_mel or Whisper_set_mel first. Returns the probabilities of all languages. ref: https://github.com/openai/whisper/blob/main/whisper/decoding.py#L18-L69
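Whisper_lang_auto_detect returns one probability per language. A minimal sketch of picking the most likely language, assuming the slice is indexed by language id (0 through Whisper_lang_max_id); the helper `mostLikelyLanguage` is hypothetical. With the bindings, the resulting id can then be passed to Whisper_lang_str to obtain its short code.

```go
package main

// mostLikelyLanguage returns the index of the highest probability in probs
// together with that probability, or (-1, 0) if probs is empty.
func mostLikelyLanguage(probs []float32) (id int, p float32) {
	id = -1
	for i, v := range probs {
		if id == -1 || v > p {
			id, p = i, v
		}
	}
	return id, p
}
```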

func (*Context) Whisper_lang_id

func (ctx *Context) Whisper_lang_id(lang string) int

Returns the id of the specified language, or -1 if not found. Examples:

"de" -> 2
"german" -> 2

func (*Context) Whisper_n_audio_ctx

func (ctx *Context) Whisper_n_audio_ctx() int

func (*Context) Whisper_n_len

func (ctx *Context) Whisper_n_len() int

func (*Context) Whisper_n_text_ctx

func (ctx *Context) Whisper_n_text_ctx() int

func (*Context) Whisper_n_vocab

func (ctx *Context) Whisper_n_vocab() int

func (*Context) Whisper_pcm_to_mel

func (ctx *Context) Whisper_pcm_to_mel(data []float32, threads int) error

Converts raw PCM audio to a log mel spectrogram, using threads threads. The resulting spectrogram is stored inside the provided whisper context.

func (*Context) Whisper_print_timings

func (ctx *Context) Whisper_print_timings()

Prints performance timing information.

func (*Context) Whisper_reset_timings

func (ctx *Context) Whisper_reset_timings()

Resets the performance timing information.

func (*Context) Whisper_set_mel

func (ctx *Context) Whisper_set_mel(data []float32, n_mel int) error

Sets a custom log mel spectrogram inside the provided whisper context. Use this instead of Whisper_pcm_to_mel if you want to provide your own log mel spectrogram. n_mel must be 80.

func (*Context) Whisper_token_beg

func (ctx *Context) Whisper_token_beg() Token

Special tokens

func (*Context) Whisper_token_eot

func (ctx *Context) Whisper_token_eot() Token

Special tokens

func (*Context) Whisper_token_lang

func (ctx *Context) Whisper_token_lang(lang_id int) Token

Special tokens

func (*Context) Whisper_token_not

func (ctx *Context) Whisper_token_not() Token

Special tokens

func (*Context) Whisper_token_prev

func (ctx *Context) Whisper_token_prev() Token

Special tokens

func (*Context) Whisper_token_solm

func (ctx *Context) Whisper_token_solm() Token

Special tokens

func (*Context) Whisper_token_sot

func (ctx *Context) Whisper_token_sot() Token

Special tokens

func (*Context) Whisper_token_to_str

func (ctx *Context) Whisper_token_to_str(token Token) string

Converts a token id to its string, using the vocabulary in the provided context.

func (*Context) Whisper_token_transcribe

func (ctx *Context) Whisper_token_transcribe() Token

Task tokens

func (*Context) Whisper_token_translate

func (ctx *Context) Whisper_token_translate() Token

Task tokens

func (*Context) Whisper_tokenize

func (ctx *Context) Whisper_tokenize(text string, tokens []Token) (int, error)

Converts the provided text into tokens. The tokens slice must be large enough to hold the resulting tokens. Returns the number of tokens on success.

type Params

func (*Params) Language

func (p *Params) Language() int

Get language id

func (*Params) SetAudioCtx

func (p *Params) SetAudioCtx(n int)

Set audio encoder context

func (*Params) SetBeamSize

func (p *Params) SetBeamSize(n int)

func (*Params) SetDuration

func (p *Params) SetDuration(duration_ms int)

Set audio duration to process in ms

func (*Params) SetEntropyThold

func (p *Params) SetEntropyThold(t float32)

func (*Params) SetInitialPrompt

func (p *Params) SetInitialPrompt(prompt string)

Set initial prompt

func (*Params) SetLanguage

func (p *Params) SetLanguage(lang int) error

Set language id

func (*Params) SetMaxContext

func (p *Params) SetMaxContext(n int)

func (*Params) SetMaxSegmentLength

func (p *Params) SetMaxSegmentLength(n int)

Set max segment length in characters

func (*Params) SetMaxTokensPerSegment

func (p *Params) SetMaxTokensPerSegment(n int)

Set max tokens per segment (0 = no limit)

func (*Params) SetNoContext

func (p *Params) SetNoContext(v bool)

func (*Params) SetOffset

func (p *Params) SetOffset(offset_ms int)

Set start offset in ms

func (*Params) SetPrintProgress

func (p *Params) SetPrintProgress(v bool)

func (*Params) SetPrintRealtime

func (p *Params) SetPrintRealtime(v bool)

func (*Params) SetPrintSpecial

func (p *Params) SetPrintSpecial(v bool)

func (*Params) SetPrintTimestamps

func (p *Params) SetPrintTimestamps(v bool)

func (*Params) SetSingleSegment

func (p *Params) SetSingleSegment(v bool)

func (*Params) SetSplitOnWord

func (p *Params) SetSplitOnWord(v bool)

func (*Params) SetTemperature

func (p *Params) SetTemperature(t float32)

func (*Params) SetTemperatureFallback

func (p *Params) SetTemperatureFallback(t float32)

Sets the fallback temperature increment. Pass -1.0 to disable this feature.
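As a rough illustration of what this increment controls: when decoding at one temperature fails whisper.cpp's quality checks (such as the entropy threshold set with SetEntropyThold), it retries at successively higher temperatures. The sketch below lists the temperatures that would be tried for a given starting temperature and increment, assuming the schedule stops at 1.0 and that a non-positive increment disables fallback; treat both as assumptions about whisper.cpp's internals. `fallbackSchedule` is a hypothetical helper.

```go
package main

// fallbackSchedule lists the decoding temperatures tried in order: start,
// then start+inc, start+2*inc, ... up to and including 1.0 (assumed limit).
// A non-positive inc (e.g. -1.0) disables fallback, leaving only start.
func fallbackSchedule(start, inc float32) []float32 {
	temps := []float32{start}
	if inc <= 0 {
		return temps
	}
	// Multiply rather than accumulate to limit float32 rounding drift;
	// the small epsilon keeps 1.0 itself in the schedule.
	for i := 1; ; i++ {
		t := start + float32(i)*inc
		if t > 1.0+1e-6 {
			break
		}
		temps = append(temps, t)
	}
	return temps
}
```

For example, a starting temperature of 0 with an increment of 0.2 yields the six temperatures 0, 0.2, 0.4, 0.6, 0.8 and 1.0.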

func (*Params) SetThreads

func (p *Params) SetThreads(threads int)

Set number of threads to use

func (*Params) SetTokenSumThreshold

func (p *Params) SetTokenSumThreshold(t float32)

Set timestamp token sum probability threshold (~0.01)

func (*Params) SetTokenThreshold

func (p *Params) SetTokenThreshold(t float32)

Set timestamp token probability threshold (~0.01)

func (*Params) SetTokenTimestamps

func (p *Params) SetTokenTimestamps(b bool)

func (*Params) SetTranslate

func (p *Params) SetTranslate(v bool)

func (*Params) String

func (p *Params) String() string

func (*Params) Threads

func (p *Params) Threads() int

Returns the number of threads to use.

type Token

type Token C.whisper_token

type TokenData

type TokenData C.struct_whisper_token_data

func (TokenData) Id

func (t TokenData) Id() Token

func (TokenData) T0

func (t TokenData) T0() int64

func (TokenData) T1

func (t TokenData) T1() int64

Directories

Path           Synopsis
examples
pkg/whisper    This is the higher-level speech-to-text whisper.cpp API for Go