whisper

package module v0.0.21

Note: this package is not in the latest version of its module.

Published: Aug 11, 2024 License: Apache-2.0 Imports: 13 Imported by: 0

README

go-whisper

Speech-to-Text in Go. This is an early development version.

  • cmd contains an OpenAI-API compatible service
  • pkg contains the whisper service and client
  • sys contains the whisper bindings to the whisper.cpp library
  • third_party is a submodule for the whisper.cpp source

Running

You can run the whisper service either as a CLI command or in a docker container. Docker images are available for arm64 and amd64 (Intel). The arm64 image is built specifically for Jetson GPU support, but it will also run on Raspberry Pis.

To use an NVIDIA GPU, you'll need to install the NVIDIA Container Toolkit first.

Create a docker volume called "whisper" to store the Whisper language models. You can see which models are available to download here.

The following command will run the server on port 8080 for an NVIDIA GPU:

docker run \
  --name whisper-server --rm \
  --runtime nvidia --gpus all \
  -v whisper:/data -p 8080:80 \
  ghcr.io/mutablelogic/go-whisper:latest

The API is then available at http://localhost:8080/v1 and it generally conforms to the OpenAI API spec.

Sample Usage

To download a model, you can use the following command (for example):

curl -X POST -H "Content-Type: application/json" -d '{"Path" : "ggml-medium-q5_0.bin" }' localhost:8080/v1/models\?stream=true

To list the models available, you can use the following command:

curl -X GET localhost:8080/v1/models

To delete a model, you can use the following command:

curl -X DELETE localhost:8080/v1/models/ggml-medium-q5_0

To transcribe a media file in its original language, you can use the following command:

curl -F model=ggml-medium-q5_0 -F file=@samples/jfk.wav localhost:8080/v1/audio/transcriptions\?stream=true

To translate a media file into a different language, you can use the following command:

curl -F model=ggml-medium-q5_0 -F file=@samples/de-podcast.wav -F language=en localhost:8080/v1/audio/translations\?stream=true

There's more information on the API here.
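The same transcription request can be made from Go using only the standard library. This is a minimal sketch of assembling the multipart/form-data request the curl example above sends; the model name, file name, and audio bytes are placeholders:

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// buildTranscriptionRequest assembles the multipart request with a "model"
// field and a "file" field, matching the curl example above.
func buildTranscriptionRequest(endpoint, model, filename string, audio []byte) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	if err := w.WriteField("model", model); err != nil {
		return nil, err
	}
	part, err := w.CreateFormFile("file", filename)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(audio); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, endpoint, &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	req, err := buildTranscriptionRequest(
		"http://localhost:8080/v1/audio/transcriptions?stream=true",
		"ggml-medium-q5_0", "jfk.wav", []byte("...audio bytes..."))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)
	// Send with http.DefaultClient.Do(req) when a server is running.
}
```

Sending the request requires a running server, so this sketch only builds it.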

Building

If you are building a docker image, you just need make and docker installed:

  • DOCKER_REGISTRY=docker.io/user make docker - builds a docker container with the server binary, tagged to a specific registry

If you want to build the server yourself for your specific combination of hardware (for example, on macOS), you can use the Makefile in the root directory, with the following dependencies met:

  • Go 1.22
  • C++ compiler
  • FFmpeg 6.1 libraries (see here for more information)
  • For CUDA, you'll need the CUDA toolkit installed including the nvcc compiler

The following Makefile targets can be used:

  • make server - creates the server binary, and places it in the build directory. Should link to Metal on macOS
  • GGML_CUDA=1 make server - creates the server binary linked to CUDA, and places it in the build directory. Should work for amd64 and arm64 (Jetson) platforms

See all the other targets in the Makefile for more information.

Developing

TODO

Status

Still in development. See this issue for remaining tasks to be completed.

Contributing & Distribution

This module is currently in development and subject to change.

Please do file feature requests and bugs here. The license is Apache 2 so feel free to redistribute. Redistributions in either source code or binary form must reproduce the copyright notice, and please link back to this repository for more information:

go-whisper
https://github.com/mutablelogic/go-whisper/
Copyright (c) 2023-2024 David Thorpe, All rights reserved.

whisper.cpp
https://github.com/ggerganov/whisper.cpp
Copyright (c) 2023-2024 The ggml authors

This software links to static libraries of whisper.cpp licensed under the MIT License.

Documentation

Index

Constants

const (

	// Sample Rate
	SampleRate = whisper.SampleRate
)

Variables

This section is empty.

Functions

This section is empty.

Types

type LogFn

type LogFn func(string)

type Opt

type Opt func(*opts) error

func OptDebug

func OptDebug() Opt

Set debugging

func OptLog

func OptLog(fn LogFn) Opt

Set logging function

func OptMaxConcurrent

func OptMaxConcurrent(v int) Opt

Set maximum number of concurrent tasks

func OptNoGPU

func OptNoGPU() Opt

Disable GPU acceleration
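The Opt functions above follow Go's functional-options pattern. A minimal self-contained sketch of how such options compose (the opts struct fields here are illustrative, not the package's actual unexported fields):

```go
package main

import "fmt"

// opts mirrors the shape of an unexported options struct; the fields are
// illustrative only.
type opts struct {
	debug         bool
	maxConcurrent int
}

// Opt is a functional option that mutates opts and may fail validation.
type Opt func(*opts) error

// OptDebug enables debugging output.
func OptDebug() Opt {
	return func(o *opts) error {
		o.debug = true
		return nil
	}
}

// OptMaxConcurrent sets the maximum number of concurrent tasks, rejecting
// values below one.
func OptMaxConcurrent(v int) Opt {
	return func(o *opts) error {
		if v < 1 {
			return fmt.Errorf("invalid max concurrent: %d", v)
		}
		o.maxConcurrent = v
		return nil
	}
}

// apply folds a list of options over a default configuration.
func apply(opt ...Opt) (*opts, error) {
	o := &opts{maxConcurrent: 1}
	for _, fn := range opt {
		if err := fn(o); err != nil {
			return nil, err
		}
	}
	return o, nil
}

func main() {
	o, err := apply(OptDebug(), OptMaxConcurrent(4))
	fmt.Println(o.debug, o.maxConcurrent, err)
}
```

This pattern is why New below accepts a variadic ...Opt: callers supply only the options they need.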

type Whisper

type Whisper struct {
	// contains filtered or unexported fields
}

Whisper represents a whisper service for running transcription and translation

func New

func New(path string, opt ...Opt) (*Whisper, error)

Create a new whisper service with the path to the models directory and optional parameters

func (*Whisper) Close

func (w *Whisper) Close() error

Release all resources

func (*Whisper) DeleteModelById

func (w *Whisper) DeleteModelById(id string) error

Delete a model by its id

func (*Whisper) DownloadModel

func (w *Whisper) DownloadModel(ctx context.Context, path string, fn func(curBytes, totalBytes uint64)) (*schema.Model, error)

Download a model by path, where the directory is the root of the model within the models directory. The model is returned immediately if it already exists in the store
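The fn callback receives the bytes downloaded so far and the expected total as the download proceeds. A sketch of a progress reporter with that signature; the percentage arithmetic is the only logic here:

```go
package main

import "fmt"

// progress matches the callback signature DownloadModel expects: it is invoked
// with the bytes downloaded so far and the expected total.
func progress(curBytes, totalBytes uint64) {
	if totalBytes == 0 {
		return // total unknown; nothing to report
	}
	fmt.Printf("downloaded %d%%\n", percent(curBytes, totalBytes))
}

// percent computes the integer percentage of cur out of total.
func percent(cur, total uint64) uint64 {
	return cur * 100 / total
}

func main() {
	// Simulate a few callback invocations for a 1000-byte download.
	for _, cur := range []uint64{250, 500, 1000} {
		progress(cur, 1000)
	}
}
```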

func (*Whisper) GetModelById

func (w *Whisper) GetModelById(id string) *schema.Model

Get a model by its Id, returns nil if the model does not exist

func (*Whisper) ListModels

func (w *Whisper) ListModels() []*schema.Model

Return all models in the models directory

func (*Whisper) MarshalJSON

func (w *Whisper) MarshalJSON() ([]byte, error)

func (*Whisper) String

func (w *Whisper) String() string

func (*Whisper) WithModel

func (w *Whisper) WithModel(model *schema.Model, fn func(task *task.Context) error) error

Get a task for the specified model, which may load the model or return an existing one. The context can then be used to run the Transcribe function; afterwards, the context is returned to the pool.
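The checkout-and-return behaviour described above can be pictured with a small pool sketch. This is an illustration of the pattern only, not the package's implementation, and taskContext stands in for the real task.Context:

```go
package main

import "fmt"

// taskContext stands in for the package's task.Context; the Model field is
// illustrative.
type taskContext struct {
	Model string
}

// pool hands out task contexts and takes them back when the callback returns.
type pool struct {
	free []*taskContext
}

// withModel checks out a free context (creating one if none is available),
// runs fn, and always returns the context to the pool afterwards.
func (p *pool) withModel(model string, fn func(*taskContext) error) error {
	var t *taskContext
	if n := len(p.free); n > 0 {
		t, p.free = p.free[n-1], p.free[:n-1]
	} else {
		t = &taskContext{Model: model} // "load" the model
	}
	defer func() { p.free = append(p.free, t) }()
	return fn(t)
}

func main() {
	p := &pool{}
	_ = p.withModel("ggml-medium-q5_0", func(t *taskContext) error {
		fmt.Println("running with", t.Model)
		return nil
	})
	fmt.Println("contexts in pool:", len(p.free))
}
```

The deferred return is what guarantees the context goes back to the pool even if fn fails.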

Directories

Path          Synopsis
cmd
  api
pkg
  api
  segmenter   segmenter package provides a segmenter for audio files and streams
  store       store implements a model store which allows downloading models from a remote server
sys
