leopard

package module
v2.0.3 Latest
Warning

This package is not in the latest version of its module.

Published: Sep 13, 2024 License: Apache-2.0 Imports: 14 Imported by: 2

README

Leopard Binding for Go

Leopard Speech-to-Text Engine

Made in Vancouver, Canada by Picovoice

Leopard is an on-device speech-to-text engine. Leopard is:

  • Private: all voice processing runs locally.
  • Accurate
  • Compact and Computationally-Efficient
  • Cross-Platform:
    • Linux (x86_64), macOS (x86_64, arm64), and Windows (x86_64)
    • Android and iOS
    • Chrome, Safari, Firefox, and Edge
    • Raspberry Pi (3, 4, 5)

Compatibility

  • go 1.16+
  • Runs on Linux (x86_64), macOS (x86_64, arm64), Windows (x86_64), and Raspberry Pi (3, 4, 5).
  • Windows: The Go binding requires cgo, so you need to install a gcc compiler such as MinGW to build it.
    • Go versions earlier than 1.20 require gcc version 11 or lower.

Installation

go get github.com/Picovoice/leopard/binding/go/v2

AccessKey

Leopard requires a valid Picovoice AccessKey at initialization. The AccessKey acts as your credentials when using Leopard SDKs. You can get your AccessKey for free. Make sure to keep your AccessKey secret. Sign up or log in to Picovoice Console to get your AccessKey.

Usage

Create an instance of the engine and transcribe an audio file:

import . "github.com/Picovoice/leopard/binding/go/v2"

leopard := NewLeopard("${ACCESS_KEY}")
err := leopard.Init()
if err != nil {
    // handle err init
}
defer leopard.Delete()

transcript, words, err := leopard.ProcessFile("${AUDIO_FILE_PATH}")
if err != nil {
    // handle process error
}

print(transcript)
_ = words // per-word metadata, described under Word Metadata

Replace ${ACCESS_KEY} with your AccessKey obtained from Picovoice Console and ${AUDIO_FILE_PATH} with the path to an audio file. When done, be sure to explicitly release the resources using leopard.Delete().

Language Model

The Leopard Go SDK comes preloaded with a default English language model (.pv file). Default models for other supported languages can be found in lib/common.

Create custom language models using the Picovoice Console. Here you can train language models with custom vocabulary and boost words in the existing vocabulary.

Pass in the .pv file by setting .ModelPath on an instance of Leopard before initializing:

leopard := NewLeopard("${ACCESS_KEY}")
leopard.ModelPath = "${MODEL_FILE_PATH}"
err := leopard.Init()

Word Metadata

Along with the transcript, Leopard returns metadata for each transcribed word. Available metadata items are:

  • Start Time: Indicates when the word started in the transcribed audio. Value is in seconds.
  • End Time: Indicates when the word ended in the transcribed audio. Value is in seconds.
  • Confidence: Leopard's confidence that the transcribed word is accurate. It is a number within [0, 1].
  • Speaker Tag: If speaker diarization is enabled on initialization, the speaker tag is a non-negative integer identifying unique speakers, with 0 reserved for unknown speakers. If speaker diarization is not enabled, the value will always be -1.
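
As a rough sketch of how this metadata might be consumed, the snippet below merges consecutive words from the same speaker into turns. The `word` type is a local stand-in that mirrors the fields of `LeopardWord` so the example runs without the SDK; `turn` and `speakerTurns` are hypothetical helpers, not part of the binding, and the sample words are made up.

```go
package main

import "fmt"

// word is a local stand-in mirroring the fields of LeopardWord.
type word struct {
	Word       string
	StartSec   float32
	EndSec     float32
	Confidence float32
	SpeakerTag int32
}

// turn is a run of consecutive words attributed to the same speaker.
type turn struct {
	SpeakerTag int32
	StartSec   float32
	EndSec     float32
	Text       string
}

// speakerTurns merges consecutive words that share a SpeakerTag into turns.
func speakerTurns(words []word) []turn {
	var turns []turn
	for _, w := range words {
		n := len(turns)
		if n > 0 && turns[n-1].SpeakerTag == w.SpeakerTag {
			// Same speaker as the previous word: extend the current turn.
			turns[n-1].Text += " " + w.Word
			turns[n-1].EndSec = w.EndSec
			continue
		}
		turns = append(turns, turn{w.SpeakerTag, w.StartSec, w.EndSec, w.Word})
	}
	return turns
}

func main() {
	// Made-up sample data standing in for the words returned by ProcessFile.
	words := []word{
		{"good", 0.0, 0.3, 0.97, 1},
		{"morning", 0.3, 0.8, 0.95, 1},
		{"hello", 1.1, 1.5, 0.92, 2},
	}
	for _, t := range speakerTurns(words) {
		fmt.Printf("speaker %d [%.2fs-%.2fs]: %s\n", t.SpeakerTag, t.StartSec, t.EndSec, t.Text)
	}
}
```

When diarization is disabled every word carries the tag -1, so this helper would return a single turn covering the whole transcript.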

Demos

Check out the Leopard Go demos here.

Documentation


Constants

This section is empty.

Variables

var (
	// SampleRate Audio sample rate accepted by Picovoice.
	SampleRate int

	// Version Leopard version
	Version string
)

Functions

This section is empty.

Types

type Leopard

type Leopard struct {

	// AccessKey obtained from Picovoice Console (https://console.picovoice.ai/).
	AccessKey string

	// Absolute path to the file containing model parameters.
	ModelPath string

	// Absolute path to Leopard's dynamic library.
	LibraryPath string

	// Flag to enable automatic punctuation insertion.
	EnableAutomaticPunctuation bool

	// Flag to enable speaker diarization, which allows Leopard to differentiate speakers as part of the transcription process.
	// Word metadata will include a `SpeakerTag` to identify unique speakers.
	EnableDiarization bool
	// contains filtered or unexported fields
}

Leopard struct

func NewLeopard

func NewLeopard(accessKey string) Leopard

NewLeopard returns a Leopard struct with default parameters

func (*Leopard) Delete

func (leopard *Leopard) Delete() error

Delete releases resources acquired by Leopard.

func (*Leopard) Init

func (leopard *Leopard) Init() error

Init initializes Leopard. It must be called before calling Process or ProcessFile.

func (*Leopard) Process

func (leopard *Leopard) Process(pcm []int16) (string, []LeopardWord, error)

Process transcribes the given audio data and returns the transcript along with per-word metadata. The audio must be single-channel, 16-bit linearly-encoded PCM with a sample rate equal to `SampleRate`. To process audio in a different sample rate or format, consider using `ProcessFile`.
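
Since `Process` takes `[]int16`, raw audio bytes have to be decoded first. The following is a minimal sketch under the assumption that the input is headerless, little-endian, 16-bit mono PCM; `bytesToPCM` and `durationSec` are hypothetical helpers, and the sample rate of 16000 in `main` is only a placeholder for the package's `SampleRate` value.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// bytesToPCM decodes raw little-endian 16-bit samples into []int16.
// It assumes headerless PCM; container formats such as WAV need parsing first.
func bytesToPCM(raw []byte) ([]int16, error) {
	if len(raw)%2 != 0 {
		return nil, errors.New("raw PCM length must be a multiple of 2 bytes")
	}
	pcm := make([]int16, len(raw)/2)
	for i := range pcm {
		pcm[i] = int16(binary.LittleEndian.Uint16(raw[2*i:]))
	}
	return pcm, nil
}

// durationSec returns the audio length in seconds for the given sample rate.
func durationSec(pcm []int16, sampleRate int) float64 {
	return float64(len(pcm)) / float64(sampleRate)
}

func main() {
	// Three samples: zero, the int16 maximum, and the int16 minimum.
	raw := []byte{0x00, 0x00, 0xff, 0x7f, 0x00, 0x80}
	pcm, err := bytesToPCM(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	// 16000 is a placeholder; use the package's SampleRate in real code.
	fmt.Println(pcm, durationSec(pcm, 16000))
}
```

The resulting slice is what would be handed to `Process`; multi-channel or differently encoded audio is outside the scope of this sketch.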

func (*Leopard) ProcessFile

func (leopard *Leopard) ProcessFile(audioPath string) (string, []LeopardWord, error)

ProcessFile transcribes the given audio file and returns the transcript along with per-word metadata. The supported formats are: `3gp (AMR)`, `FLAC`, `MP3`, `MP4/m4a (AAC)`, `Ogg`, `WAV`, `WebM`.
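
A caller might want to reject obviously unsupported files before invoking `ProcessFile`. The sketch below checks the file extension only; the mapping from the formats listed above to extensions is an assumption of this example, and `looksSupported` is a hypothetical helper rather than part of the binding. A matching extension does not guarantee that the file decodes.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// supportedExts maps the documented formats to commonly used extensions.
// The extension list is an assumption, not something the binding defines.
var supportedExts = map[string]bool{
	".3gp":  true,
	".flac": true,
	".mp3":  true,
	".mp4":  true,
	".m4a":  true,
	".ogg":  true,
	".wav":  true,
	".webm": true,
}

// looksSupported reports whether the path has one of the extensions above.
// It inspects the name only and never opens the file.
func looksSupported(path string) bool {
	return supportedExts[strings.ToLower(filepath.Ext(path))]
}

func main() {
	for _, p := range []string{"meeting.WAV", "notes.txt", "clips/intro.m4a"} {
		fmt.Println(p, looksSupported(p))
	}
}
```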

type LeopardError

type LeopardError struct {
	StatusCode   PvStatus
	Message      string
	MessageStack []string
}

func (*LeopardError) Error

func (e *LeopardError) Error() string
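
Because `Error` is declared on the pointer receiver, a caller would typically look for a `*LeopardError` when inspecting a returned error. The sketch below shows that pattern with `errors.As`, using a local stand-in type so it runs without the SDK. Whether the binding always returns this type is an assumption, the `Error` format shown is invented, and `asLeopardError` is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
)

// leopardError is a local stand-in with the same fields as LeopardError.
// StatusCode is a plain int here; the binding uses PvStatus.
type leopardError struct {
	StatusCode   int
	Message      string
	MessageStack []string
}

// Error uses an invented format; the binding's actual layout may differ.
func (e *leopardError) Error() string {
	return fmt.Sprintf("leopard error (status %d): %s", e.StatusCode, e.Message)
}

// asLeopardError reports whether err is, or wraps, a *leopardError.
func asLeopardError(err error) (*leopardError, bool) {
	var le *leopardError
	ok := errors.As(err, &le)
	return le, ok
}

func main() {
	// Made-up error values standing in for what Init or ProcessFile might return.
	cause := &leopardError{
		StatusCode:   8,
		Message:      "activation failed",
		MessageStack: []string{"detail 1", "detail 2"},
	}
	err := fmt.Errorf("transcribing audio: %w", cause)

	if le, ok := asLeopardError(err); ok {
		fmt.Println("status:", le.StatusCode)
		for i, m := range le.MessageStack {
			fmt.Printf("  [%d] %s\n", i, m)
		}
	}
}
```

Using `errors.As` rather than a direct type assertion keeps the check working when the error has been wrapped with `%w` along the way.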

type LeopardWord

type LeopardWord struct {
	// Transcribed word.
	Word string

	// Start of word in seconds.
	StartSec float32

	// End of word in seconds.
	EndSec float32

	// Transcription confidence. It is a number within [0, 1].
	Confidence float32

	// Unique speaker identifier. It is `-1` if diarization is not enabled during initialization; otherwise,
	// it's a non-negative integer identifying unique speakers, with `0` reserved for unknown speakers.
	SpeakerTag int32
}

type PvStatus

type PvStatus int

PvStatus type

const (
	SUCCESS                  PvStatus = 0
	OUT_OF_MEMORY            PvStatus = 1
	IO_ERROR                 PvStatus = 2
	INVALID_ARGUMENT         PvStatus = 3
	STOP_ITERATION           PvStatus = 4
	KEY_ERROR                PvStatus = 5
	INVALID_STATE            PvStatus = 6
	RUNTIME_ERROR            PvStatus = 7
	ACTIVATION_ERROR         PvStatus = 8
	ACTIVATION_LIMIT_REACHED PvStatus = 9
	ACTIVATION_THROTTLED     PvStatus = 10
	ACTIVATION_REFUSED       PvStatus = 11
)

Possible status return codes from the Leopard library
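
For logging, it can help to turn a numeric status into its symbolic name. The listing above does not show whether `PvStatus` already has a `String` method, so the sketch below defines one on a local stand-in type; `pvStatus` and `statusNames` are names introduced for this example only.

```go
package main

import "fmt"

// pvStatus is a local stand-in for the binding's PvStatus type.
type pvStatus int

// statusNames lists the names from the const block above, indexed by code.
var statusNames = []string{
	"SUCCESS",
	"OUT_OF_MEMORY",
	"IO_ERROR",
	"INVALID_ARGUMENT",
	"STOP_ITERATION",
	"KEY_ERROR",
	"INVALID_STATE",
	"RUNTIME_ERROR",
	"ACTIVATION_ERROR",
	"ACTIVATION_LIMIT_REACHED",
	"ACTIVATION_THROTTLED",
	"ACTIVATION_REFUSED",
}

// String returns the symbolic name, or a numeric fallback for unknown codes.
func (s pvStatus) String() string {
	if s >= 0 && int(s) < len(statusNames) {
		return statusNames[s]
	}
	return fmt.Sprintf("pvStatus(%d)", int(s))
}

func main() {
	// fmt picks up the String method automatically.
	for _, s := range []pvStatus{0, 8, 42} {
		fmt.Println(s)
	}
}
```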
