leopard


v2.0.3 | Published: Sep 13, 2024 | License: Apache-2.0 | Imports: 14 | Imported by: 3

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/Picovoice/leopard


README

Leopard Binding for Go

Leopard Speech-to-Text Engine

Made in Vancouver, Canada by Picovoice

Leopard is an on-device speech-to-text engine. Leopard is:

Private: all voice processing runs locally.
Accurate
Compact and Computationally-Efficient
Cross-Platform:
- Linux (x86_64), macOS (x86_64, arm64), and Windows (x86_64)
- Android and iOS
- Chrome, Safari, Firefox, and Edge
- Raspberry Pi (3, 4, 5)

Compatibility

Go 1.16+
Runs on Linux (x86_64), macOS (x86_64, arm64), Windows (x86_64), and Raspberry Pi (3, 4, 5).
Windows: The Go binding requires cgo, so you need a gcc compiler such as MinGW installed to build it.
- Go versions earlier than 1.20 require gcc version 11 or lower.

Installation

go get github.com/Picovoice/leopard/binding/go/v2

AccessKey

Leopard requires a valid Picovoice AccessKey at initialization. The AccessKey acts as your credentials when using Leopard SDKs. You can get your AccessKey for free. Make sure to keep your AccessKey secret. Sign up or log in to Picovoice Console to get your AccessKey.

Usage

Create an instance of the engine and transcribe an audio file:

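package main

import (
	"fmt"
	"log"

	. "github.com/Picovoice/leopard/binding/go/v2"
)

func main() {
	leopard := NewLeopard("${ACCESS_KEY}")
	if err := leopard.Init(); err != nil {
		log.Fatalf("failed to initialize Leopard: %v", err)
	}
	defer leopard.Delete()

	transcript, words, err := leopard.ProcessFile("${AUDIO_FILE_PATH}")
	if err != nil {
		// log.Fatal would skip the deferred Delete, so report and return instead.
		log.Printf("failed to transcribe file: %v", err)
		return
	}

	fmt.Println(transcript)
	// words holds per-word metadata: timing, confidence, and speaker tag.
	fmt.Printf("%d words transcribed\n", len(words))
}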

Replace ${ACCESS_KEY} with your AccessKey from Picovoice Console and ${AUDIO_FILE_PATH} with the path to an audio file. When done, be sure to explicitly release resources using leopard.Delete().

Language Model

The Leopard Go SDK comes preloaded with a default English language model (.pv file). Default models for other supported languages can be found in lib/common.

Create custom language models using the Picovoice Console. There you can train language models with custom vocabulary and boost words in the existing vocabulary.

Pass in the .pv file by setting .ModelPath on an instance of Leopard before initializing:

leopard := NewLeopard("${ACCESS_KEY}")
leopard.ModelPath = "${MODEL_FILE_PATH}"
err := leopard.Init()

Word Metadata

Along with the transcript, Leopard returns metadata for each transcribed word. Available metadata items are:

Start Time: Indicates when the word started in the transcribed audio. Value is in seconds.
End Time: Indicates when the word ended in the transcribed audio. Value is in seconds.
Confidence: Leopard's confidence that the transcribed word is accurate. It is a number within [0, 1].
Speaker Tag: If speaker diarization is enabled on initialization, the speaker tag is a non-negative integer identifying unique speakers, with 0 reserved for unknown speakers. If speaker diarization is not enabled, the value will always be -1.
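
As an illustration of how this metadata might be consumed, the sketch below defines a local word struct whose fields mirror LeopardWord, so it compiles without the SDK, then prints each word with its time span and marks words below a confidence threshold. The 0.5 threshold and the formatWord helper are hypothetical choices for this example, and the sample values are made up.

```go
package main

import "fmt"

// word mirrors the fields of LeopardWord so this sketch builds without the SDK.
type word struct {
	Word       string
	StartSec   float32
	EndSec     float32
	Confidence float32
	SpeakerTag int32
}

// formatWord renders one word with its time span, confidence, and speaker tag.
// Words whose confidence is below minConfidence get a trailing " ?" marker.
func formatWord(w word, minConfidence float32) string {
	s := fmt.Sprintf("[%.2fs - %.2fs] %s (conf %.2f, speaker %d)",
		w.StartSec, w.EndSec, w.Word, w.Confidence, w.SpeakerTag)
	if w.Confidence < minConfidence {
		s += " ?"
	}
	return s
}

func main() {
	// Illustrative values only; real values come from Process or ProcessFile.
	words := []word{
		{Word: "hello", StartSec: 0.10, EndSec: 0.45, Confidence: 0.93, SpeakerTag: -1},
		{Word: "world", StartSec: 0.50, EndSec: 0.90, Confidence: 0.41, SpeakerTag: -1},
	}
	for _, w := range words {
		fmt.Println(formatWord(w, 0.5))
	}
}
```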

Demos

Check out the Leopard Go demos in the Leopard GitHub repository.

Documentation

Index

Variables
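type Leopard
- func NewLeopard(accessKey string) Leopard
- func (leopard *Leopard) Delete() error
- func (leopard *Leopard) Init() error
- func (leopard *Leopard) Process(pcm []int16) (string, []LeopardWord, error)
- func (leopard *Leopard) ProcessFile(audioPath string) (string, []LeopardWord, error)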
type LeopardError
- func (e *LeopardError) Error() string
type LeopardWord
type PvStatus

Constants

This section is empty.

Variables


var (
	// SampleRate Audio sample rate accepted by Picovoice.
	SampleRate int

	// Version Leopard version
	Version string
)

Functions

This section is empty.

Types

type Leopard

type Leopard struct {

	// AccessKey obtained from Picovoice Console (https://console.picovoice.ai/).
	AccessKey string

	// Absolute path to the file containing model parameters.
	ModelPath string

	// Absolute path to Leopard's dynamic library.
	LibraryPath string

	// Flag to enable automatic punctuation insertion.
	EnableAutomaticPunctuation bool

	// Flag to enable speaker diarization, which allows Leopard to differentiate speakers as part of the transcription process.
	// Word metadata will include a `SpeakerTag` to identify unique speakers.
	EnableDiarization bool
	// contains filtered or unexported fields
}

Leopard struct

func NewLeopard

func NewLeopard(accessKey string) Leopard

NewLeopard returns a Leopard struct with default parameters.

func (*Leopard) Delete

func (leopard *Leopard) Delete() error

Delete releases resources acquired by Leopard.

func (*Leopard) Init

func (leopard *Leopard) Init() error

Init initializes Leopard. It must be called before calling Process or ProcessFile.

func (*Leopard) Process

func (leopard *Leopard) Process(pcm []int16) (string, []LeopardWord, error)

Process transcribes the given audio data. The audio must have a sample rate equal to `SampleRate`, be 16-bit linearly encoded, and be single-channel. To process audio with a different sample rate or format, consider using `ProcessFile`. Returns the inferred transcription, metadata for each transcribed word, and an error.
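
Process takes []int16 samples, so raw 16-bit PCM read from a file or stream has to be decoded first. The sketch below assumes headerless little-endian mono PCM that is already at `SampleRate` (a WAV file would need its header parsed or skipped first). The bytesToPCM and durationSec helpers are hypothetical, not part of the SDK, and the 16000 Hz value in main is only a placeholder for the package's SampleRate variable.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// bytesToPCM decodes headerless little-endian 16-bit PCM into samples
// suitable for a []int16 argument such as the one Process expects.
func bytesToPCM(raw []byte) ([]int16, error) {
	if len(raw)%2 != 0 {
		return nil, errors.New("odd number of bytes: not 16-bit PCM")
	}
	pcm := make([]int16, len(raw)/2)
	for i := range pcm {
		// Reinterpret each pair of bytes as a signed 16-bit sample.
		pcm[i] = int16(binary.LittleEndian.Uint16(raw[2*i:]))
	}
	return pcm, nil
}

// durationSec returns how many seconds of audio the samples represent at sampleRate.
func durationSec(pcm []int16, sampleRate int) float64 {
	return float64(len(pcm)) / float64(sampleRate)
}

func main() {
	raw := []byte{0x01, 0x00, 0xff, 0xff, 0x00, 0x80}
	pcm, err := bytesToPCM(raw)
	if err != nil {
		panic(err)
	}
	// 16000 is a placeholder; use the package's SampleRate variable in real code.
	fmt.Println(pcm, durationSec(pcm, 16000))
}
```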

func (*Leopard) ProcessFile

func (leopard *Leopard) ProcessFile(audioPath string) (string, []LeopardWord, error)

ProcessFile processes the given audio file and returns its transcription. The supported formats are: `3gp (AMR)`, `FLAC`, `MP3`, `MP4/m4a (AAC)`, `Ogg`, `WAV`, `WebM`. Returns the inferred transcription, metadata for each transcribed word, and an error.
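
Because ProcessFile accepts only the formats listed above, a caller may want to reject obviously unsupported files before handing them to the engine. The sketch below is a hypothetical pre-check based on the file extension only. The extension list is an assumption derived from the format names above, and an extension says nothing about the actual encoding, so the error returned by ProcessFile should still be checked.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// supportedExtensions is an assumed mapping of the ProcessFile formats to file extensions.
var supportedExtensions = map[string]bool{
	".3gp": true, ".flac": true, ".mp3": true, ".mp4": true,
	".m4a": true, ".ogg": true, ".wav": true, ".webm": true,
}

// hasSupportedExtension reports whether path ends in one of the assumed
// extensions, ignoring case. It does not inspect the file contents.
func hasSupportedExtension(path string) bool {
	return supportedExtensions[strings.ToLower(filepath.Ext(path))]
}

func main() {
	for _, p := range []string{"meeting.WAV", "notes.txt", "clip.m4a"} {
		fmt.Printf("%s: %v\n", p, hasSupportedExtension(p))
	}
}
```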

type LeopardError

type LeopardError struct {
	StatusCode   PvStatus
	Message      string
	MessageStack []string
}

func (*LeopardError) Error

func (e *LeopardError) Error() string

type LeopardWord

type LeopardWord struct {
	// Transcribed word.
	Word string

	// Start of word in seconds.
	StartSec float32

	// End of word in seconds.
	EndSec float32

	// Transcription confidence. It is a number within [0, 1].
	Confidence float32

	// Unique speaker identifier. It is `-1` if diarization is not enabled during initialization; otherwise,
	// it's a non-negative integer identifying unique speakers, with `0` reserved for unknown speakers.
	SpeakerTag int32
}

type PvStatus

type PvStatus int

PvStatus type

const (
	SUCCESS                  PvStatus = 0
	OUT_OF_MEMORY            PvStatus = 1
	IO_ERROR                 PvStatus = 2
	INVALID_ARGUMENT         PvStatus = 3
	STOP_ITERATION           PvStatus = 4
	KEY_ERROR                PvStatus = 5
	INVALID_STATE            PvStatus = 6
	RUNTIME_ERROR            PvStatus = 7
	ACTIVATION_ERROR         PvStatus = 8
	ACTIVATION_LIMIT_REACHED PvStatus = 9
	ACTIVATION_THROTTLED     PvStatus = 10
	ACTIVATION_REFUSED       PvStatus = 11
)

Possible status return codes from the Leopard library
