Documentation
Constants
This section is empty.
Variables
var (
	// SampleRate is the audio sample rate accepted by Picovoice.
	SampleRate int

	// Version is the Leopard version.
	Version string
)
Functions
This section is empty.
Types
type Leopard
type Leopard struct {
	// AccessKey obtained from Picovoice Console (https://console.picovoice.ai/).
	AccessKey string

	// Absolute path to the file containing model parameters.
	ModelPath string

	// Absolute path to Leopard's dynamic library.
	LibraryPath string

	// Flag to enable automatic punctuation insertion.
	EnableAutomaticPunctuation bool

	// Flag to enable speaker diarization, which allows Leopard to differentiate speakers as part of the transcription process.
	// Word metadata will include a `SpeakerTag` to identify unique speakers.
	EnableDiarization bool
	// contains filtered or unexported fields
}
Leopard struct
func NewLeopard
NewLeopard returns a Leopard struct with default parameters.
func (*Leopard) Process
func (leopard *Leopard) Process(pcm []int16) (string, []LeopardWord, error)
Process transcribes the given audio data. The audio must be single-channel, 16-bit linearly-encoded PCM with a sample rate equal to `SampleRate`. To process audio in a different sample rate or format, consider using `ProcessFile`. Returns the inferred transcription, the metadata for each transcribed word, and an error if processing fails.
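Audio often arrives as raw bytes rather than as `[]int16`. The following is a minimal sketch of one way to prepare such a buffer for `Process`, assuming the bytes are little-endian, single-channel, 16-bit PCM. The helpers `bytesToPCM` and `durationSec` are hypothetical and not part of this package, and the rate of 16000 is only an illustrative value; the real rate should be read from `SampleRate`.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// bytesToPCM (hypothetical helper) decodes little-endian 16-bit samples into
// a []int16 buffer. A trailing odd byte, if any, is ignored.
func bytesToPCM(raw []byte) []int16 {
	pcm := make([]int16, len(raw)/2)
	for i := range pcm {
		pcm[i] = int16(binary.LittleEndian.Uint16(raw[2*i:]))
	}
	return pcm
}

// durationSec (hypothetical helper) returns the length in seconds of a
// single-channel buffer holding numSamples samples at sampleRate Hz.
func durationSec(numSamples, sampleRate int) float64 {
	return float64(numSamples) / float64(sampleRate)
}

func main() {
	raw := []byte{0x01, 0x00, 0xff, 0xff, 0x00, 0x80}
	pcm := bytesToPCM(raw)
	fmt.Println(pcm) // [1 -1 -32768]

	// 16000 is an assumed rate for illustration only.
	fmt.Println(durationSec(32000, 16000))
}
```

The resulting slice could then be passed as the `pcm` argument of `Process`, provided the audio really is mono and sampled at `SampleRate`.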
func (*Leopard) ProcessFile
func (leopard *Leopard) ProcessFile(audioPath string) (string, []LeopardWord, error)
ProcessFile transcribes the given audio file. The supported formats are: `3gp (AMR)`, `FLAC`, `MP3`, `MP4/m4a (AAC)`, `Ogg`, `WAV`, `WebM`. Returns the inferred transcription, the metadata for each transcribed word, and an error if processing fails.
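A caller may want to reject obviously unsuitable paths before handing them to `ProcessFile`. The sketch below checks the file extension against the formats listed above. The helper `looksSupported` and the mapping from format names to extensions are assumptions made for illustration; an extension check says nothing about the actual file contents, so errors returned by `ProcessFile` still need to be handled.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// supportedExt lists extensions assumed to correspond to the documented
// formats (3gp, FLAC, MP3, MP4/m4a, Ogg, WAV, WebM). The spellings are an
// assumption, not something the library defines.
var supportedExt = map[string]bool{
	".3gp":  true,
	".flac": true,
	".mp3":  true,
	".mp4":  true,
	".m4a":  true,
	".ogg":  true,
	".wav":  true,
	".webm": true,
}

// looksSupported (hypothetical helper) reports whether the path ends in one
// of the extensions above, ignoring case.
func looksSupported(path string) bool {
	return supportedExt[strings.ToLower(filepath.Ext(path))]
}

func main() {
	for _, p := range []string{"talk.WAV", "clip.m4a", "notes.txt"} {
		fmt.Println(p, looksSupported(p))
	}
}
```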
type LeopardError
func (*LeopardError) Error
func (e *LeopardError) Error() string
type LeopardWord
type LeopardWord struct {
	// Transcribed word.
	Word string

	// Start of word in seconds.
	StartSec float32

	// End of word in seconds.
	EndSec float32

	// Transcription confidence. It is a number within [0, 1].
	Confidence float32

	// Unique speaker identifier. It is `-1` if diarization is not enabled during initialization; otherwise,
	// it's a non-negative integer identifying unique speakers, with `0` reserved for unknown speakers.
	SpeakerTag int32
}
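When diarization is enabled, the per-word `SpeakerTag` can be used to merge consecutive words into speaker turns. The following is a rough sketch of that idea. To stay self-contained it uses a local `word` type that mirrors the fields of `LeopardWord`; the `turn` type and the `speakerTurns` function are hypothetical and not part of this package.

```go
package main

import "fmt"

// word is a local stand-in mirroring the fields of LeopardWord, so the
// sketch compiles without the binding.
type word struct {
	Word       string
	StartSec   float32
	EndSec     float32
	Confidence float32
	SpeakerTag int32
}

// turn (hypothetical) is a run of consecutive words from a single speaker.
type turn struct {
	SpeakerTag int32
	StartSec   float32
	EndSec     float32
	Text       string
}

// speakerTurns merges adjacent words that share a SpeakerTag into one turn.
func speakerTurns(words []word) []turn {
	var turns []turn
	for _, w := range words {
		n := len(turns)
		if n > 0 && turns[n-1].SpeakerTag == w.SpeakerTag {
			// Same speaker as the previous word: extend the current turn.
			turns[n-1].Text += " " + w.Word
			turns[n-1].EndSec = w.EndSec
			continue
		}
		turns = append(turns, turn{
			SpeakerTag: w.SpeakerTag,
			StartSec:   w.StartSec,
			EndSec:     w.EndSec,
			Text:       w.Word,
		})
	}
	return turns
}

func main() {
	words := []word{
		{"hello", 0, 0.5, 0.9, 1},
		{"there", 0.5, 1, 0.8, 1},
		{"hi", 1.5, 2, 0.7, 2},
	}
	for _, t := range speakerTurns(words) {
		fmt.Printf("speaker %d [%.2fs-%.2fs]: %s\n", t.SpeakerTag, t.StartSec, t.EndSec, t.Text)
	}
}
```

With real results, the same loop could iterate over the `[]LeopardWord` returned by `Process` or `ProcessFile`. If diarization is disabled every word carries a `SpeakerTag` of `-1`, so the whole transcript collapses into a single turn.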
type PvStatus
type PvStatus int
PvStatus type
const (
	SUCCESS                  PvStatus = 0
	OUT_OF_MEMORY            PvStatus = 1
	IO_ERROR                 PvStatus = 2
	INVALID_ARGUMENT         PvStatus = 3
	STOP_ITERATION           PvStatus = 4
	KEY_ERROR                PvStatus = 5
	INVALID_STATE            PvStatus = 6
	RUNTIME_ERROR            PvStatus = 7
	ACTIVATION_ERROR         PvStatus = 8
	ACTIVATION_LIMIT_REACHED PvStatus = 9
	ACTIVATION_THROTTLED     PvStatus = 10
	ACTIVATION_REFUSED       PvStatus = 11
)
Possible status return codes from the Leopard library
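For logging, it can help to turn a numeric status code into its name. The sketch below shows one way to do that with a lookup table. It uses a local `status` type that mirrors the values of `PvStatus` so that it compiles on its own; the `status` type, the `statusNames` table, and the `String` method are illustrative assumptions and not part of this package.

```go
package main

import "fmt"

// status is a local stand-in mirroring the numeric values of PvStatus.
type status int

// statusNames is indexed by status code, following the constants listed above.
var statusNames = []string{
	"SUCCESS",
	"OUT_OF_MEMORY",
	"IO_ERROR",
	"INVALID_ARGUMENT",
	"STOP_ITERATION",
	"KEY_ERROR",
	"INVALID_STATE",
	"RUNTIME_ERROR",
	"ACTIVATION_ERROR",
	"ACTIVATION_LIMIT_REACHED",
	"ACTIVATION_THROTTLED",
	"ACTIVATION_REFUSED",
}

// String returns the name of a known code, or UNKNOWN(n) for anything else.
func (s status) String() string {
	if s < 0 || int(s) >= len(statusNames) {
		return fmt.Sprintf("UNKNOWN(%d)", int(s))
	}
	return statusNames[s]
}

func main() {
	fmt.Println(status(9))
	fmt.Println(status(42))
}
```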