yask

package module

v1.0.3 (latest) Published: Jul 20, 2022 License: MIT Imports: 11 Imported by: 0

Repository

git.ali33.ru/fcg-xvii/yask

README ¶

yask

Tools for working with the Yandex SpeechKit speech synthesis and recognition service (see https://cloud.yandex.ru/docs/speechkit/) from the Go programming language. The package synthesizes speech from text and recognizes text from an audio stream.

Before using the package, register at https://cloud.yandex.ru/ to obtain an API key and a folder identifier (see https://cloud.yandex.ru/docs).

Audio stream formats

OGG https://en.wikipedia.org/wiki/Ogg
PCM https://en.wikipedia.org/wiki/Pulse-code_modulation (when recognizing text with the format parameter set to lpcm, a WAV stream can be used)

Speech synthesis from text

The example produces a file in WAV format, ready for playback in any audio player. The default sample rate is 8000 Hz.

package main

import (
	"log"
	"os"

	"github.com/fcg-xvii/go-tools/speech/yask"
)

func main() {
	yaFolderID := "b1g..."                          // Yandex folder ID
	yaAPIKey := "AQVNy..."                          // Yandex API key
	text := "Hi! This is a test of speech synthesis" // text to synthesize

	// init config for synthesis (the lpcm format is set by default)
	config := yask.TTSDefaultConfigText(yaFolderID, yaAPIKey, text)

	// The default config language is Russian. For English, set both
	// the language and an English voice.
	config.Lang = yask.LangEN
	config.Voice = yask.VoiceNick

	// speech synthesis
	r, err := yask.TextToSpeech(config)
	if err != nil {
		log.Println(err)
		return
	}
	defer r.Close() // release the returned stream when done

	// open the file to save the result
	f, err := os.OpenFile("tts.wav", os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0644)
	if err != nil {
		log.Println(err)
		return
	}
	defer f.Close()

	// encode the lpcm stream to WAV format (16-bit, mono)
	if err := yask.EncodePCMToWav(r, f, config.Rate, 16, 1); err != nil {
		log.Println(err)
		return
	}
}
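
For orientation, wrapping raw LPCM in a WAV container essentially means placing a 44-byte RIFF header in front of the samples. The sketch below is not the package's EncodePCMToWav implementation; wavHeader is a hypothetical helper that only shows which values (sample rate, bit depth, channel count) end up in that header, assuming the canonical PCM layout.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// wavHeader is a hypothetical helper (not part of yask). It builds the
// canonical 44-byte RIFF/WAVE header for uncompressed PCM data.
func wavHeader(dataLen, sampleRate, bitDepth, numChans int) []byte {
	blockAlign := numChans * bitDepth / 8 // bytes per sample frame
	byteRate := sampleRate * blockAlign   // bytes per second of audio

	var buf bytes.Buffer
	le := binary.LittleEndian
	buf.WriteString("RIFF")
	binary.Write(&buf, le, uint32(36+dataLen)) // size of everything after this field
	buf.WriteString("WAVE")
	buf.WriteString("fmt ")
	binary.Write(&buf, le, uint32(16)) // fmt chunk size for PCM
	binary.Write(&buf, le, uint16(1))  // audio format 1 = PCM
	binary.Write(&buf, le, uint16(numChans))
	binary.Write(&buf, le, uint32(sampleRate))
	binary.Write(&buf, le, uint32(byteRate))
	binary.Write(&buf, le, uint16(blockAlign))
	binary.Write(&buf, le, uint16(bitDepth))
	buf.WriteString("data")
	binary.Write(&buf, le, uint32(dataLen))
	return buf.Bytes()
}

func main() {
	// One second of 8 kHz, 16-bit mono audio occupies 16000 bytes.
	h := wavHeader(16000, 8000, 16, 1)
	fmt.Println(len(h), string(h[0:4]), string(h[8:12]))
}
```

The arguments mirror those passed to EncodePCMToWav in the example above (config.Rate, 16, 1), which is why the sample rate used for synthesis must also be the one written into the file.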

Speech recognition to text

An example of recognizing short audio. It uses a WAV file, which can be used when the configuration format value is lpcm.

package main

import (
	"log"
	"os"

	"github.com/fcg-xvii/go-tools/speech/yask"
)

func main() {
	yaFolderID := "b1g4..." // yandex folder id
	yaAPIKey := "AQVNyr..." // yandex api key
	dataFileName := "data.wav" // audio file in WAV format to recognize

	// open the audio file
	f, err := os.Open(dataFileName)
	if err != nil {
		log.Println(err)
		return
	}
	defer f.Close()

	// init config for recognition
	config := yask.STTConfigDefault(yaFolderID, yaAPIKey, f)

	// set the English language
	config.Lang = yask.LangEN

	// recognize speech to text
	text, err := yask.SpeechToTextShort(config)
	if err != nil {
		log.Println(err)
		return
	}

	log.Println(text)
}

License

The MIT License (MIT), see LICENSE.

Documentation ¶

Index ¶

Constants
func EncodePCMToWav(in io.Reader, out io.WriteSeeker, sampleRate, bitDepth, numChans int) error
func SpeechToTextShort(conf *STTConfig) (string, error)
func TextToSpeech(config *TTSConfig) (io.ReadCloser, error)
type STTConfig
- func STTConfigDefault(yaFolderID, yaAPIKey string, data io.Reader) *STTConfig
type TTSConfig
- func TTSDefaultConfigSSML(yaFolderID, yaAPIKey, SSML string) *TTSConfig
- func TTSDefaultConfigText(yaFolderID, yaAPIKey, text string) *TTSConfig
type Voice
- func Voices(lang string, sex, premium int) (res []Voice)

Constants ¶

View Source

const (
	// YaSTTUrl is the URL for sending speech-to-text requests
	YaSTTUrl = "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize"

	// YaTTSUrl is the URL for sending text-to-speech requests
	YaTTSUrl = "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize"

	// FormatLPCM is the PCM audio format (wav without the wav header; more details in https://en.wikipedia.org/wiki/Pulse-code_modulation)
	FormatLPCM = "lpcm"
	// FormatOgg is the ogg audio format
	FormatOgg = "oggopus"

	// Rate8k is rate of 8kHz
	Rate8k int = 8000
	// Rate16k is rate of 16kHz
	Rate16k int = 16000
	// Rate48k is rate of 48kHz
	Rate48k int = 48000

	// LangRU is russian language
	LangRU = "ru-Ru"
	// LangEN is english language
	LangEN = "en-US"
	// LangTR is turkish language
	LangTR = "tr-TR"

	// SpeedStandard is the standard voice speed (1.0)
	SpeedStandard float32 = 1.0
	// SpeedMostFastest is the maximum voice speed (3.0)
	SpeedMostFastest float32 = 3.0
	// SpeedSlowest is the minimum voice speed (0.1)
	SpeedSlowest float32 = 0.1

	// VoiceOksana is Oksana voice (russian, female, standard)
	VoiceOksana = "oksana"
	// VoiceJane is Jane voice (russian, female, standard)
	VoiceJane = "jane"
	// VoiceOmazh is Omazh voice (russian, female, standard)
	VoiceOmazh = "omazh"
	// VoiceZahar is Zahar voice (russian, male, standard)
	VoiceZahar = "zahar"
	// VoiceErmil is Ermil voice (russian, male, standard)
	VoiceErmil = "ermil"
	// VoiceSilaerkan is Silaerkan voice (turkish, female, standard)
	VoiceSilaerkan = "silaerkan"
	// VoiceErkanyavas is Erkanyavas voice (turkish, male, standard)
	VoiceErkanyavas = "erkanyavas"
	// VoiceAlyss is Alyss voice (english, female, standard)
	VoiceAlyss = "alyss"
	// VoiceNick is Nick voice (english, male, standard)
	VoiceNick = "nick"
	// VoiceAlena is Alena voice (russian, female, premium)
	VoiceAlena = "alena"
	// VoiceFilipp is Filipp voice (russian, male, premium)
	VoiceFilipp = "filipp"

	// EmotionGood is good voice emotion
	EmotionGood = "good"
	// EmotionEvil is evil voice emotion
	EmotionEvil = "evil"
	// EmotionNeutral is neutral voice emotion
	EmotionNeutral = "neutral"

	// TopicGeneral is current version of voice model (available in all languages)
	TopicGeneral = "general"
	// TopicGeneralRC is experimental version of voice model (russian language)
	TopicGeneralRC = "general:rc"
	// TopicGeneralDeprecated is deprecated version of voice model (russian language)
	TopicGeneralDeprecated = "general:deprecated"
	// TopicMaps is the model for addresses and company names
	TopicMaps = "maps"

	// SexAll is male and female
	SexAll = 0
	// SexMale is male
	SexMale = 1
	// SexFemale is female
	SexFemale = 2
)
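
The speed constants above describe a range from 0.1 to 3.0. The package documentation does not say what happens to values outside that range, so a caller may prefer to clamp user input first. A minimal sketch, with the limits copied into local constants and clampSpeed as a hypothetical helper that is not part of yask:

```go
package main

import "fmt"

// Local copies of the documented limits (yask.SpeedSlowest and
// yask.SpeedMostFastest) so that the sketch compiles on its own.
const (
	speedSlowest float32 = 0.1
	speedFastest float32 = 3.0
)

// clampSpeed is a hypothetical helper: it forces s into the documented range.
func clampSpeed(s float32) float32 {
	if s < speedSlowest {
		return speedSlowest
	}
	if s > speedFastest {
		return speedFastest
	}
	return s
}

func main() {
	fmt.Println(clampSpeed(0.01), clampSpeed(1.5), clampSpeed(7))
}
```

The result could then be assigned to the Speed field of a TTSConfig.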

Variables ¶

This section is empty.

Functions ¶

func EncodePCMToWav ¶

func EncodePCMToWav(in io.Reader, out io.WriteSeeker, sampleRate, bitDepth, numChans int) error

EncodePCMToWav encodes an input stream of PCM audio to WAV format and writes it to the out stream.

func SpeechToTextShort ¶

func SpeechToTextShort(conf *STTConfig) (string, error)

SpeechToTextShort returns the text recognized from a PCM or OGG audio stream using the Yandex SpeechKit service.
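
How the STTConfig fields might map onto a request to YaSTTUrl can be sketched as follows. This is not the package's code: the query parameter names (lang, topic, profanityFilter, format, sampleRateHertz, folderId) are an assumption based on the public SpeechKit v1 REST documentation and should be checked against it, and sttParams and sttURL are hypothetical names.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// sttParams is a hypothetical stand-in for the relevant STTConfig fields.
type sttParams struct {
	Lang            string
	Topic           string
	Format          string
	FolderID        string
	Rate            int
	ProfanityFilter bool
}

// sttURL builds a recognition URL. The parameter names are assumed from the
// SpeechKit v1 REST documentation, not taken from the yask source.
func sttURL(base string, p sttParams) string {
	q := url.Values{}
	q.Set("lang", p.Lang)
	q.Set("topic", p.Topic)
	q.Set("format", p.Format)
	q.Set("folderId", p.FolderID)
	q.Set("sampleRateHertz", strconv.Itoa(p.Rate))
	q.Set("profanityFilter", strconv.FormatBool(p.ProfanityFilter))
	return base + "?" + q.Encode() // Encode sorts the parameters by key
}

func main() {
	u := sttURL("https://stt.api.cloud.yandex.net/speech/v1/stt:recognize", sttParams{
		Lang:     "en-US",
		Topic:    "general",
		Format:   "lpcm",
		FolderID: "demo", // placeholder folder ID
		Rate:     8000,
	})
	fmt.Println(u)
}
```

The audio itself would travel in the request body, which is why STTConfig carries the data as an io.Reader rather than as a parameter.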

func TextToSpeech ¶

func TextToSpeech(config *TTSConfig) (io.ReadCloser, error)

TextToSpeech returns a PCM or OGG audio stream synthesized by the Yandex SpeechKit service. A resulting PCM stream can be converted to a WAV stream using EncodePCMToWav.

Types ¶

type STTConfig ¶

type STTConfig struct {
	Lang            string
	Topic           string
	ProfanityFilter bool
	Format          string
	Rate            int
	YaFolderID      string
	YaAPIKey        string
	Data            io.Reader
}

STTConfig is the configuration for speech-to-text methods.

func STTConfigDefault ¶

func STTConfigDefault(yaFolderID, yaAPIKey string, data io.Reader) *STTConfig

STTConfigDefault returns STTConfig with default parameters

type TTSConfig ¶

type TTSConfig struct {
	Text       string
	SSML       string
	Lang       string
	Voice      string
	Emotion    string
	Speed      float32
	Format     string
	Rate       int
	YaFolderID string
	YaAPIKey   string
}

TTSConfig is the configuration for the text-to-speech method.

func TTSDefaultConfigSSML ¶

func TTSDefaultConfigSSML(yaFolderID, yaAPIKey, SSML string) *TTSConfig

TTSDefaultConfigSSML returns a config with default parameters for synthesizing speech from SSML markup, for use with the TextToSpeech method. For details on SSML, see https://cloud.yandex.ru/docs/speechkit/tts/ssml
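
Because the SSML argument is a plain string, any user-supplied text placed inside it should be XML-escaped. The sketch below builds a small document with a pause between two phrases. It assumes the service accepts the standard speak and break elements (see the SSML link above for the supported set), and ssmlWithPause is a hypothetical helper that is not part of yask.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// ssmlWithPause is a hypothetical helper: it wraps two phrases in <speak>,
// separated by a <break> of pauseMs milliseconds, escaping the phrase text.
func ssmlWithPause(first, second string, pauseMs int) (string, error) {
	var b strings.Builder
	b.WriteString("<speak>")
	if err := xml.EscapeText(&b, []byte(first)); err != nil {
		return "", err
	}
	fmt.Fprintf(&b, `<break time="%dms"/>`, pauseMs)
	if err := xml.EscapeText(&b, []byte(second)); err != nil {
		return "", err
	}
	b.WriteString("</speak>")
	return b.String(), nil
}

func main() {
	ssml, err := ssmlWithPause("Tom & Jerry", "bye", 500)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(ssml)
}
```

The resulting string is what would be passed as the SSML argument of TTSDefaultConfigSSML.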

func TTSDefaultConfigText ¶

func TTSDefaultConfigText(yaFolderID, yaAPIKey, text string) *TTSConfig

TTSDefaultConfigText returns a config with default parameters for synthesizing speech from plain text, for use with the TextToSpeech method.

type Voice ¶

type Voice struct {
	NameEn  string `json:"name_en"`
	MameRu  string `json:"name_ru"`
	Voice   string `json:"voice"`
	Lang    string `json:"lang"`
	Male    bool   `json:"is_male"`
	Premium bool   `json:"is_premium"`
}

Voice is a struct holding information about a voice.

func Voices ¶

func Voices(lang string, sex, premium int) (res []Voice)

Voices returns a slice of available voices.
lang: empty (all languages), ru-RU, en-EN, tr-TR
sex: 0 - all, 1 - male, 2 - female
premium: 0 - all, 1 - standard only, 2 - premium only
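
The sex and premium codes described above can be read as two independent filters. The sketch below illustrates that reading on a local list. It is not the package's implementation, and the voice type and filterVoices function are hypothetical stand-ins for yask.Voice and yask.Voices.

```go
package main

import "fmt"

// voice is a simplified, hypothetical stand-in for yask.Voice.
type voice struct {
	Name    string
	Male    bool
	Premium bool
}

// filterVoices applies the documented codes:
// sex: 0 - all, 1 - male, 2 - female;
// premium: 0 - all, 1 - standard only, 2 - premium only.
func filterVoices(all []voice, sex, premium int) []voice {
	var res []voice
	for _, v := range all {
		if (sex == 1 && !v.Male) || (sex == 2 && v.Male) {
			continue
		}
		if (premium == 1 && v.Premium) || (premium == 2 && !v.Premium) {
			continue
		}
		res = append(res, v)
	}
	return res
}

func main() {
	// Attributes follow the comments on the voice constants above.
	all := []voice{
		{"oksana", false, false},
		{"zahar", true, false},
		{"alena", false, true},
		{"filipp", true, true},
	}
	for _, v := range filterVoices(all, 1, 2) { // male, premium only
		fmt.Println(v.Name)
	}
}
```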
