suzu

package module
v0.0.0-...-f0c043b Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 14, 2024 License: Apache-2.0 Imports: 39 Imported by: 0

README

Audio Streaming Gateway Suzu

GitHub tag (latest SemVer) License

About Shiguredo's open source software

We will not respond to PRs or issues that have not been discussed on Discord. Also, Discord is only available in Japanese.

Please read https://github.com/shiguredo/oss/blob/master/README.en.md before use.

時雨堂のオープンソースソフトウェアについて

利用前に https://github.com/shiguredo/oss をお読みください。

Audio Streaming Gateway Suzu について

Suzu は WebRTC SFU Sora 専用の音声解析用ゲートウェイです。 Suzu は Sora から送られてくる音声ストリーミングを HTTP/2 経由で受け取り、音声解析サービスに転送し、その解析結果を Sora に送ります。 Sora は Suzu から送られてきた解析結果を、プッシュ API を経由してリアルタイムにクライアントへ通知します。

目的

リアルタイム通話で気軽に音声解析サービスを利用できる仕組みを提供することです。

特徴

  • Sora から音声データを HTTP/2 経由で受け取り、音声解析サービスへ送信します
  • 音声解析サービスの解析結果を HTTP/2 レスポンスで Sora に戻します
  • Sora は受け取った解析結果をクライアントへプッシュで送信します
  • 音声解析に必要とされる言語コードをクライアントごとに指定できます
  • 無限リトライ対応
  • mTLS 対応

使ってみる

Suzu を使ってみたい人は USE.md をお読みください。

Suzu と GCP Speech to Text

sequenceDiagram
    participant client1 as クライアント1<br>sendrecv
    participant client2 as クライアント2<br>recvonly
    participant sora as WebRTC SFU Sora
    participant suzu as Audio Streaming Gateway Suzu
    participant app as アプリケーションサーバー
    participant gcp as GCP Speech to Text
    note over client1, sora: WebRTC 確立
    sora-)client1: "type": "switched"
    note over client1, sora: DataChannel 確立
    par
        client1-)sora: Opus over SRTP
        sora-)suzu: Opus over HTTP/2
        note over suzu: Opus を Ogg コンテナに詰める
        suzu-)gcp: Ogg over HTTP/2
        note over gcp: 音声データが十分ではないためまだ解析結果が返せない
    and
        client1-)sora: Opus over SRTP
        sora-)suzu: Opus over HTTP/2
        suzu-)gcp: Ogg over HTTP/2
        gcp-)suzu: 音声解析結果<br>JSON over HTTP/2
        suzu-)sora: 音声解析結果<br>JSON over HTTP/2
        sora-)client1: プッシュ通知<br>音声解析結果<br>JSON over DataChannel
    end
    par
        note over client2, sora: WebRTC 確立
        sora-)client2: "type": "switched"
        note over client2, sora: DataChannel 確立
    and
        client1-)sora: Opus over SRTP
        sora-)suzu: Opus over HTTP/2
        suzu-)gcp: Ogg over HTTP/2
        gcp-)suzu: 音声解析結果<br>JSON over HTTP/2
        suzu-)sora: 音声解析結果<br>JSON over HTTP/2
    end
    par
        sora-)client1: プッシュ通知<br>音声解析結果<br>JSON over DataChannel
    and
        sora-)client2: プッシュ通知<br>音声解析結果<br>JSON over DataChannel
    end

対応サービス

ライセンス

Copyright 2022-2024, Hiroshi Yoshida (Original Author)
Copyright 2022-2024, Shiguredo Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

優先実装

優先実装とは Sora のライセンスを契約頂いているお客様限定で Suzu の実装予定機能を有償にて前倒しで実装することです。

優先実装が可能な機能一覧

詳細は Discord やメールなどでお気軽にお問い合わせください。

Documentation

Overview

Package oggwriter implements OGG media container writer

Index

Constants

View Source
const (
	DefaultLogDir  = "."
	DefaultLogName = "suzu.jsonl"

	// megabytes
	DefaultLogRotateMaxSize    = 200
	DefaultLogRotateMaxBackups = 7
	// days
	DefaultLogRotateMaxAge = 30

	DefaultExporterListenAddr = "0.0.0.0"
	DefaultExporterListenPort = 5891

	// 10s
	DefaultTimeToWaitForOpusPacketMs = 10000

	// リトライ間隔 100ms
	DefaultRetryIntervalMs = 100
)
View Source
const (
	FrameSize        = 1024 * 10
	HeaderLength     = 20
	MaxPayloadLength = 0xffff
)

Variables

View Source
var (
	ErrMissingAudioStreamingLanguageCode = fmt.Errorf("MISSING-SORA-AUDIO-STREAMING-LANGUAGE-CODE")
	ErrUnsupportedLanguageCode           = fmt.Errorf("UNSUPPORTED-LANGUAGE-CODE")
	ErrUnsupportedService                = fmt.Errorf("UNSUPPORTED-SERVICE")
)
View Source
var (
	NewServiceHandlerFuncs = make(newServiceHandlerFuncs)

	ErrServiceNotFound = fmt.Errorf("SERVICE-NOT-FOUND")
)
View Source
var (
	// TODO: 分かりにくい場合はエラー名を変更する
	// このエラーの場合は再接続を試みる
	ErrServerDisconnected = fmt.Errorf("SERVER-DISCONNECTED")
)
View Source
var Version string

Functions

func GetLanguageCode

func GetLanguageCode(serviceType, lang string, f func(string) (string, error)) (string, error)

func GranulePosition

func GranulePosition(config byte) uint64

func InitLogger

func InitLogger(config *Config) error

InitLogger ロガーを初期化する

func NewAmazonTranscribeHandler

func NewAmazonTranscribeHandler(config Config, channelID, connectionID string, sampleRate uint32, channelCount uint16, languageCode string, onResultFunc any) serviceHandlerInterface

func NewOpusReader

func NewOpusReader(c Config, d time.Duration, opusReader io.ReadCloser) io.ReadCloser

func NewPacketDumpHandler

func NewPacketDumpHandler(config Config, channelID, connectionID string, sampleRate uint32, channelCount uint16, languageCode string, onResultFunc any) serviceHandlerInterface

func NewSpeechToTextHandler

func NewSpeechToTextHandler(config Config, channelID, connectionID string, sampleRate uint32, channelCount uint16, languageCode string, onResultFunc any) serviceHandlerInterface

func NewSpeechpbRecognitionConfig

func NewSpeechpbRecognitionConfig(rc RecognitionConfig) *speechpb.RecognitionConfig

func NewStreamingRecognitionConfig

func NewStreamingRecognitionConfig(recognitionConfig *speechpb.RecognitionConfig, singleUtterance, interimResults bool) *speechpb.StreamingRecognizeRequest_StreamingConfig

func NewTestHandler

func NewTestHandler(config Config, channelID, connectionID string, sampleRate uint32, channelCount uint16, languageCode string, onResultFunc any) serviceHandlerInterface

func ShowConfig

func ShowConfig(config *Config)

Types

type AmazonTranscribe

type AmazonTranscribe struct {
	LanguageCode                      string
	MediaEncoding                     string
	MediaSampleRateHertz              int64
	EnablePartialResultsStabilization bool
	NumberOfChannels                  int64
	EnableChannelIdentification       bool
	PartialResultsStability           string
	Region                            string
	Debug                             bool
	Config                            Config
}

func NewAmazonTranscribe

func NewAmazonTranscribe(config Config, languageCode string, sampleRateHertz, audioChannelCount int64) *AmazonTranscribe

type AmazonTranscribeHandler

type AmazonTranscribeHandler struct {
	Config Config

	ChannelID    string
	ConnectionID string
	SampleRate   uint32
	ChannelCount uint16
	LanguageCode string
	RetryCount   int

	OnResultFunc func(context.Context, io.WriteCloser, string, string, string, any) error
	// contains filtered or unexported fields
}

func (*AmazonTranscribeHandler) GetRetryCount

func (h *AmazonTranscribeHandler) GetRetryCount() int

func (*AmazonTranscribeHandler) Handle

func (h *AmazonTranscribeHandler) Handle(ctx context.Context, reader io.Reader) (*io.PipeReader, error)

func (*AmazonTranscribeHandler) ResetRetryCount

func (h *AmazonTranscribeHandler) ResetRetryCount() int

func (*AmazonTranscribeHandler) UpdateRetryCount

func (h *AmazonTranscribeHandler) UpdateRetryCount() int

type AwsResult

type AwsResult struct {
	ChannelID *string `json:"channel_id,omitempty"`
	IsPartial *bool   `json:"is_partial,omitempty"`
	ResultID  *string `json:"result_id,omitempty"`
	TranscriptionResult
}

func NewAwsResult

func NewAwsResult() AwsResult

func (*AwsResult) SetMessage

func (ar *AwsResult) SetMessage(message string) *AwsResult

func (*AwsResult) WithChannelID

func (ar *AwsResult) WithChannelID(channelID string) *AwsResult

func (*AwsResult) WithIsPartial

func (ar *AwsResult) WithIsPartial(isPartial bool) *AwsResult

func (*AwsResult) WithResultID

func (ar *AwsResult) WithResultID(resultID string) *AwsResult

type Config

type Config struct {
	Version string

	Debug bool `ini:"debug"`

	HTTPS      bool   `ini:"https"`
	ListenAddr string `ini:"listen_addr"`
	ListenPort int    `ini:"listen_port"`

	AudioStreamingHeader bool `ini:"audio_streaming_header"`

	TLSFullchainFile    string `ini:"tls_fullchain_file"`
	TLSPrivkeyFile      string `ini:"tls_privkey_file"`
	TLSVerifyCacertPath string `ini:"tls_verify_cacert_path"` // クライアント認証用

	HTTP2MaxConcurrentStreams uint32 `ini:"http2_max_concurrent_streams"`
	HTTP2MaxReadFrameSize     uint32 `ini:"http2_max_read_frame_size"`
	HTTP2IdleTimeout          uint32 `ini:"http2_idle_timeout"`

	MaxRetry        int `ini:"max_retry"`
	RetryIntervalMs int `ini:"retry_interval_ms"`

	ExporterHTTPS      bool   `ini:"exporter_https"`
	ExporterListenAddr string `ini:"exporter_listen_addr"`
	ExporterListenPort int    `ini:"exporter_listen_port"`

	SkipBasicAuth     bool   `ini:"skip_basic_auth"`
	BasicAuthUsername string `ini:"basic_auth_username"`
	BasicAuthPassword string `ini:"basic_auth_password"`

	SampleRate   int `ini:"audio_sample_rate"`
	ChannelCount int `ini:"audio_channel_count"`

	DumpFile string `ini:"dump_file"`

	LogDir              string `ini:"log_dir"`
	LogName             string `ini:"log_name"`
	LogStdout           bool   `ini:"log_stdout"`
	LogRotateMaxSize    int    `ini:"log_rotate_max_size"`
	LogRotateMaxBackups int    `ini:"log_rotate_max_backups"`
	LogRotateMaxAge     int    `ini:"log_rotate_max_age"`

	TimeToWaitForOpusPacketMs int `ini:"time_to_wait_for_opus_packet_ms"`

	// aws の場合は IsPartial が false, gcp の場合は IsFinal が true の場合にのみ結果を返す指定
	FinalResultOnly bool `ini:"final_result_only"`

	// Amazon Web Services
	AwsCredentialFile                    string `ini:"aws_credential_file"`
	AwsProfile                           string `ini:"aws_profile"`
	AwsRegion                            string `ini:"aws_region"`
	AwsEnablePartialResultsStabilization bool   `ini:"aws_enable_partial_results_stabilization"`
	AwsPartialResultsStability           string `ini:"aws_partial_results_stability"`
	AwsEnableChannelIdentification       bool   `ini:"aws_enable_channel_identification"`
	// 変換結果に含める項目の有無の指定
	AwsResultChannelID bool `ini:"aws_result_channel_id"`
	AwsResultIsPartial bool `ini:"aws_result_is_partial"`
	AwsResultID        bool `ini:"aws_result_id"`

	// Google Cloud Platform
	GcpCredentialFile                      string   `ini:"gcp_credential_file"`
	GcpEnableSeparateRecognitionPerChannel bool     `ini:"gcp_enable_separate_recognition_per_channel"`
	GcpAlternativeLanguageCodes            []string `ini:"gcp_alternative_language_codes"`
	GcpMaxAlternatives                     int32    `ini:"gcp_max_alternatives"`
	GcpProfanityFilter                     bool     `ini:"gcp_profanity_filter"`
	GcpEnableWordTimeOffsets               bool     `ini:"gcp_enable_word_time_offsets"`
	GcpEnableWordConfidence                bool     `ini:"gcp_enable_word_confidence"`
	GcpEnableAutomaticPunctuation          bool     `ini:"gcp_enable_automatic_punctuation"`
	GcpEnableSpokenPunctuation             bool     `ini:"gcp_enable_spoken_punctuation"`
	GcpEnableSpokenEmojis                  bool     `ini:"gcp_enable_spoken_emojis"`
	GcpModel                               string   `ini:"gcp_model"`
	GcpUseEnhanced                         bool     `ini:"gcp_use_enhanced"`
	GcpSingleUtterance                     bool     `ini:"gcp_single_utterance"`
	GcpInterimResults                      bool     `ini:"gcp_interim_results"`
	// 変換結果に含める項目の有無の指定
	GcpResultIsFinal   bool `ini:"gcp_result_is_final"`
	GcpResultStability bool `ini:"gcp_result_stability"`
}

func NewConfig

func NewConfig(configFilePath string) (*Config, error)

type GcpResult

type GcpResult struct {
	IsFinal   *bool    `json:"is_final,omitempty"`
	Stability *float32 `json:"stability,omitempty"`
	TranscriptionResult
}

func NewGcpResult

func NewGcpResult() GcpResult

func (*GcpResult) SetMessage

func (gr *GcpResult) SetMessage(message string) *GcpResult

func (*GcpResult) WithIsFinal

func (gr *GcpResult) WithIsFinal(isFinal bool) *GcpResult

func (*GcpResult) WithStability

func (gr *GcpResult) WithStability(stability float32) *GcpResult

type OggWriter

type OggWriter struct {
	// contains filtered or unexported fields
}

OggWriter is used to take RTP packets and write them to an OGG on disk

func New

func New(fileName string, sampleRate uint32, channelCount uint16) (*OggWriter, error)

New builds a new OGG Opus writer

func NewWith

func NewWith(out io.Writer, sampleRate uint32, channelCount uint16) (*OggWriter, error)

NewWith initialize a new OGG Opus writer with an io.Writer output

func (*OggWriter) Close

func (i *OggWriter) Close() error

Close stops the recording

func (*OggWriter) Write

func (i *OggWriter) Write(opusPacket *codecs.OpusPacket) error

func (*OggWriter) WriteRTP

func (i *OggWriter) WriteRTP(packet *rtp.Packet) error

WriteRTP adds a new packet and writes the appropriate headers for it

type PacketDumpHandler

type PacketDumpHandler struct {
	Config Config

	ChannelID    string
	ConnectionID string
	SampleRate   uint32
	ChannelCount uint16
	LanguageCode string
	RetryCount   int

	OnResultFunc func(context.Context, io.WriteCloser, string, string, string, any) error
	// contains filtered or unexported fields
}

func (*PacketDumpHandler) GetRetryCount

func (h *PacketDumpHandler) GetRetryCount() int

func (*PacketDumpHandler) Handle

func (h *PacketDumpHandler) Handle(ctx context.Context, reader io.Reader) (*io.PipeReader, error)

func (*PacketDumpHandler) ResetRetryCount

func (h *PacketDumpHandler) ResetRetryCount() int

func (*PacketDumpHandler) UpdateRetryCount

func (h *PacketDumpHandler) UpdateRetryCount() int

type PacketDumpResult

type PacketDumpResult struct {
	Timestamp    int64  `json:"timestamp"`
	ChannelID    string `json:"channel_id"`
	ConnectionID string `json:"connection_id"`
	LanguageCode string `json:"language_code"`
	SampleRate   uint32 `json:"sample_rate"`
	ChannelCount uint16 `json:"channel_count"`
	Payload      []byte `json:"payload"`
}

type RecognitionConfig

type RecognitionConfig struct {
	Encoding                            speechpb.RecognitionConfig_AudioEncoding
	SampleRateHertz                     int32
	AudioChannelCount                   int32
	EnableSeparateRecognitionPerChannel bool
	LanguageCode                        string
	AlternativeLanguageCodes            []string
	MaxAlternatives                     int32
	ProfanityFilter                     bool
	SpeechContexts                      []*speechpb.SpeechContext
	EnableWordTimeOffsets               bool
	EnableWordConfidence                bool
	EnableAutomaticPunctuation          bool
	EnableSpokenPunctuation             bool
	EnableSpokenEmojis                  bool
	Model                               string
	UseEnhanced                         bool
}

func NewRecognitionConfig

func NewRecognitionConfig(c Config, languageCode string, sampleRate, channelCount int32) RecognitionConfig

type Server

type Server struct {
	// contains filtered or unexported fields
}

func NewServer

func NewServer(c *Config, service string) (*Server, error)

func (*Server) Start

func (s *Server) Start(ctx context.Context) error

func (*Server) StartExporter

func (s *Server) StartExporter(ctx context.Context) error

type SpeechToText

type SpeechToText struct {
	SampleReate  int32
	ChannelCount int32
	LanguageCode string
	Config       Config
}

func NewSpeechToText

func NewSpeechToText(config Config, languageCode string, sampleRate, channelCount int32) SpeechToText

func (SpeechToText) Start

type SpeechToTextHandler

type SpeechToTextHandler struct {
	Config Config

	ChannelID    string
	ConnectionID string
	SampleRate   uint32
	ChannelCount uint16
	LanguageCode string
	RetryCount   int

	OnResultFunc func(context.Context, io.WriteCloser, string, string, string, any) error
	// contains filtered or unexported fields
}

func (*SpeechToTextHandler) GetRetryCount

func (h *SpeechToTextHandler) GetRetryCount() int

func (*SpeechToTextHandler) Handle

func (h *SpeechToTextHandler) Handle(ctx context.Context, reader io.Reader) (*io.PipeReader, error)

func (*SpeechToTextHandler) ResetRetryCount

func (h *SpeechToTextHandler) ResetRetryCount() int

func (*SpeechToTextHandler) UpdateRetryCount

func (h *SpeechToTextHandler) UpdateRetryCount() int

type SuzuError

type SuzuError struct {
	Code    int
	Message string
	Retry   bool
}

func (*SuzuError) Error

func (e *SuzuError) Error() string

func (*SuzuError) IsRetry

func (e *SuzuError) IsRetry() bool

type TestHandler

type TestHandler struct {
	Config Config

	ChannelID    string
	ConnectionID string
	SampleRate   uint32
	ChannelCount uint16
	LanguageCode string
	RetryCount   int

	OnResultFunc func(context.Context, io.WriteCloser, string, string, string, any) error
	// contains filtered or unexported fields
}

func (*TestHandler) GetRetryCount

func (h *TestHandler) GetRetryCount() int

func (*TestHandler) Handle

func (h *TestHandler) Handle(ctx context.Context, reader io.Reader) (*io.PipeReader, error)

func (*TestHandler) ResetRetryCount

func (h *TestHandler) ResetRetryCount() int

func (*TestHandler) UpdateRetryCount

func (h *TestHandler) UpdateRetryCount() int

type TestResult

type TestResult struct {
	ChannelID *string `json:"channel_id,omitempty"`
	TranscriptionResult
}

func NewTestResult

func NewTestResult(channelID, message string) TestResult

type TranscriptionResult

type TranscriptionResult struct {
	Message string `json:"message,omitempty"`
	Reason  string `json:"reason,omitempty"`
	Type    string `json:"type"`
}

func NewSuzuErrorResponse

func NewSuzuErrorResponse(err error) TranscriptionResult

Directories

Path Synopsis
cmd

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL