livecaption

package
v0.0.0-...-1140a65
Warning

This package is not in the latest version of its module.

Published: Dec 8, 2024 License: AGPL-3.0 Imports: 13 Imported by: 0

README

Audio Stream

This exercise is about streaming. The private method sendStreamToGCP sends an audio stream of bytes, read from a test .wav file in the public/ directory. The method readStreamFromGCP then receives the speech-to-text stream and prints it. To run this, make sure that GOOGLE_APPLICATION_CREDENTIALS is set and that the credentials have the appropriate permissions for your GCP project.

Two goroutines run concurrently:

  1. sendStreamToGCP: audio file => Reader() stream => StreamingRecognizeClient Send()
  2. readStreamFromGCP: StreamingRecognizeClient Recv() => stream Writer() => os.Stdout
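The two-goroutine layout above can be sketched without any GCP dependency. This is only an illustration under stated assumptions: fakeStream, sendStream, readStream, transcribe and demo are hypothetical names, and fakeStream merely stands in for the real StreamingRecognizeClient by reporting the size of each chunk it is sent.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"sync"
)

// fakeStream is a hypothetical stand-in for the bidirectional recognizer
// stream: every chunk sent comes back as one textual "result".
type fakeStream struct {
	results chan string
}

func (s *fakeStream) Send(chunk []byte) error {
	s.results <- fmt.Sprintf("received %d bytes", len(chunk))
	return nil
}

// CloseSend signals that no more audio will be sent.
func (s *fakeStream) CloseSend() { close(s.results) }

func (s *fakeStream) Recv() (string, error) {
	res, ok := <-s.results
	if !ok {
		return "", io.EOF
	}
	return res, nil
}

// sendStream reads chunks from r and sends them until r is exhausted.
func sendStream(s *fakeStream, r io.Reader, chunkSize int) error {
	defer s.CloseSend()
	buf := make([]byte, chunkSize)
	for {
		n, err := r.Read(buf)
		if n > 0 {
			if sendErr := s.Send(buf[:n]); sendErr != nil {
				return sendErr
			}
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

// readStream receives results until the stream ends and writes them to w.
func readStream(s *fakeStream, w io.Writer) error {
	for {
		res, err := s.Recv()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		fmt.Fprintln(w, res)
	}
}

// transcribe runs sender and receiver concurrently and waits for both.
// Errors are ignored here to keep the sketch short.
func transcribe(r io.Reader, w io.Writer, chunkSize int) {
	s := &fakeStream{results: make(chan string)}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); _ = sendStream(s, r, chunkSize) }()
	go func() { defer wg.Done(); _ = readStream(s, w) }()
	wg.Wait()
}

// demo feeds ten bytes through the pipeline in chunks of four.
func demo() string {
	var out strings.Builder
	transcribe(strings.NewReader("0123456789"), &out, 4)
	return out.String()
}

func main() {
	fmt.Print(demo())
}
```

Closing the send side is what lets the receiver finish: once the sender reaches end of input, the receiver sees io.EOF and both goroutines return.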

The API limits are specified in Quotas & limits. There is no limit on this streaming version. The section MLK Speech Transcribed below shows this API's transcription of Martin Luther King Jr.'s I Have a Dream speech. The speech audio is 16 minutes long (at 22K bit rate).

Mac Audio

On a Mac:

  1. You can create a sample .wav file using the ffmpeg command-line program.
  2. Or you can stream audio to a UDP port and forward it to the recognizer using a goroutine.

Audio Samples on Mac

As an example, the epilogue of Pale Blue Dot is recorded in public/paleBlueDot.wav:

There is perhaps no better demonstration of the folly of human 
conceits than this distant image of our tiny world. To me, it underscores our 
responsibility to deal more kindly with one another, and to preserve and cherish 
the pale blue dot, the only home we've ever known.

The Google Cloud command line gives a reference output: gcloud ml speech recognize ../../public/paleBlueDot.wav --language-code=en_US

Output:

{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.9397845,
          "transcript": "there is perhaps no better demonstration of the Folly of human conceit than this image of a world to me and of course a responsibility to deal more kindly with one another and to preserve and cherish the pale blue dot"
        }
      ]
    }
  ]
}

You can compare this with what the streaming client returns for the same file.
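To compare the reference output with the streaming result programmatically, the JSON above can be decoded in Go. The following is a minimal sketch: the struct mirrors only the fields visible in the output shown, and recognizeResponse and bestTranscript are hypothetical names that are not part of this package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// recognizeResponse mirrors the fields visible in the gcloud output above.
type recognizeResponse struct {
	Results []struct {
		Alternatives []struct {
			Confidence float64 `json:"confidence"`
			Transcript string  `json:"transcript"`
		} `json:"alternatives"`
	} `json:"results"`
}

// bestTranscript returns the first alternative of the first result.
func bestTranscript(data []byte) (string, float64, error) {
	var resp recognizeResponse
	if err := json.Unmarshal(data, &resp); err != nil {
		return "", 0, err
	}
	if len(resp.Results) == 0 || len(resp.Results[0].Alternatives) == 0 {
		return "", 0, fmt.Errorf("no alternatives in response")
	}
	alt := resp.Results[0].Alternatives[0]
	return alt.Transcript, alt.Confidence, nil
}

func main() {
	// Shortened sample in the same shape as the output above.
	sample := []byte(`{"results":[{"alternatives":[{"confidence":0.9397845,"transcript":"there is perhaps no better demonstration"}]}]}`)
	text, conf, err := bestTranscript(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%.2f %s\n", conf, text)
}
```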

Handy Mac Shell Commands

You can use the ffmpeg command-line program on a Mac to record a .wav file or to stream PCM to a UDP port. Some examples are:

# List audio devices, Mac
ffmpeg -f avfoundation -list_devices true -i ""
# Record 20 seconds of audio from the built-in microphone and save it in playBlueDot.wav
ffmpeg -f avfoundation -i ":1" -t 20 ../../public/playBlueDot.wav
# Stream s16le PCM to UDP port 9999, from where the audio can be sent to GCP
ffmpeg -f avfoundation -i ":1" -acodec pcm_s16le -ar 48000 -f s16le udp://localhost:9999
ffmpeg -formats | grep PCM  # see PCM formats
nc -u -l localhost 9999 # starts a UDP server and listens on the port
nc -u localhost 9999 # starts a client

# Meta data
mdls chapter1/audio/playBlueDot.wav
# Test playback
afplay chapter1/audio/playBlueDot.wav
ffmpeg -i inputFilename.m4a OutputFilename.wav
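On the Go side, datagrams such as those produced by the udp://localhost:9999 command above can be read with the standard net package and copied to any io.Writer. The sketch below is an assumption about how such a forwarder might look, not the package's actual code: forwardUDP and loopbackDemo are hypothetical names, and the demo sends one datagram to itself on a random loopback port so that it terminates on its own.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"strings"
	"time"
)

// forwardUDP copies up to maxPackets datagrams from conn to w and returns
// the number of payload bytes written. Each datagram is one chunk of audio.
func forwardUDP(conn net.PacketConn, w io.Writer, maxPackets int) (int, error) {
	buf := make([]byte, 65535) // large enough for any UDP payload
	total := 0
	for i := 0; i < maxPackets; i++ {
		n, _, err := conn.ReadFrom(buf)
		if err != nil {
			return total, err
		}
		if _, err := w.Write(buf[:n]); err != nil {
			return total, err
		}
		total += n
	}
	return total, nil
}

// loopbackDemo sends one datagram to a local listener and forwards it.
func loopbackDemo(payload string) (string, error) {
	conn, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer conn.Close()
	// Give up after two seconds instead of blocking forever.
	if err := conn.SetReadDeadline(time.Now().Add(2 * time.Second)); err != nil {
		return "", err
	}

	client, err := net.Dial("udp", conn.LocalAddr().String())
	if err != nil {
		return "", err
	}
	defer client.Close()
	if _, err := client.Write([]byte(payload)); err != nil {
		return "", err
	}

	var out strings.Builder
	if _, err := forwardUDP(conn, &out, 1); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	got, err := loopbackDemo("pcm-bytes")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```

To receive the ffmpeg stream instead, one would listen on localhost:9999 and pass a writer that feeds the recognizer.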

The same command can be used to record video files:

# Record from video device 0 and audio device 1:
ffmpeg -r 30 -f avfoundation -i "0:1" ../../public/paleBlueDot.mp4
ffmpeg -f avfoundation -framerate 30 -video_size 640x480 -i "0:1" ../../public/paleBlueDot.mp4

The flags used in the commands are:

ffmpeg flags:
  -f = "force format". In this case we're forcing the use of AVFoundation
  -i = input source. Typically it's a file, but you can use devices.
        "0:1" = Record both audio and video from FaceTime camera and built-in mic
        "0" = Record just video from FaceTime camera
        ":1" = Record just audio from built-in mic
  -t = duration in seconds. Omit it to record indefinitely until you stop it (Control-C)

Material:

You may find the following links handy if you would like to explore more audio-related material in Go.

MLK Speech Transcribed

Words transcribed: 529, versus 881 actual words in the speech.

 I say to you today my friend
 so even though we Face the difficulties of today and tomorrow
 I still have a dream
 it is a dream deeply rooted in the American dream
 I have a dream
 one day
 this nation will rise up
 live out the true meaning of its trees
 we hold these truths to be self-evident that all men are created equal
 I have a dream
 that one day on the Red Hills of Georgia
 sons of former slaves and the sons of former slave owners
 will baby be able to sit down together at the table of Brotherhood I have a dream
 the one thing
 even the state of Mississippi a state sweltering with the keto Zone Injustice
 sweltering with the heat of Oppression
 be transformed into an oasis of freedom and Justice I have a trees
 my four little children
 one day live in a nation where they will not be judged by the color of their skin but by the content of a character I have a dream today
 I have a dream that one day
 in Alabama with its vicious racists
 what's its Governor having his lips dripping with the words of interposition and nullification one day right there in Alabama little black boys and black girls little join hands with little white balls and white girls as sisters and brothers I have a dream today
 I have a dream that one day I shall be exalted never healed in Mountain shall be made Low Places would be made friends and the Crooked places will be made at All Flesh shall see it together and I hope this is a piece that I go back to the Southwest River space we will be able to shoot out of the Mountain of Despair a stone of hope we will be able to transform the jangling call Javon nation into a beautiful Symphony of Brotherhood with this face we will be able to work together to pray together to struggle together to go to jail for Freedom together knowing that we will be free one day
 this will be the day
 this will be the day when all of God's children
 be able to sing with new meaning my country tears would be
 sweet land of liberty of Beyonce the Pilgrim's Pride from every Mountainside Let Freedom Ring Americans to be a great nation this must be come true and so Let Freedom Ring
 from the mighty mountains of New York
 Let Freedom Ring from the highwomen alligators of Pennsylvania Let Freedom Ring from the smoke
 Let Freedom Ring from the probation. California but not only that
 Let Freedom Ring from Stone Mountain of Georgia
 Let Freedom Ring from Lookout Mountain of Tennessee and Mississippi State play tomorrow in Japanese
 turn wheel of freedom
 when we let it ring from every finish whatever Hamlet from every state and their Parsippany
 we will be able to speed up that they put on Jews and Gentiles Protestants and Catholics will be able to tell her I'm sending the words of the old Negro spiritual free at last free at last thank God Almighty we are free

Documentation

Overview

Package livecaption implements a client for the Google speech-to-text API. It reads audio from an RTP stream, opens a connection to GCP, and displays the recognized text on the terminal. Red text is interim (not final) output and green text is the final output of the speech recognizer.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func InitCli

func InitCli()

InitCli for command: the-gpl stt # Convert Speech on port 9999 to text

eg: the-gpl stt -port=9999

func StreamAudioFile

func StreamAudioFile(fName string, w io.Writer)

StreamAudioFile streams an audio file to the Google speech-to-text recognizer.

func StreamRTPPort

func StreamRTPPort(address string, w io.Writer)

func StreamSpeechToText

func StreamSpeechToText(w io.Writer, r io.Reader, wg *sync.WaitGroup)

StreamSpeechToText streams a test audio file 'currentTestFile' to the Google speech-to-text engine. It prints the output on the io.Writer passed to it.

Types

type CLI

type CLI struct {
	// contains filtered or unexported fields
}

CLI is a wrapper for *flag.FlagSet. It implements the serve.CmdHandlers CLI interface.

func (CLI) DisplayHelp

func (a CLI) DisplayHelp()

DisplayHelp prints help on the command line for the live-caption module.

func (CLI) ExecCmd

func (a CLI) ExecCmd(args []string)

ExecCmd runs the stt command dispatched from the CLI.

type LineColor

type LineColor string

LineColor defines a color for a line, see Terminal output in Go.

const (
	Reset  LineColor = "\033[0m"
	Red    LineColor = "\033[31m"
	Green  LineColor = "\033[32m"
	Yellow LineColor = "\033[33m"
	Blue   LineColor = "\033[34m"
	Purple LineColor = "\033[35m"
	Cyan   LineColor = "\033[36m"
	Gray   LineColor = "\033[37m"
	White  LineColor = "\033[97m"
)
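The overview states that red marks interim text and green marks final text. A minimal sketch of how these constants could be applied is shown below. It redeclares LineColor and three of the constants so that it runs on its own; colorize and renderResult are hypothetical helpers, not functions exported by this package.

```go
package main

import "fmt"

// LineColor and the constants are copied from the package declaration above.
type LineColor string

const (
	Reset LineColor = "\033[0m"
	Red   LineColor = "\033[31m"
	Green LineColor = "\033[32m"
)

// colorize wraps text in the given color escape and resets the terminal
// color afterwards. Hypothetical helper.
func colorize(c LineColor, text string) string {
	return string(c) + text + string(Reset)
}

// renderResult colors interim transcripts red and final transcripts green.
// Hypothetical helper.
func renderResult(text string, isFinal bool) string {
	if isFinal {
		return colorize(Green, text)
	}
	return colorize(Red, text)
}

func main() {
	fmt.Println(renderResult("i have a", false))
	fmt.Println(renderResult("I have a dream", true))
}
```

Appending Reset after each line keeps the color from leaking into later terminal output.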
