livecaption

command

Version: v0.0.0-...-b11a5d3 | Published: Dec 7, 2024 | License: Apache-2.0 | Imports: 7 | Imported by: 0

Repository

github.com/GoogleCloudPlatform/golang-samples

README

Google Cloud Speech API Go example

Authentication

Create a project with the Google Cloud Console, and enable the Speech API.
From the Cloud Console, create a service account, download its JSON credentials file, then set the GOOGLE_APPLICATION_CREDENTIALS environment variable:
```
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your-project-credentials.json
```

Run the sample

Before running any example you must first install the Speech API client:

go get -u cloud.google.com/go/speech/apiv1

To run the example with a local file:

go build
cat ../testdata/audio.raw | livecaption

Capturing audio from the mic

Alternatively, gst-launch can be used to capture audio from the mic. For example:

gst-launch-1.0 -v pulsesrc ! audioconvert ! audioresample ! audio/x-raw,channels=1,rate=16000 ! filesink location=/dev/stdout | livecaption

To discover your recording device, use the gst-device-monitor-1.0 command-line tool. For example:

$ gst-device-monitor-1.0
Probing devices...


Device found:

	name  : Built-in Output
	class : Audio/Sink
	caps  : audio/x-raw, format=(string)F32LE, layout=(string)interleaved, rate=(int)44100, channels=(int)2, channel-mask=(bitmask)0x0000000000000003;
	        audio/x-raw, format=(string){ S8, U8, S16LE, S16BE, U16LE, U16BE, S24_32LE, S24_32BE, U24_32LE, U24_32BE, S32LE, S32BE, U32LE, U32BE, S24LE, S24BE, U24LE, U24BE, S20LE, S20BE, U20LE, U20BE, S18LE, S18BE, U18LE, U18BE, F32LE, F32BE, F64LE, F64BE }, layout=(string)interleaved, rate=(int)[ 1, 2147483647 ], channels=(int)2, channel-mask=(bitmask)0x0000000000000003;
	        audio/x-raw, format=(string){ S8, U8, S16LE, S16BE, U16LE, U16BE, S24_32LE, S24_32BE, U24_32LE, U24_32BE, S32LE, S32BE, U32LE, U32BE, S24LE, S24BE, U24LE, U24BE, S20LE, S20BE, U20LE, U20BE, S18LE, S18BE, U18LE, U18BE, F32LE, F32BE, F64LE, F64BE }, layout=(string)interleaved, rate=(int)[ 1, 2147483647 ], channels=(int)1;
	gst-launch-1.0 ... ! osxaudiosink device=46


Device found:

	name  : Built-in Microph
	class : Audio/Source
	caps  : audio/x-raw, format=(string)F32LE, layout=(string)interleaved, rate=(int)44100, channels=(int)2, channel-mask=(bitmask)0x0000000000000003;
	        audio/x-raw, format=(string){ S8, U8, S16LE, S16BE, U16LE, U16BE, S24_32LE, S24_32BE, U24_32LE, U24_32BE, S32LE, S32BE, U32LE, U32BE, S24LE, S24BE, U24LE, U24BE, S20LE, S20BE, U20LE, U20BE, S18LE, S18BE, U18LE, U18BE, F32LE, F32BE, F64LE, F64BE }, layout=(string)interleaved, rate=(int)44100, channels=(int)2, channel-mask=(bitmask)0x0000000000000003;
	        audio/x-raw, format=(string){ S8, U8, S16LE, S16BE, U16LE, U16BE, S24_32LE, S24_32BE, U24_32LE, U24_32BE, S32LE, S32BE, U32LE, U32BE, S24LE, S24BE, U24LE, U24BE, S20LE, S20BE, U20LE, U20BE, S18LE, S18BE, U18LE, U18BE, F32LE, F32BE, F64LE, F64BE }, layout=(string)interleaved, rate=(int)44100, channels=(int)1;
	gst-launch-1.0 osxaudiosrc device=39 ! ...

In the above example, the recording device (Built-in Microphone) is osxaudiosrc device=39, so to run the example you would adapt the command line accordingly:

gst-launch-1.0 -v osxaudiosrc device=39 ! audioconvert ! audioresample ! audio/x-raw,channels=1,rate=16000 ! filesink location=/dev/stdout | livecaption

Content Limits

The Speech API imposes the following limits on the size of content (these limits are subject to change):

| Content Limit | Audio Length |
| --- | --- |
| Synchronous Requests | ~1 Minute |
| Asynchronous Requests | ~180 Minutes |
| Streaming Requests | ~1 Minute |

Please note that each StreamingRecognize session is considered a single request even though it includes multiple frames of StreamingRecognizeRequest audio within the stream.

For more information, please refer to https://cloud.google.com/speech/limits#content.

Documentation

Overview

Command livecaption streams audio data from stdin to the Google Cloud Speech API and prints the transcript.

As an example, gst-launch can be used to capture the mic input:

$ gst-launch-1.0 -v pulsesrc ! audioconvert ! audioresample ! audio/x-raw,channels=1,rate=16000 ! filesink location=/dev/stdout | livecaption

Source Files

livecaption.go