auditory


Version: v1.4.0 | Published: Apr 13, 2022 | License: BSD-3-Clause

Repository

github.com/emer/auditory

Auditory is our repository for auditory processing code in Go (golang), focused on filtering speech WAV files with mel filters. A further step applies Gabor filters to produce input for neural networks. The processing code is split into four packages (sound, mel, dft and agabor) that can be used independently. Example code is in examples/processspeech.

Packages

dft

The 'dft' package performs a Fourier transform on the sound samples passed in and computes their power spectrum.
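
As an illustration of what "power spectrum" means here, the following is a minimal standalone sketch. It does not use the dft package API; the function name powerSpectrum is hypothetical, and a naive O(n^2) transform is used for clarity rather than a fast Fourier transform.

```go
package main

import (
	"fmt"
	"math"
)

// powerSpectrum is a hypothetical helper (not part of the dft package).
// It computes a naive O(n^2) discrete Fourier transform of real-valued
// samples and returns the squared magnitude of the first n/2+1 bins.
func powerSpectrum(samples []float64) []float64 {
	n := len(samples)
	if n == 0 {
		return nil
	}
	power := make([]float64, n/2+1)
	for k := range power {
		var re, im float64
		for t, s := range samples {
			angle := 2 * math.Pi * float64(k) * float64(t) / float64(n)
			re += s * math.Cos(angle)
			im -= s * math.Sin(angle)
		}
		power[k] = re*re + im*im
	}
	return power
}

func main() {
	// A cosine with 2 cycles over 16 samples concentrates its energy in bin 2.
	const n = 16
	samples := make([]float64, n)
	for t := range samples {
		samples[t] = math.Cos(2 * math.Pi * 2 * float64(t) / n)
	}
	for k, p := range powerSpectrum(samples) {
		fmt.Printf("bin %d: %.3f\n", k, p)
	}
}
```

Only the non-negative frequency bins are returned because the spectrum of a real signal is symmetric.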

mel

The 'mel' package creates a set of mel filter banks and applies them to the power data to create a spectrogram.
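
The idea of a mel filter bank can be sketched as below. This is a standalone illustration under stated assumptions, not the mel package's API: the names hzToMel, melToHz, melFilterBank and applyFilters are hypothetical, the common 2595*log10(1+f/700) mel formula is assumed, and the filters are plain triangles without normalization.

```go
package main

import (
	"fmt"
	"math"
)

// hzToMel and melToHz use one common mel-scale formula (an assumption here).
func hzToMel(hz float64) float64  { return 2595 * math.Log10(1+hz/700) }
func melToHz(mel float64) float64 { return 700 * (math.Pow(10, mel/2595) - 1) }

// melFilterBank (hypothetical) builds nFilters triangular filters over nBins
// power-spectrum bins spanning 0..sampleRate/2. Filter edges are evenly
// spaced on the mel scale between loHz and hiHz (hiHz must exceed loHz).
func melFilterBank(nFilters, nBins int, sampleRate, loHz, hiHz float64) [][]float64 {
	loMel, hiMel := hzToMel(loHz), hzToMel(hiHz)
	edges := make([]float64, nFilters+2) // edge positions in fractional bins
	for i := range edges {
		mel := loMel + (hiMel-loMel)*float64(i)/float64(nFilters+1)
		edges[i] = melToHz(mel) / (sampleRate / 2) * float64(nBins-1)
	}
	bank := make([][]float64, nFilters)
	for f := range bank {
		left, center, right := edges[f], edges[f+1], edges[f+2]
		row := make([]float64, nBins)
		for b := range row {
			x := float64(b)
			switch {
			case x > left && x <= center: // rising slope
				row[b] = (x - left) / (center - left)
			case x > center && x < right: // falling slope
				row[b] = (right - x) / (right - center)
			}
		}
		bank[f] = row
	}
	return bank
}

// applyFilters returns the weighted energy under each filter for one frame.
func applyFilters(bank [][]float64, power []float64) []float64 {
	out := make([]float64, len(bank))
	for f, row := range bank {
		for b, w := range row {
			if b >= len(power) {
				break
			}
			out[f] += w * power[b]
		}
	}
	return out
}

func main() {
	bank := melFilterBank(4, 65, 16000, 0, 8000)
	flat := make([]float64, 65)
	for i := range flat {
		flat[i] = 1
	}
	// With a flat spectrum, higher filters collect more energy because
	// they are wider in Hz.
	fmt.Println(applyFilters(bank, flat))
}
```

Applying the bank to successive power-spectrum frames and stacking the results gives one spectrogram column per frame.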

agabor

The 'agabor' package produces an edge detector that detects oriented contrast transitions between light and dark; it can be convolved with the output of the mel processing.
There are two structs, FilterSet and Filter. You must create a FilterSet even if you are adding only one Gabor Filter.
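
To make the edge-detector idea concrete, here is a minimal standalone sketch of an oriented Gabor kernel applied to a small image. It does not use the FilterSet or Filter structs; gaborParams, gaborKernel and filterValid are hypothetical names, and the parameter set is an assumption chosen for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// gaborParams is a hypothetical parameter set for this sketch; it is not
// the agabor package's Filter struct.
type gaborParams struct {
	Size       int     // kernel is Size x Size
	Wavelength float64 // wavelength of the sinusoid, in pixels
	Sigma      float64 // standard deviation of the Gaussian envelope
	Orient     float64 // orientation, in radians
	Phase      float64 // phase offset; pi/2 gives an odd (edge) detector
}

// gaborKernel returns a Gaussian envelope multiplied by an oriented cosine.
func gaborKernel(p gaborParams) [][]float64 {
	k := make([][]float64, p.Size)
	c := float64(p.Size-1) / 2 // kernel center
	for row := range k {
		k[row] = make([]float64, p.Size)
		for col := range k[row] {
			x, y := float64(col)-c, float64(row)-c
			xr := x*math.Cos(p.Orient) + y*math.Sin(p.Orient)
			yr := -x*math.Sin(p.Orient) + y*math.Cos(p.Orient)
			env := math.Exp(-(xr*xr + yr*yr) / (2 * p.Sigma * p.Sigma))
			k[row][col] = env * math.Cos(2*math.Pi*xr/p.Wavelength+p.Phase)
		}
	}
	return k
}

// filterValid slides the kernel over img without padding. The kernel is not
// flipped, so strictly this is cross-correlation. img must be rectangular
// and at least as large as the kernel.
func filterValid(img, kernel [][]float64) [][]float64 {
	kh, kw := len(kernel), len(kernel[0])
	oh, ow := len(img)-kh+1, len(img[0])-kw+1
	out := make([][]float64, oh)
	for r := range out {
		out[r] = make([]float64, ow)
		for c := range out[r] {
			for i := 0; i < kh; i++ {
				for j := 0; j < kw; j++ {
					out[r][c] += kernel[i][j] * img[r+i][c+j]
				}
			}
		}
	}
	return out
}

func main() {
	k := gaborKernel(gaborParams{Size: 5, Wavelength: 4, Sigma: 1.5, Orient: 0, Phase: math.Pi / 2})
	// A 5x10 image that steps from dark (0) to light (1) at column 5.
	img := make([][]float64, 5)
	for r := range img {
		img[r] = make([]float64, 10)
		for c := 5; c < 10; c++ {
			img[r][c] = 1
		}
	}
	// The response is near zero over uniform regions and large at the edge.
	fmt.Printf("%.3f\n", filterValid(img, k)[0])
}
```

In the auditory setting the "image" would be a spectrogram, so an oriented filter responds to changes across time, across frequency, or along a diagonal sweep.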

sound

- sound.go contains code for loading a WAV file into a buffer and converting it to a floating-point tensor. It also has functions for trimming and padding.
- sndenv.go is a higher-level API that processes a sound in segments, calling the sound, mel and gabor code.
- playwav.go can be called to play a WAV file.
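
The conversion, trimming and padding steps mentioned for sound.go can be pictured with the standalone sketch below. It works on plain slices rather than tensors and does not use the sound package API. The names pcm16ToFloat, trimSilence and padTo are hypothetical, and treating "trimming" as silence removal and "padding" as zero-fill at the end is an assumption made for this example.

```go
package main

import "fmt"

// pcm16ToFloat (hypothetical) scales signed 16-bit PCM samples to [-1, 1).
func pcm16ToFloat(pcm []int16) []float32 {
	out := make([]float32, len(pcm))
	for i, s := range pcm {
		out[i] = float32(s) / 32768
	}
	return out
}

func abs32(v float32) float32 {
	if v < 0 {
		return -v
	}
	return v
}

// trimSilence (hypothetical) drops leading and trailing samples whose
// magnitude is below thresh. The result shares memory with x.
func trimSilence(x []float32, thresh float32) []float32 {
	start, end := 0, len(x)
	for start < end && abs32(x[start]) < thresh {
		start++
	}
	for end > start && abs32(x[end-1]) < thresh {
		end--
	}
	return x[start:end]
}

// padTo (hypothetical) appends zeros so the result has at least n samples.
func padTo(x []float32, n int) []float32 {
	if len(x) >= n {
		return x
	}
	out := make([]float32, n)
	copy(out, x)
	return out
}

func main() {
	samples := pcm16ToFloat([]int16{0, 0, 16384, -8192, 0})
	trimmed := trimSilence(samples, 0.01)
	fmt.Println(padTo(trimmed, 4))
}
```

Padding to a fixed length is useful when later stages process the sound in equal-sized segments.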

speech

The speech package has structs for Sequence and Unit.
Packages for specific sound sets (corpora) include code to load these sound files with timing information, plus lookup code.
- Package timit: phones of the TIMIT database. See "Speaker-Independent Phone Recognition Using Hidden Markov Models", Kai-Fu Lee and Hsiao-Wuen Hon, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 37, 1989.
- Package grafestes contains the consonant-vowel names and timing information for the sound sequences used in the research reported in "Listening Through Voices: Infant Statistical Word Segmentation Across Multiple Speakers", Katherine Graf Estes & Lew-Williams, 2015.
- Package synthcvs contains consonant-vowel names and timing information for synthesized speech generated with gnuspeech. These sounds are similar to the ones used by Saffran, Aslin & Newport, "Statistical Learning by 8-Month-Old Infants", 1996.

Constants

const (
	Version     = "v0.9.8"
	GitCommit   = "9eef250"          // the commit JUST BEFORE the release
	VersionDate = "2021-10-22 09:54" // UTC
)

Directories

Path	Synopsis
agabor
dft
examples
gaborview
play
processspeech
mel
sound
speech
grafestes
synthcvs
timit	Package timit Phones of the TIMIT database.
vowels