auditory

Version: v1.8.1 (latest) Published: Jun 26, 2022 License: BSD-3-Clause Imports: 0 Imported by: 0

Repository

github.com/emer/auditory

README ¶

auditory

Auditory is our repository for audition processing code in Go (golang), focused on filtering speech wav files via mel filters. A further step using Gabor filters prepares the filtered output as input to neural networks. The processing code is split into four packages (sound, mel, dft, and agabor) that can be used independently. Example code is in examples/processspeech.

Packages

dft

The 'dft' package performs a Fourier transform on the sound samples passed in and computes their power spectrum.
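
To make the idea of a power spectrum concrete, here is a minimal sketch using only the Go standard library. It is not the dft package's API: the function name `powerSpectrum` is hypothetical, and it uses a naive O(n^2) transform rather than whatever FFT the package relies on.

```go
package main

import (
	"fmt"
	"math"
	"math/cmplx"
)

// powerSpectrum is a hypothetical helper (not part of the dft package).
// It computes a naive discrete Fourier transform of the samples and
// returns the squared magnitude of bins 0..n/2.
func powerSpectrum(samples []float64) []float64 {
	n := len(samples)
	power := make([]float64, n/2+1)
	for k := range power {
		var sum complex128
		for t, s := range samples {
			angle := -2 * math.Pi * float64(k) * float64(t) / float64(n)
			sum += complex(s, 0) * cmplx.Rect(1, angle)
		}
		mag := cmplx.Abs(sum)
		power[k] = mag * mag
	}
	return power
}

func main() {
	// A pure tone with exactly 4 cycles in 32 samples, so its energy
	// concentrates in bin 4.
	samples := make([]float64, 32)
	for t := range samples {
		samples[t] = math.Sin(2 * math.Pi * 4 * float64(t) / 32)
	}
	for k, p := range powerSpectrum(samples) {
		fmt.Printf("bin %2d: %8.3f\n", k, p)
	}
}
```

Only bins 0 through n/2 are returned because the spectrum of a real-valued signal is symmetric, so the upper half carries no extra information.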

mel

The 'mel' package creates a set of mel filter banks and applies them to the power spectrum data to create a spectrogram.
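
The following sketch shows one common way to build and apply a triangular mel filter bank. It is an illustration under stated assumptions, not the mel package's implementation: the function names are hypothetical, the 2595*log10(1+f/700) mel formula is a widely used convention that the package may or may not follow, and the parameter values in `main` are arbitrary.

```go
package main

import (
	"fmt"
	"math"
)

// hzToMel and melToHz use the common 2595*log10(1+f/700) convention
// (an assumption; the mel package may use a different variant).
func hzToMel(hz float64) float64  { return 2595 * math.Log10(1+hz/700) }
func melToHz(mel float64) float64 { return 700 * (math.Pow(10, mel/2595) - 1) }

// melFilterBank (hypothetical) builds nFilters triangular filters spaced
// evenly on the mel scale between loHz and hiHz. Each filter holds one
// weight per power-spectrum bin, where nBins = fftSize/2 + 1.
func melFilterBank(nFilters, nBins int, sampleRate, loHz, hiHz float64) [][]float64 {
	// nFilters+2 edge frequencies: filter f rises from edges[f] to
	// edges[f+1] and falls back to zero at edges[f+2].
	loMel, hiMel := hzToMel(loHz), hzToMel(hiHz)
	edges := make([]float64, nFilters+2)
	for i := range edges {
		mel := loMel + (hiMel-loMel)*float64(i)/float64(nFilters+1)
		edges[i] = melToHz(mel)
	}
	binHz := sampleRate / float64(2*(nBins-1)) // width of one bin in Hz
	bank := make([][]float64, nFilters)
	for f := range bank {
		left, center, right := edges[f], edges[f+1], edges[f+2]
		bank[f] = make([]float64, nBins)
		for b := range bank[f] {
			hz := float64(b) * binHz
			switch {
			case hz > left && hz <= center:
				bank[f][b] = (hz - left) / (center - left)
			case hz > center && hz < right:
				bank[f][b] = (right - hz) / (right - center)
			}
		}
	}
	return bank
}

// applyFilters returns one energy value per filter: the weighted sum of
// the power spectrum under that filter's triangle.
func applyFilters(bank [][]float64, power []float64) []float64 {
	out := make([]float64, len(bank))
	for f, weights := range bank {
		for b, w := range weights {
			out[f] += w * power[b]
		}
	}
	return out
}

func main() {
	bank := melFilterBank(8, 129, 16000, 100, 8000)
	power := make([]float64, 129)
	for i := range power {
		power[i] = 1 // flat spectrum, just to show the filter widths
	}
	fmt.Println(applyFilters(bank, power))
}
```

Applying the bank to the power spectrum of each successive window of sound yields one column of the spectrogram per window.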

agabor

The 'agabor' package produces Gabor filters, edge detectors that respond to oriented contrast transitions between light and dark, which can be convolved with the output of the mel processing.
There are two structs, FilterSet and Filter. You must create a FilterSet even if you are only adding one Gabor Filter.
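
As a rough illustration of what an oriented edge detector looks like, the sketch below builds a single Gabor kernel and slides it over a small 2D input. It does not use the agabor FilterSet or Filter structs: `gaborKernel` and `convolveValid` are hypothetical helpers, and the kernel formula (a sine wave under an isotropic Gaussian window) is one standard form that may differ from the package's parameterization.

```go
package main

import (
	"fmt"
	"math"
)

// gaborKernel (hypothetical) returns a size x size kernel: a sine wave of
// the given wavelength, oriented at theta radians, windowed by an
// isotropic Gaussian with standard deviation sigma.
func gaborKernel(size int, wavelength, sigma, theta float64) [][]float64 {
	k := make([][]float64, size)
	half := float64(size-1) / 2
	for r := range k {
		k[r] = make([]float64, size)
		for c := range k[r] {
			x, y := float64(c)-half, float64(r)-half
			// Position along the direction of modulation.
			u := x*math.Cos(theta) + y*math.Sin(theta)
			gauss := math.Exp(-(x*x + y*y) / (2 * sigma * sigma))
			k[r][c] = gauss * math.Sin(2*math.Pi*u/wavelength)
		}
	}
	return k
}

// convolveValid slides the kernel over the input with stride 1 and no
// padding. Strictly this is cross-correlation (the kernel is not
// flipped). The input must be at least as large as the kernel.
func convolveValid(in, kernel [][]float64) [][]float64 {
	kh, kw := len(kernel), len(kernel[0])
	oh, ow := len(in)-kh+1, len(in[0])-kw+1
	out := make([][]float64, oh)
	for r := range out {
		out[r] = make([]float64, ow)
		for c := range out[r] {
			for i := 0; i < kh; i++ {
				for j := 0; j < kw; j++ {
					out[r][c] += in[r+i][c+j] * kernel[i][j]
				}
			}
		}
	}
	return out
}

func main() {
	// A 6x6 input with a vertical edge: dark (0) on the left, light (1)
	// on the right.
	in := make([][]float64, 6)
	for r := range in {
		in[r] = make([]float64, 6)
		for c := 3; c < 6; c++ {
			in[r][c] = 1
		}
	}
	kernel := gaborKernel(5, 5, 1.5, 0) // theta = 0 modulates along x
	for _, row := range convolveValid(in, kernel) {
		fmt.Println(row)
	}
}
```

Because the sine-phase kernel sums to approximately zero, it gives almost no response on uniform regions and responds only where the contrast changes along its orientation.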

sound

sound.go contains code for loading a wav file into a buffer and converting it to a floating-point tensor. It also has functions for trimming and padding.
sndenv.go is a higher-level API that processes a sound in segments, calling the sound, mel, and Gabor code.
playwav.go can be called to play a wav file.
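
The steps described above (convert to floating point, trim, pad, process in segments) can be sketched on plain slices as follows. This is only an illustration: wav file parsing is omitted, the function names are hypothetical rather than the sound package's API, and the example assumes 16-bit PCM samples.

```go
package main

import "fmt"

// pcm16ToFloat (hypothetical) scales signed 16-bit PCM samples into the
// range [-1, 1).
func pcm16ToFloat(pcm []int16) []float64 {
	out := make([]float64, len(pcm))
	for i, s := range pcm {
		out[i] = float64(s) / 32768
	}
	return out
}

// trim drops start samples from the front and end samples from the back.
// It returns nil if nothing would be left.
func trim(samples []float64, start, end int) []float64 {
	if start < 0 {
		start = 0
	}
	if end < 0 {
		end = 0
	}
	if start+end >= len(samples) {
		return nil
	}
	return samples[start : len(samples)-end]
}

// pad returns a copy with before and after zero-valued (silent) samples
// added at the two ends.
func pad(samples []float64, before, after int) []float64 {
	out := make([]float64, before+len(samples)+after)
	copy(out[before:], samples)
	return out
}

// frames splits the samples into windows of winLen samples, advancing by
// step samples each time. A trailing partial window is dropped.
func frames(samples []float64, winLen, step int) [][]float64 {
	if winLen <= 0 || step <= 0 {
		return nil
	}
	var out [][]float64
	for start := 0; start+winLen <= len(samples); start += step {
		out = append(out, samples[start:start+winLen])
	}
	return out
}

func main() {
	pcm := []int16{-32768, -16384, 0, 16384, 32767}
	samples := pad(trim(pcm16ToFloat(pcm), 1, 1), 1, 2)
	fmt.Println(samples)
	fmt.Println(frames(samples, 2, 2))
}
```

In a full pipeline each window returned by `frames` would then go through the Fourier transform, mel filtering, and Gabor filtering steps in turn.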

speech

The speech package has structs for Sequence and Unit.
Packages for specific sound sets (corpora) include code to load those sound files with timing information, plus lookup code:
- Package timit: phones of the TIMIT database. See "Speaker-Independent Phone Recognition Using Hidden Markov Models", Kai-Fu Lee and Hsiao-Wuen Hon, IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 37, 1989.
- Package grafestes contains the consonant-vowel names and timing information for the sound sequences used for the research reported in "Listening Through Voices: Infant Statistical Word Segmentation Across Multiple Speakers", Katherine Graf Estes & Lew-Williams, 2015.
- Package synthcvs contains consonant-vowel names and timing information for the synthesized speech generated with gnuspeech. These sounds are similar to the ones used by Saffran, Aslin & Newport, "Statistical Learning by 8-Month-Old Infants", 1996.
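
To illustrate what "timing information and lookup code" might look like, here is a minimal sketch. The actual fields of Sequence and Unit are not described in this README, so the lowercase types, their fields, the file name, and the `unitAt` method below are all hypothetical.

```go
package main

import "fmt"

// unit is a hypothetical labeled stretch of audio, such as one phone or
// one consonant-vowel syllable. Field names are assumptions.
type unit struct {
	Name    string
	StartMs float64
	EndMs   float64
}

// sequence is a hypothetical sound file together with its units in time
// order.
type sequence struct {
	File  string
	Units []unit
}

// unitAt returns the unit whose interval [StartMs, EndMs) covers the
// given time, if there is one.
func (s sequence) unitAt(ms float64) (unit, bool) {
	for _, u := range s.Units {
		if ms >= u.StartMs && ms < u.EndMs {
			return u, true
		}
	}
	return unit{}, false
}

func main() {
	seq := sequence{
		File: "example.wav", // placeholder name, not a real corpus file
		Units: []unit{
			{Name: "da", StartMs: 0, EndMs: 200},
			{Name: "ku", StartMs: 200, EndMs: 400},
		},
	}
	if u, ok := seq.unitAt(250); ok {
		fmt.Println("unit at 250 ms:", u.Name)
	}
}
```

A lookup like this lets a segment of processed sound be paired with the label of the unit it came from.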

Documentation ¶

Index ¶

Constants

Constants ¶

const (
	Version     = "v0.9.8"
	GitCommit   = "9eef250"          // the commit JUST BEFORE the release
	VersionDate = "2021-10-22 09:54" // UTC
)

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

This section is empty.

Directories ¶

Path	Synopsis
agabor
dft
examples
gaborview
play
processspeech
mel
sound
speech
grafestes
synthcvs
timit	Package timit Phones of the TIMIT database.
vowels
