voice-grabber

Version: v0.0.0-...-913d617
Published: Sep 19, 2018
License: GPL-3.0

This repo is a collection of scripts to download the dataset needed to train the model in jibjib-model.

Repo layout

The complete list of JibJib repos is:

  • jibjib: Our Android app. Records sounds and looks fantastic.
  • deploy: Instructions to deploy the JibJib stack.
  • jibjib-model: Code for training the machine learning model for bird classification.
  • jibjib-api: Main API to receive database requests & audio files.
  • jibjib-data: A MongoDB instance holding information about detectable birds.
  • jibjib-query: A thin Python Flask API that handles communication with the TensorFlow Serving instance.
  • gopeana: An API client for Europeana, written in Go.
  • voice-grabber: A collection of scripts to construct the dataset required for model training.

Scripts

The top level of this repo contains several helper scripts for creating and modifying JSON and CSV files, as well as converter.py, which converts audio files from MP3 to WAV.

data_grabber/

This Go script uses gopeana to populate a JSON file and a CSV file with information about the bird recordings from the Tierstimmenarchiv (an open dataset of the Museum für Naturkunde Berlin) that are published on Europeana.

file_grabber/

This Go script uses the output of data_grabber/ to follow the links provided on Europeana and download the audio files.

wiki_grabber/

This Python script takes input from a CSV file, uses the Wikipedia API to extract summaries about birds, and saves them in a separate CSV file.

xeno_grabber/

This is a collection of scripts to:

  • clean the files directory (in our case, to reduce the total number of classes, only birds with a German Wikipedia entry were kept)
  • politely crawl Xeno Canto for audio files of birds
  • download the audio files from Xeno Canto
