Week 1 (1/7/19)

Data Decisions

Muiredach O’Riain

week: 1
Starting: 1st July 2019 – 1/7/19

My main plan for this week covered three tasks:

  • Deciding upon and defining the relevant features to classify
  • Finding relevant impulse responses
  • Finding a series of clean sounds to be convolved as a basis for the training data

The problem of finding suitable impulse responses was solved thanks to the Aachen Impulse Response (AIR) Database, a huge and well documented collection of IR recordings made by RWTH Aachen University in Germany. The IRs are systematically labelled with a range of features that suit this project. They are stored as double-precision binary floating-point MAT-files, which can be imported directly into MATLAB; they can also be loaded in Python using scipy.io.loadmat().
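As a sketch of that loading step, here is a round trip through a synthetic MAT-file; the field name 'h_air' is an assumption about the AIR naming convention, so check the database documentation before relying on it:

```python
import numpy as np
from scipy.io import loadmat, savemat

# Build a stand-in IR and save it the way the AIR files are stored
# (the field name 'h_air' is assumed, not confirmed from the database).
ir = np.random.randn(48000)
savemat("demo_ir.mat", {"h_air": ir})

# loadmat returns a dict mapping field names to 2-D arrays.
mat = loadmat("demo_ir.mat")
loaded = mat["h_air"].flatten()
print(loaded.shape)   # (48000,)
```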

To get the dry audio samples I went to https://www.pacdv.com/sounds, a website that hosts a huge number of free-to-use, non-copyrighted sound files for video, film, audio and multimedia productions.
I used GNU Wget, a program that retrieves content from web servers, to download all the wav files automatically and store them on a hard disk drive.

Since I had already gathered a lot of data, I began running some tests in Python:

I used scipy.signal.convolve to convolve two signals; the convolution worked, but the result was distorted on playback.
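A minimal numpy sketch of that test (np.convolve performs the same full 1-D convolution as scipy.signal.convolve; the tone and the decaying "IR" below are made up for illustration):

```python
import numpy as np

dry = np.sin(2 * np.pi * 440 * np.arange(4410) / 44100)  # 0.1 s, 440 Hz tone
ir = np.exp(-np.arange(2205) / 300.0)                    # toy decaying "IR"

# Full convolution: output length is len(dry) + len(ir) - 1.
wet = np.convolve(dry, ir)
print(len(wet))   # 6614
```

One likely source of the playback distortion is that the convolved signal grows well beyond the [-1, 1] range expected of float audio.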

Using the scipy.io.loadmat and wave libraries for Python, I managed to extract an IR from the .mat files as a .wav; however, the audio was heavily distorted, so I still need to figure out the right export settings.

Goals for next week

Storage & Labelling

  • Labels based on the type of sounds, the rooms they are being simulated in, etc.
  • Creating an organised system for storing and accessing created data as well as the IRs and any other relevant data or coding projects.

Processing the samples

  • Either finding or beginning to design a suitable tool/plug-in that can convolve our clean samples with the IRs decided upon

Convolving the samples and generating our data

  • Using the tool to process all of the sound files to simulate them in the chosen spaces

Week 10 (9/9/19)

Blog Post 8

Optimizing the NN

This week I began testing different configurations of neural nets with TensorBoard to find out which gave the most promising results.

Previously, when I was training on the raw wav data, I found the best results with models that included one or more 1D convolutional layers. Interestingly, when training on the cepstral data I got the best results using 3 dense layers of 128 nodes. An upside is that this model also trains far more quickly, taking less than half the time of a similar model that used convolutional layers.

During my testing phase I also found some interesting behaviour when I changed the loss function of the model from 'sparse categorical crossentropy' to 'mean squared error'. A NN with 3 dense layers of 128 nodes went from reaching about 80% accuracy in around 100 epochs to classifying the validation set with 100% accuracy in just 13 epochs, but without any significant change in the loss. I would like to look deeper into why this NN performed this way, and to make sure that it's not caused by a bug in my code.


After settling on a model for my NN, I decided to export my dataset as CSV, so I wrote the necessary Python code and exported my training data of cepstra as two CSVs: one containing the independent features (cepstral data) and the other the dependent classes (IR indexes).
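That export step can be sketched with the standard csv module; the array sizes and file names here are stand-ins, not the real dataset:

```python
import csv
import numpy as np

# Stand-in dataset: 4 samples x 100 cepstral features, plus 4 class labels.
features = np.random.rand(4, 100)
labels = np.array([0, 1, 2, 3])

# One CSV of features and one of classes, one row per sample.
with open("cepstra.csv", "w", newline="") as f:
    csv.writer(f).writerows(features.tolist())

with open("classes.csv", "w", newline="") as f:
    csv.writer(f).writerows([[int(y)] for y in labels])
```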

I uploaded them to the landing zone of my cluster, but ran into a problem when I attempted to spray them to the cluster itself: every attempt to spray the files resulted in an 'SSH failed to connect' error. After a few emails it was determined that someone had been messing with the SSH configuration, which was promptly sorted out, allowing me to spray my files.

With the files sprayed, I began to write some ECL to determine the layout of my data sets, so that they could be used to train NNs on the cluster with the GNN bundle.

Currently I'm having some trouble defining the structure of the features, since each cepstrum has 100 features and each is stored in a separate cell.

Currently I'm getting around this by changing the separator from ',' to '\n', which denotes the end of a row in a CSV file; however, I assume there is a better way to do this.

Next week I plan to run a NN on the cluster using my data sets and the model from my TensorBoard research.

Week 9 (2/9/19)

Blog Post 7

Data Issues:

Space & Cepstrums

This week came with a couple of setbacks and some pretty cool breakthroughs.

At the end of last week I had a pretty robust model that could accurately classify whether a sound file had been recorded in one of two rooms.

After this I decided to up the ante by making a new model that would have not two but eight distinct rooms to classify between.
I got the model working on a few training samples, but problems arose when I tried to train the model with more data, as the bigger database was too big to be loaded into RAM.
I fixed this by decreasing the number of files in the training database.
After a few test runs, however, it turned out that the model was far less accurate than the previous binary classifier, and took a lot longer to train.
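An alternative to shrinking the dataset is to stream it from disk in batches with a generator; this is a hedged sketch with made-up file names and a fake loader standing in for the real wav-reading code:

```python
import numpy as np

def batch_generator(file_paths, batch_size, load_fn):
    """Yield (features, labels) batches so the whole dataset never
    has to sit in RAM at once. `load_fn` maps a path to (sample, label)."""
    for start in range(0, len(file_paths), batch_size):
        pairs = [load_fn(p) for p in file_paths[start:start + batch_size]]
        xs, ys = zip(*pairs)
        yield np.stack(xs), np.array(ys)

# Fake loader: "reading" a path just fabricates a 100-feature sample.
fake_load = lambda p: (np.zeros(100), hash(p) % 8)
paths = [f"wav_{i}.wav" for i in range(10)]
shapes = [x.shape for x, _ in batch_generator(paths, 4, fake_load)]
print(shapes)   # [(4, 100), (4, 100), (2, 100)]
```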

Cepstral Analysis:

After reading a few papers on music information retrieval techniques, I found that many projects use cepstral analysis to help correlate audio similarity.

In DSP, a cepstrum is 'the result of taking the inverse Fourier transform of the logarithm of the estimated spectrum of a signal', which sounds (and is) pretty complicated, but it can roughly be thought of as a way to look at the rate of change across different spectrum bands.
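That definition translates almost directly into numpy; this is a minimal sketch of the real cepstrum, with a small epsilon added as a guard against log(0):

```python
import numpy as np

def real_cepstrum(x, eps=1e-10):
    """Inverse FFT of the log magnitude spectrum of x."""
    spectrum = np.abs(np.fft.fft(x))
    return np.fft.ifft(np.log(spectrum + eps)).real

x = np.random.randn(1024)
c = real_cepstrum(x)
print(c.shape)   # (1024,)
```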

By adding a preprocessing step where I take the cepstrum of each sample in the training set and generate a new data set from this information, I was able to increase the accuracy of the model's classifications and decrease the size of the database dramatically, saving huge amounts of memory and training time.

A further step I might take is testing whether the auto-cepstrum (the cepstrum of the autocorrelation signal), which is sometimes used in the analysis of signal data with echoes, produces more accurate results.
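A hedged sketch of one way the auto-cepstrum could be computed (the helper name and exact formulation are my own; papers vary in the details):

```python
import numpy as np

def auto_cepstrum(x, eps=1e-10):
    """Cepstrum of the autocorrelation of x (one common formulation)."""
    autocorr = np.correlate(x, x, mode="full")   # length 2*len(x) - 1
    spectrum = np.abs(np.fft.fft(autocorr))
    return np.fft.ifft(np.log(spectrum + eps)).real

x = np.random.randn(256)
ac = auto_cepstrum(x)
print(ac.shape)   # (511,)
```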

Next week I plan to tweak the model using TensorBoard to analyse the success of slightly different models, and to begin working more with GNN to implement the model in ECL.

Week 8 (26/8/19)

Blog Post 6

More Data

This week was all about generating more data.

- In order to train a more complex neural net with multiple outputs, I needed a database of sound files with simulated recordings for multiple rooms. However, going back through my database of convolved wav files, I found that a lot of the files for certain IRs were corrupted and couldn't be used at all. On top of this, I also wanted more data points to train the NNs on, as I currently had only about 500 samples for each IR.

- I spent the better part of the last week finding and downloading more samples with wget to rebuild and update my database.

After convolving each of the clean samples, I currently have around an hour's worth of training data for each of the 8 selected IRs.

Data Management

- Currently, I am running some tests on my data using TensorFlow and Python in a Jupyter notebook; however, due to the amount of data I am analysing, the database is too big to be stored directly in RAM and is throwing an error. I'm going to try a few fixes next week and hope to have a multi-output NN that can be translated to ECL and the HPCC clusters.

Week 7 (19/8/19)

Blog Post 5


Dataset management

This week I began using the TensorFlow library with my data to generate some preliminary neural nets and test which models performed best.

After looking at a plotted graph of my wav file data set, I realised that the majority of useful information was contained in the first 4 seconds (or 4*44100 samples) of my wav files, so I wrote a quick bit of Python to truncate all files to 4 seconds.
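A sketch of that truncation step using only the standard library wave module (the helper and file names are made up; the demo writes a silent 6-second file and cuts it to 4):

```python
import wave

def truncate_wav(src, dst, seconds=4):
    """Copy only the first `seconds` of a wav file."""
    with wave.open(src, "rb") as r:
        params = r.getparams()
        frames = r.readframes(int(seconds * r.getframerate()))
    with wave.open(dst, "wb") as w:
        w.setparams(params)
        w.writeframes(frames)   # header frame count is fixed up on close

# Demo: a 6 s silent mono 16-bit file at 44100 Hz, truncated to 4 s.
with wave.open("full.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(44100)
    w.writeframes(b"\x00\x00" * 6 * 44100)

truncate_wav("full.wav", "short.wav")
with wave.open("short.wav", "rb") as r:
    print(r.getnframes() // r.getframerate())   # 4
```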

Then, using the IR that created each file as its class label, I made a data set containing the 583 wav files from each of the IR_2 and IR_3 sets to test my neural nets with.
Before adding the other 7 or so IRs into the mix I wanted to know whether or not a neural net could be trained to differentiate between two rooms.

The Power Of CNNs

[Image: 'This person does not exist', a face generated using a convolutional neural network]

Tensor Tests

After reading a few papers on the subject, I found that a sound file can be thought of as a one-dimensional temporal data set, as the data essentially represents the amplitude of a signal that evolves over time. Recently, projects such as WaveNet and this PyData project (https://www.youtube.com/watch?v=nMkqWxMjWzg) have found a lot of success working with this type of temporal data by utilising 1-dimensional convolutional neural networks.

I designed a simple 1D convolutional NN based on the 'Best Network' CNN developed by Nathan Janos and Jeff Roach in the previously linked video: 6 filters wide, 3 layers deep, with a window of 7350 samples (since they were using a window of a 24-hour period over 8 weeks of data, I scaled mine to the same percentage of the 4*44100 samples I was working with), and trained it on my dataset for about 100 epochs.
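The 1-D convolution at the heart of such a layer can be sketched as a plain numpy forward pass; the toy sizes below stand in for the real 6-filter, 7350-sample window, and this is an illustration of the operation, not the actual trained model:

```python
import numpy as np

def conv1d_forward(x, kernels, stride=1):
    """Naive forward pass of a 1-D convolutional layer with ReLU.
    x: (length,) signal; kernels: (n_filters, width)."""
    n_filters, width = kernels.shape
    n_out = (len(x) - width) // stride + 1
    out = np.empty((n_filters, n_out))
    for f in range(n_filters):
        for i in range(n_out):
            window = x[i * stride : i * stride + width]
            out[f, i] = np.dot(window, kernels[f])
    return np.maximum(out, 0)   # ReLU activation

x = np.random.randn(1000)            # stand-in for a 4*44100-sample wav
kernels = np.random.randn(6, 50)     # 6 filters, window of 50 samples
feature_maps = conv1d_forward(x, kernels, stride=10)
print(feature_maps.shape)   # (6, 96)
```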

After about 50 epochs the validation accuracy began to plateau at around 90%, and the classification accuracy was at about 85%.

I think I will be able to increase the accuracy of this NN by feeding it a larger training data set and tweaking the model. After testing this hypothesis, my plan is to alter the model further so that it can facilitate more than 2 outputs, allowing it to identify even more types of rooms.

Furthermore, I've noticed that the majority of projects involving sound and ANNs convert the raw audio files into Mel-frequency cepstra before analysing them. I plan to see whether this has any effect on the accuracy of the model compared to training it on the raw wav files.

Weeks 4/5/6

Due to a combination of things, I've recently had to put the project partially on hold. As such, I haven't had a chance to update the blog recently, so this post covers my progress over the last few weeks, from 24/7/19 to the end of this week, 9/8/19.

Permissions Granted / Setting Up

The most pressing matter, gaining admin rights for my work machine, was solved fairly early on; after a few calls to the help desk I was able to download and install the ECL IDE, HPCC Systems and the other software I required for this project. This included the GNN bundle and the ML_Core bundle for the ECL IDE, which provide the necessary tools for generalised neural nets and machine learning in ECL.

My next goal was to find a cluster to work on, and after an email or two I was sent the IP for the 'oxford cluster', where I would be able to spray my data and begin working with it in the ECL IDE. But first I had to figure out the best way to spray my wav files.


ECL works by accessing data from a cluster; however, first you have to actually put that data on the cluster. The most common way to do this is to upload your data to the landing zone of an ECL Watch page and, from there, use the built-in Spray function to move it to the cluster. When spraying data from ECL Watch there are many options for how it's done, depending on what's best for the type of data you want to spray. As I am currently using wav files, I went with a BLOB spray, which allows you to spray any kind of data as a BLOB (Binary Large OBject). Using this method I sprayed a training data set, 'IR8', of around 500 IR-convolved wavs to my cluster.

The problem with BLOB spraying is that BLOBs don't have a specific structure, so you have to define the structure yourself in ECL to create a meaningful database from your data. I made a quick record structure for the IR8 database with 3 columns, for the file name, the actual wav data and the virtual file position, like so:

Layout_IR8 := RECORD
    STRING filename;
    DATA wavData;
    UNSIGNED8 recordPos{VIRTUAL(fileposition)};
END;

IR8 := DATASET('~mbo::db::training::ir8', Layout_IR8, THOR, __COMPRESSED__);

At this point I ran a few tests to make sure it was all working, and began to get to grips with the documentation for the GNN bundle, which I plan to start using with my data from next week.

Week 3 (15/7/19)

Blog Post 3

Muiredach O’Riain

week: 3
Starting: 15th July 2019 – 15/7/19


Embedded Tools & User Permissions

An exciting development for this project came early in the week when I was put in touch with Roger Dev, the leader of the HPCC Systems Machine Learning Library. Roger and his team are currently developing a Generalized Neural Network (GNN) bundle for use with HPCC Systems, which will eventually be a distributed ECL interface to TensorFlow.

Over the coming weeks I plan to utilise an Alpha version of this tool set in order to help construct and develop Neural Nets with my data using ECL rather than with embedded python code as was originally my plan. This could be a huge boost for this project in terms of productivity as it would allow me more flexibility when coding with ECL rather than swapping back and forth between ECL and Python. Using the Alpha also offers a unique opportunity to be one of the first people outside of the Dev team to test the GNN bundle and help contribute to its development.

Permissions & Setbacks

This week has been a bit slower in terms of actual development compared to the previous two. Due to an issue with admin rights on my work machine, I have been unable to download the necessary software to begin working with ECL and HPCC Systems.

Despite this setback, I am still ahead of schedule, having surpassed the goals predicted in my proposal. Furthermore, the downtime I have had whilst working to resolve this issue has allowed me to brush up on the ECL language and surrounding documentation, which I feel will greatly improve my programming in the coming weeks.

This week constituted a break from programming and allowed me some time to reflect on the direction of the project and my approaches to the problems it presents.

Next week my goals are:

Get ECL-IDE & HPCC Systems Running on work machine

Figure out the best data format to use and spray my data to a cluster

Get to grips with ECL and begin some tests with ECL on my data

Begin to look at the GNN Alpha starting with the provided Documentation

Week 2 (8/7/19)

week: 2

Troubleshooting & Batch Processing

Starting: 8th July 2019 – 8/7/19

My goal for this week was to finish extracting the .wav files from the .mat files and to finish the batch convolving code.

Distorted files

Emailed the people at Aachen University for the .wav versions of the IRs, rather than the .mat files; however, they were 24-bit files, which wavfile.read() doesn't support (ref: https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.wavfile.read.html).

I realised that the audio distortion problem from last week was only happening when the data was being exported as a .wav. This was fixed by switching the export function I was using to scipy.io.wavfile.write().

As such, I managed to extract the IRs from the MAT files as 2-channel, 64-bit, 48000 Hz .wav files. This also meant I was able to export the convolved files.
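A hedged sketch of that export step: scipy's wavfile.write() infers the output format from the array's dtype, so float data must stay within [-1.0, 1.0] while int16 data uses the full integer range (the file names and test tone here are made up):

```python
import numpy as np
from scipy.io import wavfile

rate = 48000
signal = 0.5 * np.sin(2 * np.pi * 440 * np.arange(rate) / rate)  # 1 s tone

# Float output: values must lie in [-1.0, 1.0] or playback will distort.
wavfile.write("float_version.wav", rate, signal.astype(np.float32))

# Integer output: scale up to the int16 range before converting.
wavfile.write("int_version.wav", rate, (signal * 32767).astype(np.int16))

read_rate, data = wavfile.read("int_version.wav")
print(read_rate, data.dtype)   # 48000 int16
```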

Batch Convolving and Exporting

Wrote a program that could batch convolve a folder of wav files with an IR.

Realised some files didn't work with the convolution because they were stereo and the IRs were mono; to solve this I wrote a script to check the number of channels on each file and convert stereo files to mono.

Also realised that when convolving, some files could get too loud or too quiet, so I wrote a function to normalise all the output wav files between -0.8 and 0.8 so they don't clip and distort.
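The two fixes above can be sketched together in numpy; the helper names are my own, and the ±0.8 peak matches the range described above:

```python
import numpy as np

def to_mono(samples):
    """Fold an (n, channels) array down to mono by averaging channels."""
    return samples.mean(axis=1) if samples.ndim == 2 else samples

def normalize(samples, peak=0.8):
    """Scale so the loudest sample sits at +/- `peak`, avoiding clipping."""
    return samples * (peak / np.max(np.abs(samples)))

stereo = np.random.randn(44100, 2)   # stand-in stereo file
out = normalize(to_mono(stereo))
print(out.ndim, round(float(np.max(np.abs(out))), 3))   # 1 0.8
```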

Finished the batch convolver program and gave it a naming function that combines the IR name with the name of the dry wav file.

Ran it overnight and generated about 10GB of convolved and labelled files to begin training the first round of neural nets.

Plan for next week is to spray the data to a cluster and begin processing it with ECL.

What’s the big idea?

The goal of this project is to use HPCC Systems technology to help build a reliable classification model that is able to accurately classify an input sound file to a description of its location. The project intends to not only demonstrate a proof of concept for this technology but also to lay the groundwork for what could be the next step in forensic audio analysis and a new means of gathering information through sound.

Impulse Responses

In digital signal processing there are things called Impulse Responses (IRs) that are often used to capture and recreate the audio characteristics of a certain space or piece of equipment. Simply put the audio characteristics of any room can be described as an equation or a system that takes an input sound and then transforms it. An IR is essentially the result of putting a very short burst of sound, or an impulse, into this system and recording the response. We can then take this IR and use it to transform other sounds, making it seem as if they were recorded in the same room.

Using this principle it is possible to create a huge amount of audio data from a few impulse responses and some dry samples which we can then use to train a neural net as a classifier.
