Update No.4

It’s been a few weeks since my last post so it’s time for another update!

Following my test impulse responses, I recorded the first HRIRs in the speaker rig with the KU100 dummy head. This is essentially the same process as the impulse response measurements from earlier on, except with a stereo dummy head microphone, lined up using the laser guidance system.

Importing these HRIRs into the Ambix VSTs in Reaper enabled me to recreate the soundfield and pan anechoic sounds around in 3D. This has proved that the method is sound and we can now move on to the next stage.

I used Ambix’s binaural decoder, which requires a config file to determine elements such as the speaker positions, the Ambisonics order and the directory containing the IRs. There was an existing config file for the speaker array that I am using, which I was able to take and amend to use my HRIRs.
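To give a rough idea of its shape, here is a hypothetical sketch of the kind of information such a config carries. This is not verbatim ambix syntax (the real directive names differ, so consult the ambix documentation), just an illustration of the elements mentioned above:

% Hypothetical decoder config sketch -- field names are illustrative,
% not the actual ambix directives.
order     5                      % Ambisonics order of the decoder
hrir_dir  ./hrirs/ku100_rig/     % directory containing the measured IRs
speaker   1   az   0.0  el  0.0  % one line per loudspeaker position
speaker   2   az  30.0  el  0.0
% ... one entry for each of the 50 speakers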

The next stage:

Before I am able to record real, human HRTFs, I have one obstacle to overcome: microphones.

There is currently one remaining working pair of in-ear microphones on site, which means I can either go ahead with the listening tests with no backups, or assemble some fresh ones. I plan to take the latter option.

All of the necessary bits are here; they just need putting together, which will involve a lot of micro-soldering and a large dose of patience. The next post will likely be all about this job.

What else?

The listening test, which has been built in Max 7, is almost complete. I have decided against using the 3D mouse in my test and have instead opted for a custom-built controller that features two wheels to adjust azimuth and elevation angles, which should be a little more intuitive to use.

I will also need to create a ‘warm up’ patch which serves two purposes:

1) Help the user learn the controls and understand what they are supposed to do in the test.

2) Kill time, as a few minutes are required to process the recorded HRTFs before the test can begin.

 


Update No.3

I have taken my HRTF routine and used it to record my first test impulse responses. This is essentially the same method used for capturing HRTFs, except it was done with a single omnidirectional microphone placed in the speaker array (instead of a person).

I set the equipment up to play the full 50-channel sweeps; however, only two sets of five sweeps were needed (and so recorded) for this test: one set back-to-back and one overlapped. In both cases, five sweeps play out of separate speakers at different times; the only difference is that in the overlapped method the sweeps play at the same time (actually staggered by half a second).
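For anyone curious about the file generation, here is a minimal MATLAB sketch of how the overlapped sweeps can be constructed. The parameters (5 s exponential sweeps, 20 Hz to 20 kHz, 48 kHz sample rate, 0.5 s stagger) match what I describe above; the file name and variable names are illustrative, not my exact code.

% Build an overlapped multichannel sweep file (illustrative sketch).
fs = 48000;          % sample rate
T  = 5;              % sweep length in seconds
f1 = 20; f2 = 20000; % sweep start/end frequencies
nSpk = 5;            % number of speakers in this test
stagger = 0.5;       % offset between sweep starts, in seconds

% Exponential (logarithmic) sine sweep
t = (0:T*fs-1)'/fs;
L = T/log(f2/f1);
sweep = sin(2*pi*f1*L*(exp(t/L) - 1));

% Place each sweep in its own channel, offset by the stagger time,
% so channel k's sweep starts (k-1)*0.5 s after channel 1's
totalLen = round((nSpk-1)*stagger*fs) + length(sweep);
out = zeros(totalLen, nSpk);
for k = 1:nSpk
    n0 = round((k-1)*stagger*fs);
    out(n0+1 : n0+length(sweep), k) = sweep;
end

audiowrite('overlapped_sweeps.wav', out, fs); % one channel per speaker

Reaper then routes each channel of the resulting .wav to its own loudspeaker.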

After processing the sweeps in MATLAB, I was left with two sets of five impulse responses, two of which I have pictured below. The two IRs are taken from different recordings of the same speaker.

(Figure: the same speaker’s impulse response, taken from the back-to-back recording and from the overlapped recording, shown side by side)
If you looked and thought “that’s the same image in both boxes” then clearly this was a success. Running them through MATLAB’s ‘corrcoef’ function, they came out 99.78% correlated, which could be improved further by either driving the speakers harder (improving the signal-to-noise ratio) or lengthening the sweeps from five seconds to ten.
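For reference, the comparison itself is only a few lines of MATLAB. The file names below are illustrative (each file containing one mono IR of the same length):

% Correlate two impulse responses of the same speaker, one from the
% back-to-back recording and one from the overlapped recording.
ir_separate = audioread('ir_backtoback_spk1.wav'); % illustrative names
ir_overlap  = audioread('ir_overlapped_spk1.wav');

R = corrcoef(ir_separate, ir_overlap);   % 2x2 correlation matrix
fprintf('IRs are %.2f%% correlated\n', 100*R(1,2));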

So, moving on from here, my next job is to take an actual set of HRTFs, which is basically repeating today’s work, with more stuff.

 

(The main equipment used was: Reaper, a DAW that excels at multi-input/output routing, used to route each channel of a 50-channel .wav file to an independent speaker; an Earthworks M30 measurement microphone, which features an exceptionally flat frequency response; a Sound Devices MixPre-D preamp; and, of course, the 50-speaker Genelec array.)

Listening Tests

As mentioned previously, I will be conducting listening tests with the public in an attempt to answer the question of whether individualised HRTFs improve localisation over generic ones in a binaural environment.

The tests are a continuation of work undertaken at the University of York comparing localisation performance in 1st, 3rd and 5th order Ambisonics, available here, and so will involve similar test conditions. The participant is played pink noise bursts over a virtual speaker array (via headphones) and uses a 3DConnexion ‘Space Mouse’ to pan a moveable ‘active’ source until it matches a stationary ‘reference’ source. When the user is satisfied that the two are in the same position, the position is logged and the process is repeated for a new reference source.

(3DConnexion Space Mouse, which enables the user to pan left/right as well as up/down, giving complete freedom to move a point around the surface of a sphere)

The virtual speaker arrays will be randomised so participants won’t know whether they are using their own HRTFs or generic ones, such as those from the KEMAR database. Furthermore, we will be randomising between 1st, 3rd and 5th order Ambisonics to test the comparison of HRTFs more rigorously.

The test will be conducted in the same space in which the participants’ HRTFs were captured, in this case the speaker array at the University. By taking these measurements in a non-anechoic environment, we are not only capturing the head’s transfer function but also the room’s, as sound is reflected off the various, albeit absorbent, surfaces. If done correctly, the user, whilst positioned in the rig and wearing headphones (with their own HRTFs loaded), should be unable to tell whether sounds are played out of the real loudspeakers or their headphones.

(Speaker array)

Additionally, the test will be performed under head-tracking conditions, meaning that the virtual speaker array will have to pan in real-time as the user turns their head.

After the tests are concluded, the results should give a clear indication of which type of HRTF should be used for a particular Ambisonics order.

 

Update No.2

Following my last update, I have continued my work on the HRTF recording process and am now ready to begin my first test run, which will most likely involve using the KEMAR dummy in place of a human participant. The benefit of using KEMAR as a test subject is that it doesn’t move, and so it provides a control against which to compare later test results.

With KEMAR positioned at the exact centre of the speaker array, a series of 50 sine sweeps will be played, one from each speaker, with start times staggered at 0.5 second intervals. The resulting recording will be two channels of data (one per ear), which will be processed in MATLAB to yield 100 impulse responses (HRIRs): one per speaker, per ear. These HRIRs (the time-domain representation of HRTFs) will be used to create a virtual speaker array reproduced over headphones. Provided that this is a success, the same method will be used to capture HRTFs for all of the participants.
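As a sketch of what that MATLAB processing step involves (not my exact code): each ear’s recording can be deconvolved against the known sweep by division in the frequency domain, which collapses every sweep in the recording into an impulse response at the corresponding time offset. The file names and regularisation value below are illustrative.

% FFT deconvolution of a two-channel (binaural) sweep recording.
[rec, fs] = audioread('kemar_recording.wav'); % N x 2, one column per ear
sweep     = audioread('test_sweep.wav');      % the sweep sent to the speakers

N     = size(rec,1) + length(sweep);  % FFT length long enough for both
SWEEP = fft(sweep, N);
reg   = 1e-8;                         % small value to avoid division by zero

irs = zeros(N, 2);
for ch = 1:2
    REC = fft(rec(:,ch), N);
    irs(:,ch) = real(ifft(REC .* conj(SWEEP) ./ (abs(SWEEP).^2 + reg)));
end
% irs now contains a string of impulse responses, one per speaker,
% each starting at that speaker's sweep offset (multiples of 0.5 s).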

(KEMAR dummy head positioned in the rig using lasers)

Once this test is completed, and if the results are as expected, I will be able to begin planning the listening tests, which will be explained in more detail in their own post.

Understanding Ambisonics

Ambisonics is a surround sound technique, developed in the 1970s, that enables full 3D soundfield reproduction, extending Blumlein’s Mid-Side technique.

In this technique, four channels are used to convey all the information needed to reproduce the full soundfield over any number of loudspeakers. This differs from conventional stereo or surround sound, which work on a one-channel-per-speaker basis. The four-channel signal, also known as B-format, encodes directionality and can be passed through a decoder, specific to your speaker setup, to recreate the desired soundfield.

In its simplest form, first order Ambisonics (FOA) includes three figure-eight microphone patterns relating to left-right, up-down and forward-back pairs, plus an omnidirectional component. These four components can be summed to make virtual microphones with any polar pattern, aimed in any direction, meaning the microphone pattern can essentially be chosen in post-production.
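As a quick illustration, here is a MATLAB sketch of those two ideas: encoding a mono signal into B-format at a chosen direction, and then deriving a virtual microphone from the same four channels. It uses the traditional (FuMa) first-order convention; the signal and directions are illustrative, not anything from my project.

% First-order B-format encode (traditional/FuMa convention).
az = pi/4; el = 0;            % illustrative source direction (radians)
s  = 0.1 * randn(48000, 1);   % illustrative mono source signal

W = s * (1/sqrt(2));          % omnidirectional component (-3 dB in FuMa)
X = s * cos(az) * cos(el);    % forward-back figure-eight
Y = s * sin(az) * cos(el);    % left-right figure-eight
Z = s * sin(el);              % up-down figure-eight

% Virtual microphone aimed at (vaz, vel): p = 1 gives an omni,
% p = 0.5 a cardioid, p = 0 a figure-eight -- chosen in post-production.
vaz = 0; vel = 0; p = 0.5;
vmic = p*sqrt(2)*W + (1-p) * ...
       (cos(vaz)*cos(vel)*X + sin(vaz)*cos(vel)*Y + sin(vel)*Z);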

The spatial resolution in FOA is quite low; however, we can describe the soundfield in finer detail by adding further spherical-harmonic components to the signal, which leads to what we call higher order Ambisonics (HOA).

HOA enables more spatial information to be conveyed, but requires more speakers to accurately represent the signal. Below, you can see the spherical harmonics from 0th to 5th order (each row represents one order, 0 to 5). An Nth order signal has (N+1)² components, so 5th order Ambisonics splits the signal into 36 components, and therefore at least 36 loudspeakers are required to accurately reproduce it.

(Spherical harmonics from 0th to 5th order)
(image: https://commons.wikimedia.org/wiki/File:Spherical_Harmonics_deg5.png)

 

One of the positives of this method is that the source file can be played back over any loudspeaker array: only a new decoder is needed to up- (or down-) scale the audio, without losing any information. Furthermore, the technique transfers to the binaural setting by replacing the loudspeakers with HRTFs applied over headphones, meaning a virtual speaker array is created.

One of the main downsides to Ambisonics for loudspeaker playback is the need for sophisticated speaker arrays to take full advantage of the technique. Like other surround sound techniques, there is also a ‘sweet spot’, inside which the audio is reproduced correctly but outside which it is not. Increasing the order increases the size of this sweet spot, but at the expense of requiring additional speakers. This drawback is again negated by using binaural-based Ambisonics, as the listener is always within the sweet spot.

 

 

Localisation and the importance of HRTFs for virtual 3D audio

Up until this point I have mentioned HRTFs but have not given a thorough explanation of what they are and why we use them.

When we hear sound, whether it be the songs of birds in the trees overhead or the voice of somebody standing next to us, we are naturally able to determine the location of the source (at least to some degree). There are three main processes which enable this:

  1. Interaural Time Difference (ITD)
  2. Interaural Level Difference (ILD)
  3. Pinna Cues

As sound reaches our ears, a small delay occurs between them that depends on the source location (ITD). The maximum delay, with the source at 90 degrees to the listener, is around 0.4 ms (derived from an approximate head width of 15 cm and a speed of sound of 340 m/s). This is of course a minute delay, but thanks to the precision of the auditory system we are able to make use of it to help locate sounds.

(Diagram: visual representation of ITD)
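The arithmetic behind that figure is simple enough to check. Here is a sketch using the straight-path approximation from the paragraph above (real ITD models, such as Woodworth’s, also account for the path around the head):

% Straight-path ITD approximation: extra distance = d * sin(az)
d  = 0.15;         % approximate head width in metres
c  = 340;          % speed of sound in m/s
az = deg2rad(90);  % source at 90 degrees to the listener

itd = d * sin(az) / c;                    % time difference in seconds
fprintf('Max ITD ~ %.2f ms\n', itd*1e3);  % prints ~0.44 ms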

In addition to this, sounds from various locations can reach each ear with different amplitudes (ILD). Take, for example, somebody whispering directly into your ear.

Finally, and very importantly, we have pinna cues. Our ears are as unique to us as our fingerprints, and as sound hits these big satellite dishes on the sides of our heads, it is reflected in towards our eardrums in a way that subtly filters the sound. Because our ears are shaped the way they are, and are not perfectly round funnels, the sound is filtered differently as it arrives from different angles, which we subconsciously link to localisation.

By combining these three processes, we are able to precisely locate a sound, even if we have no visual cues or context. (The sound of a helicopter usually comes from above us, so when we hear helicopters, we expect them to be above us.)

In the virtual environment, we may want to provide spatial information: for example, in a first-person shooter, being able to hear precisely where an enemy is from the sound of their footsteps would be an advantage, and binaural audio enables us to do this better than conventional stereo. Achieving that level of realism requires simulating the three cues, which is where HRTFs come in.

A set of HRTFs is a bank of filters which can be combined with a sound to give the illusion of it occurring from any particular direction, outside of the user’s head. They convey the same information that we process in the real world, which means that, if done right, they can potentially enable the user to localise sounds as well as in the real world.
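In practice, “combining” means convolution: the HRIR pair for the desired direction is convolved with a mono sound, and the two results become the left and right headphone channels. A minimal MATLAB sketch, with illustrative file names:

% Binaural rendering of a mono sound through one HRIR pair.
[x, fs] = audioread('mono_source.wav');    % illustrative mono file
hrir    = audioread('hrir_az30_el0.wav');  % N x 2: left and right ear IRs

left  = conv(x, hrir(:,1));
right = conv(x, hrir(:,2));

out = [left right] / max(abs([left; right]));  % normalise to avoid clipping
audiowrite('binaural_out.wav', out, fs);
% Over headphones, the source should appear at roughly 30 degrees azimuth.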

Most of the HRTFs in use are generic, meaning they are not specific to the user’s ears. They still work because the ITD and ILD are well approximated, as are the pinna cues, to an extent. By using individualised HRTFs, however, a closer approximation to the real ITD, ILD and pinna cues is possible.

Next time, we will look into Ambisonics and how HRTFs are put to use.

 

Update No. 1

In preparation for public listening tests, a number of tasks have to be completed.

My immediate goal is to shorten the HRTF capturing process. HRTFs are recorded by sitting a participant in the centre of a multi-speaker array (in my case 50 speakers), with a small microphone placed inside each ear, and playing a sine sweep from each individual speaker.

As mentioned in the previous post, the capturing process involves a participant sitting perfectly still while these sounds are played at them. My plan to reduce this time is to use an overlap method, in which the 50 sweeps are played over each other rather than one at a time. At this point, I have simulated a small-scale version of the test, in which two overlapping sweeps yielded the same result as two separate sweeps. Furthermore, I have produced the sound files that will be played over the listening rig to test this.

The next two steps are writing the code to process the recorded sweeps and then splitting the audio into the 100 separate impulse responses (the data that makes up the HRTF bank: 50 per channel).
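Because the sweep start times are known, the splitting step is mostly bookkeeping. A sketch of what that code might look like, assuming the deconvolved two-channel response has one IR starting every 0.5 s (the file name, window length and variable names are illustrative):

% Slice a deconvolved 2-channel response into 50 IRs per ear.
irs     = audioread('deconvolved_responses.wav'); % hypothetical long 2-ch file
fs      = 48000;
nSpk    = 50;
stagger = 0.5;              % seconds between consecutive sweep starts
irLen   = round(0.4 * fs);  % illustrative IR window, shorter than the stagger

hrirs = zeros(irLen, 2, nSpk);   % [samples x ears x speakers]
for k = 1:nSpk
    n0 = round((k-1) * stagger * fs);
    hrirs(:,:,k) = irs(n0+1 : n0+irLen, :);
end
% hrirs now holds the 100 impulse responses (50 speakers x 2 ears).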

 

Introduction to My Practical Project

Term two is over and so is the taught syllabus of my Post-Graduate Degree. Now we begin our final projects. I have chosen:

Localisation in binaural based Ambisonics using individualised HRTFs.

I understand this title includes some technical terms so let me break it down…

Localisation: The human ability to determine the direction of a sound source.
Binaural Based Ambisonics (BBAs): Creating a virtual 3D speaker array over headphones that allows sounds to be placed all around the listener (rather than conventional stereo techniques that offer left/right panning).
Individualised HRTFs (Head Related Transfer Functions): A set of data relating to the shape of the head and ears of the individual listener. (Can be thought of as a filter our own heads apply to sound entering the ear.)

A simple example of binaural audio involves recording a sound with two microphones placed in the ears of a person or a dummy head. When played back over headphones, the listener will (ideally) hear the audio as though they were in the original environment.

The step up from that is to capture a set of HRTFs, which allows us to take a sound and apply a particular filter to it so that, when played back through headphones, it sounds as though it is coming from a particular direction. A full set of HRTFs allows a sound to be placed anywhere on a virtual sphere around the listener.

The field of Ambisonics has been around since the 1970s but, due to the recent rise of virtual-reality gaming and video, is now more relevant than ever. Currently, BBAs are used on several platforms, from 360 videos on Facebook and YouTube to VR gaming on headsets such as the Oculus Rift, which means there is demand to improve the technology in areas such as localisation and availability.

With this project, I aim to determine whether using our own HRTFs (over generic ones) improves our ability to localise sounds in a BBA environment. My instinct tells me yes; however, capturing a set of HRTFs is currently difficult, as it involves a human participant sitting very still for a period of time, meaning the resulting HRTFs can be of poor quality compared to generic ones. Therefore, one part of my research will involve creating a method to shorten the HRTF capturing process.

My results will be determined through a series of listening tests in which participants will try to accurately determine the location of a sound played in a BBA environment, using their own HRTFs and generic HRTFs.

 

TL;DR
In a virtual 3D environment, does using a personalised set of filters, based on the shape of our own head, improve our ability to determine where a sound is coming from?

A Few Words Regarding PPP

Wait a minute, what is PPP? PPP stands for Personal Professional Practitioner and is one of our modules here at the University of York. The aim of this module is to prepare us for life after university, where we might be entrepreneurs or otherwise self-employed and will need to market ourselves.

How is this achieved? We have attended lectures and had discussions about how to negotiate and how to come up with ideas, and we had a guest lecturer who enlightened us with tales of being in an independent band. The module culminated with the schools’ event that we put on. We would be tested in multiple ways… We had a date set; the rest was up to us. Between the 13 of us, we came up with our ideas and got to work preparing them, splitting into four teams. Between us we had a studio tour, build-a-microphone, learn-to-code and, of course, build-a-loudspeaker.

I can say that everybody was challenged by this task, though definitely in a good way. Speaking just on behalf of Team Electric Falcon, we were not 100% sure that our demonstration would work until the last minute. Thankfully it ran according to plan and, in the end, we all had a good time. Events like this are important for children, as they can be the difference between a child realising they want to follow a particular career path and losing all interest in education altogether.

But enough about the schools’ event.

This module reminds me of one from my undergraduate degree at Wolverhampton Uni. Instead of putting on a demonstration, we were given a more open choice. I was in a group of seven and we hosted a live lounge session in the University’s Black Box Theatre. Again, we were left to arrange everything, which meant booking the venue, booking four acts, sourcing and setting up the recording equipment, promoting the event and everything else. Being in a group of seven made this process more manageable. In the end, the event was a success and we all gained some valuable experience from it.

These kinds of experiences do contribute to personal professional practice because in the real world you do have to organise things and work to a schedule. In school (and university), a lot of the time you turn up and do the work. It’s the same at many jobs: turn up and get on with the job. But if you are to be self-employed, you are required to be self-motivated as well, and deadlines are sometimes the best motivator. If I don’t hand an assignment in on time then I affect only myself; however, if I sell tickets to an event and there are no artists, or the sound system doesn’t work, then I will ruin my reputation for the future. Therefore, it is important to prepare yourself for these upcoming challenges.

What have I learned from PPP? Well, for starters, I have gained a little experience in public speaking, in the sense of talking to a class. I have had to use my mind to problem-solve in the context of getting the speakers to work, and I have thought about negotiation and how to get the most out of a meeting.

Do I think there could be improvements to the module? Yes, of course. PPP is a small part of the course, so it’s not expected to be hugely in-depth. That said, for a module that is preparing us for self-employment in the enormous and growing industry of electronic engineering/audio technology, I feel a guest lecture from a real entrepreneur would have been useful. It doesn’t have to be Mark Zuckerberg.


Here’s the class. (I’m the fourth one in)

Schools Event

Today we hosted the schools event. Around 30 pupils from local secondary schools came to visit and see the demonstrations we had prepared for them, and Team Electric Falcon’s “Build Your Own Speaker” was a huge success. The children seemed to really engage with the project and it was nice to see them expressing their creative sides when it came to tinkering with the speakers.

The group was split into two, so we ran the demonstration twice. As planned, we further split each group into sub-groups of three or four and gave each of them a kit containing a magnet, a coiled wire, some extra bits and a set of instructions. Tom introduced us, I gave a brief monologue on how a speaker basically works, and we set them to the task.

At first the kids didn’t seem to fully understand what they were supposed to do, but after Tom, George and I helped them get started, they became quite focused and enthusiastic. We had two laptops between us, so we had to go around and test the groups’ speakers as they became ready, although this was no problem.

One of the things we hadn’t expected was how quickly some of them would assemble the components. We were testing speakers within five minutes of them getting started. This meant a lot of time was spent talking to the pupils and asking how they might be able to improve on them, to which we had some great responses. One group ended up with such a great speaker that it was louder than our demo!

Upon reflection, I think there are a couple of areas we could improve upon if we were to repeat the demonstration. Firstly, I think it would have been helpful if the pupils could have seen us assemble a speaker, so they could better visualise what to do. Even though it is simple when you know how, it is not immediately obvious what every part does, and this led to some confusion. It would probably be best achieved with a short video, as we would not want to waste too much time at the start. Secondly, many of the pupils did not realise that we wanted them to experiment with speaker design. This could have been addressed in the introduction by making clear that we had plenty of sheets of card and plastic cups for them to cut up and play with. Finally, we did not have a particular conclusion to our demonstration, so it would have been beneficial to take three or four minutes at the end to gather everyone around and take questions and feedback.

With that said, there are also positive things to take from the experience. Children can be a tough audience, and I was slightly nervous when they walked in and I had to deliver a speech explaining how a speaker works while trying to avoid terms like “electromagnetic induction.” Public speaking is an aspect I want to develop, and doing events like this is a great way of improving. Additionally, I was pleased to see how enthusiastic some of the pupils were. When I asked them how they might improve on the design, I got some great answers, which showed that they were using the creative parts of their brains. The part that says, “maybe if I take off the cone, because it’s trapping the sound under it.” While that isn’t actually the case, it was great to see them thinking about it.

In conclusion, I believe the pupils took something from our demonstration, and I hope that we inspired some of them to consider going down the Music Technology/Electronic Engineering path.