Up until this point I have mentioned HRTFs but have not given a thorough explanation of what they are or why we use them.
When we hear a sound, whether it is birdsong in the trees overhead or the voice of somebody standing next to us, we are naturally able to determine the location of the source (at least to some degree). There are three main cues which enable this:
- Interaural Time Difference (ITD)
- Interaural Level Difference (ILD)
- Pinna Cues
As sound reaches our ears, a small delay occurs between them, depending on the source location (the ITD). The maximum delay, for a source at 90 degrees to the listener, is around 0.44ms (derived from an approximate head width of 15cm and a speed of sound of 340m/s). This is of course a minute delay, but thanks to the precision of the auditory system we are able to make use of it to help locate sounds. (The diagram below shows a visual representation of ITD.)
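The maximum ITD figure above falls straight out of the approximations in the text. A quick sketch of the arithmetic (the 15cm head width and 340m/s speed of sound are the rough values used above, not precise anatomical or physical constants):

```python
# Rough maximum ITD: a sound arriving from 90 degrees travels an extra
# path of roughly one head width before reaching the far ear.
HEAD_WIDTH_M = 0.15      # approximate ear-to-ear distance (assumption from the text)
SPEED_OF_SOUND = 340.0   # m/s, approximate value at room temperature

max_itd_s = HEAD_WIDTH_M / SPEED_OF_SOUND
print(f"Maximum ITD: {max_itd_s * 1000:.2f} ms")  # → Maximum ITD: 0.44 ms
```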
In addition to this, sounds from different locations reach each ear with different amplitudes (the ILD). Take, for example, somebody whispering directly into your ear: the near ear receives the sound far more loudly than the far ear, which is shadowed by the head.
Finally, and very importantly, we have pinna cues. Our ears are as unique to us as our fingerprints, and as sound hits these big satellite dishes on the sides of our heads, it is reflected in towards our eardrums in a way that subtly filters the sound. Because our ears are shaped the way they are, and not perfectly round funnels, the sound is filtered differently as it arrives from different angles, and we subconsciously link that filtering to location.
By combining these three cues, we are able to locate a sound quite precisely, even if we have no visual cues or context. (The sound of a helicopter usually comes from above us, so when we hear a helicopter, we expect it to be above us.)
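To make the first two cues concrete, here is a toy sketch that "spatialises" a mono signal using only a delay (ITD) and a gain difference (ILD). This is my own illustrative code, not real HRTF processing: it ignores pinna cues entirely, and the pan law it uses for the ILD is arbitrary.

```python
import numpy as np

def naive_binaural(mono, sample_rate, azimuth_deg,
                   head_width_m=0.15, speed_of_sound=340.0):
    """Crudely spatialise a mono signal using only ITD and ILD.

    A toy illustration of the two interaural cues, not a substitute
    for HRTF filtering (no pinna cues, arbitrary ILD pan law).
    """
    az = np.radians(azimuth_deg)  # 0 = straight ahead, +90 = hard right

    # ITD: delay the far ear by up to head_width / speed_of_sound.
    itd_samples = int(round(abs(np.sin(az)) * head_width_m
                            / speed_of_sound * sample_rate))

    # ILD: attenuate the far ear (simple, made-up pan law).
    near_gain, far_gain = 1.0, 1.0 - 0.5 * abs(np.sin(az))

    delayed = np.concatenate([np.zeros(itd_samples), mono])
    padded = np.concatenate([mono, np.zeros(itd_samples)])

    if azimuth_deg >= 0:      # source on the right: left ear is the far ear
        left, right = far_gain * delayed, near_gain * padded
    else:                     # source on the left: right ear is the far ear
        left, right = near_gain * padded, far_gain * delayed
    return left, right

# Example: one second of a 1 kHz tone placed 90 degrees to the right.
sr = 44100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
left, right = naive_binaural(tone, sr, 90)
```

Played over headphones, even this crude version pulls the tone towards the right ear, which is a useful reminder of how much of our localisation ability rests on these two simple differences.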
In a virtual environment, we may want to provide this spatial information: in a first-person shooter, for example, being able to hear precisely where an enemy is from the sound of their footsteps is an advantage, and binaural audio enables us to do this better than conventional stereo. Achieving that level of realism requires simulating the three cues, which is where HRTFs come in.
A set of HRTFs is a bank of filters which can be combined with a sound to give the illusion of it coming from a particular direction, outside the listener's head. They convey the same information that we process in the real world, which means that, done right, they can potentially enable the listener to localise sounds as well as they would in the real world.
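In practice, "combining" an HRTF with a sound usually means convolving the mono source with each ear's head-related impulse response (HRIR, the time-domain form of the HRTF). The sketch below uses placeholder HRIRs I have made up for illustration; a real system would load measured responses for the desired direction from a database (CIPIC and SADIE are two well-known ones).

```python
import numpy as np

def apply_hrir(mono, hrir_left, hrir_right):
    """Render a mono source binaurally by per-ear convolution."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return left, right

# Placeholder HRIRs (not measured data): a louder, earlier tap for the
# near (right) ear and a quieter, delayed tap for the far (left) ear,
# crudely mimicking the ILD and ITD a measured pair would encode.
hrir_right = np.zeros(64)
hrir_right[0] = 1.0
hrir_left = np.zeros(64)
hrir_left[20] = 0.4

source = np.random.default_rng(0).standard_normal(1024)
left, right = apply_hrir(source, hrir_left, hrir_right)
```

Real HRIRs are not single taps, of course: they encode the full direction-dependent filtering of the pinnae, head and torso, which is exactly the information the three cues above describe.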
Most of the HRTFs in use are generic, meaning they are not specific to the listener's ears. They still work because the ITD and ILD are well approximated, as are the pinna cues to an extent. Individualised HRTFs, however, give a closer approximation to the listener's real ITD, ILD and pinna cues.
Next time, we will look into Ambisonics and how HRTFs are put to use.