
A soundscape is a set of sound sources, each radiating an acoustic signal that propagates through the environment. After numerous reflections off the walls and alterations caused by the many objects present in the environment, some of the sound rays reach both of our ears. It is from this sound “mixture” that we recognize each of these sound sources, and also build a mental representation of the whole scene, using hearing alone.
To do this, humans rely on numerous elements between the ear and the central nervous system, and each element plays a particular role in processing the acoustic signal to answer questions such as: What sources are present? Where are they located? Are they moving?
We can describe our auditory system as three elements: the ear, the auditory pathways, and the auditory cortex.

The ear's role is to capture acoustic information, translate it into an electrical signal that our nervous system can analyze, and begin a first phase of processing that is relatively simple but essential for extracting information: time-frequency analysis. All the information transmitted along the auditory nerve is organized from the lowest frequency to the highest, and separated in time, which makes further processing by the nervous system easier. This is the “tonotopy” performed by the cochlea, located in the inner ear.
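To make the idea of time-frequency analysis concrete, here is a minimal sketch in Python. It is only an analogy: it uses a short-time Fourier transform, which is not how the cochlea actually works, and the frame length, sample rate and test tones are assumptions chosen for illustration. The signal is cut into short frames, and each frame is described by its energy from the lowest frequency to the highest.

```python
import cmath
import math


def hann(n_samples):
    """Hann window, used to soften the edges of each frame."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n_samples)
            for i in range(n_samples)]


def spectrogram(signal, frame_len=256, hop=256):
    """Return one magnitude spectrum per frame (low to high frequency)."""
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        magnitudes = []
        for k in range(n_bins):
            # Plain discrete Fourier transform of the windowed frame.
            acc = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                      for i in range(frame_len))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


def dominant_frequencies(signal, sample_rate, frame_len=256):
    """Frequency (Hz) of the strongest bin in each successive frame."""
    spec = spectrogram(signal, frame_len, frame_len)
    return [max(range(len(mags)), key=mags.__getitem__) * sample_rate / frame_len
            for mags in spec]


# Illustrative input: a low tone followed by a higher tone.
SAMPLE_RATE = 8000
low = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(2000)]
high = [math.sin(2 * math.pi * 1760 * t / SAMPLE_RATE) for t in range(2000)]
peaks = dominant_frequencies(low + high, SAMPLE_RATE)
print(peaks[0], peaks[-1])  # first frame is near 440 Hz, last near 1760 Hz
```

Each frame plays the role of a moment in time, and each bin the role of a place along the frequency axis, which is the same kind of ordering described above.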
This information is then enriched and modified as it travels along the auditory pathways toward the central nervous system. It is during this journey that the information coming from our two ears is combined, which allows sound sources to be localized. Surprisingly, messages can also travel in the opposite direction to “adapt” how our hearing works, for example by directing our attention to a particular source such as a voice, or by attenuating the influence of the environment, such as the reverberation of a room.
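One cue that can be obtained by combining the two ears is the difference in arrival time of a sound at each ear. The sketch below illustrates this with a simple cross-correlation; it is a simplified model, not a description of the neural mechanism. The ear spacing of 0.18 m, the speed of sound of 343 m/s, the free-field geometry and the sign convention (positive angles toward the left ear) are all assumptions made for the example.

```python
import math
import random


def estimate_lag(left, right, max_lag):
    """Return the lag (in samples) by which `right` trails `left`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(left)):
            j = i + lag
            if 0 <= j < len(right):
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def lag_to_azimuth(lag, sample_rate, ear_distance=0.18, speed_of_sound=343.0):
    """Convert an interaural lag to an angle in degrees (simplified model)."""
    itd = lag / sample_rate  # interaural time difference in seconds
    ratio = max(-1.0, min(1.0, itd * speed_of_sound / ear_distance))
    return math.degrees(math.asin(ratio))


# Illustrative input: noise that reaches the right ear 10 samples late.
SAMPLE_RATE = 44100
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(2000)]
left_ear = source
right_ear = [0.0] * 10 + source[:-10]

lag = estimate_lag(left_ear, right_ear, max_lag=30)
azimuth = lag_to_azimuth(lag, SAMPLE_RATE)
print(lag, round(azimuth, 1))
```

A real system would also have to cope with reverberation, several simultaneous sources and level differences between the ears, which this sketch ignores.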
Finally, the auditory cortex organizes this flow of information so that it can be represented coherently; this is what we call auditory scene analysis. Some simultaneous elements are used to separate sound sources, while others are merged into auditory objects. These objects are then compared with patterns in our memory to help identify the sound sources, and are then linked in sequence to extract meaning from the sound environment (a sentence, a melody, etc.). It is important to note that along the auditory pathways and in the auditory cortex there are links to other sensory modalities (vision in particular), which help extract information from the auditory stream. Finally, head movements, first to localize a sound source and then to turn toward it, also provide a great deal of information.

Automated sound event recognition methods, which are our specialty at Wavely, essentially reproduce these three steps through a chain of algorithms: pre-processing, extraction of representative features, and then identification. However, the plasticity of the human brain, the ability to shift one's attention in real time, and the bank of “patterns” acquired over the years are key to how humans hear... and they are just as many challenges to meet in order to reproduce this understanding automatically!
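As a rough illustration of the pre-processing, feature extraction and identification chain, here is a toy sketch in Python. It is not Wavely's method: the function names, the two features (RMS energy and zero-crossing rate), the nearest-template rule and the "hum" and "whistle" labels are all hypothetical choices made to keep the example short.

```python
import math


def preprocess(signal):
    """Step 1: remove the constant offset and normalize the peak level."""
    mean = sum(signal) / len(signal)
    centered = [x - mean for x in signal]
    peak = max(abs(x) for x in centered) or 1.0
    return [x / peak for x in centered]


def extract_features(signal):
    """Step 2: summarize the signal with two simple numbers."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    zero_crossing_rate = crossings / (len(signal) - 1)
    return (rms, zero_crossing_rate)


def identify(features, templates):
    """Step 3: pick the stored pattern closest to the observed features."""
    return min(templates, key=lambda name: math.dist(features, templates[name]))


def tone(freq, sample_rate=8000, n_samples=800, offset=0.0):
    """Hypothetical test signal: a pure tone with an optional offset."""
    return [offset + math.sin(2 * math.pi * freq * t / sample_rate)
            for t in range(n_samples)]


# A tiny "memory" of patterns, built from two reference sounds.
templates = {
    "hum": extract_features(preprocess(tone(100))),
    "whistle": extract_features(preprocess(tone(2000))),
}

unknown = tone(1800, offset=0.3)
label = identify(extract_features(preprocess(unknown)), templates)
print(label)
```

The bank of templates here holds two entries; the human equivalent is built over years and adapts continuously, which is precisely where the difficulty lies.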
📧 contact@wavely.fr or by phone 📞 03.62.26.39.08