Sound Spatialization

Michele Geronazzo
Department of Architecture, Design, and Media Technology, Aalborg University, København, Denmark

© Springer International Publishing AG, part of Springer Nature 2018
N. Lee (ed.), Encyclopedia of Computer Graphics and Games, https://doi.org/10.1007/978-3-319-08234-9_250-1

Synonyms

Auralization; Binaural headphone reproduction; Loudspeaker reproduction; Room acoustics; Room response equalization; Sound spatialization; Spatial room acoustics; Spatial room impulse response; Virtual acoustics

Definitions

The general term sound spatialization refers to a reproduction system and the algorithms that, combined together, provide a real-time and interactive rendering, or auralization, for an immersive sonic experience in virtual reality scenarios. Four main aspects can be considered: (i) physics-based simulation of sound sources; (ii) their propagation in space; (iii) binaural rendering to simulate user acoustics in the listening space, or sound-field encoding/decoding for multiple-loudspeaker reproduction; and finally (iv) the technological mediation for listening through headphones/loudspeakers and for interaction through motion capture. Finding the trade-off between accuracy and perceptual plausibility is the key aspect when considering the ever-evolving hardware solutions for such complex simulations.

Introduction

A high-fidelity but efficient sound simulation is an essential element of immersive virtual reality (VR). Continuous advances in hardware and software technologies foster interaction between virtual sounds and humans, rendering experiences with increasing levels of realism and presence. From the literature, one can refer to the general term auralization, which covers the three main components of sound spatialization (Savioja et al. 1999): source modeling (see entry "▶ Virtual Reality: Sonic Interactions"), room acoustics modeling, and receiver modeling (see entry "▶ Virtual Reality: User Acoustics"). A schematic view of those aspects can be seen in Fig. 1.

It has to be noted that many algorithms for virtual acoustics are computer vision and graphics rendering techniques (see entry "▶ Computer Vision") adapted to take sound generation and propagation into account. Some examples are texture mapping/synthesis (Takala and Hahn 1992), beam tracing (Funkhouser et al. 1998), and rigid-body simulations (Van Den Doel et al. 2001), allowing the synthesis of realistic sonic interactions of objects in motion with a geometric relationship in space (see entry "▶ Virtual Reality: Sonic Interactions").
Sound Spatialization, Fig. 1 High-level acoustic components for VR auralization, with focus on spatial room acoustics and headphone reproduction: room acoustics (spatial room impulse response, SRIR) and the listener's body (head-related impulse response, HRIR), which together form the binaural room impulse response (BRIR), plus the headphones (headphone impulse response, HpIR)
On the other hand, a perceptually plausible and efficient auralization is to be preferred to an authentic rendering, due to the limitations in memory and computational power associated with low-latency constraints. This trade-off is complex and challenging because, especially in VR, real-time constraints involve a multimodal system, thus requiring resources shared with graphics, sensors, application logic, and high-level functionality (e.g., artificial intelligence).

Reproduction

Immersive sound spatialization can be achieved through a loudspeaker setup or a headphone setup, with technical differences in the encoding phase and radical differences in perception for listeners. Entry "▶ Virtual Reality: Headphones" discusses headphone sound playback, the hardware solution readily available and easily integrated with head-mounted displays (HMDs) at the consumer level. In the following, a brief overview of both approaches is provided.

Headphone Reproduction
One of the main advantages of headphone-based reproduction is the complete control of sound synthesis and of the binaural cues arriving at each ear (left and right channels); a high level of isolation from external and noisy environmental sounds, e.g., echoes and reverberation, can be obtained through ear occlusion and noise-canceling technologies.

However, headphones may be experienced as intrusive by the user, at the expense of the naturalness and externalization of the listening experience, causing so-called in-head localization, where stimuli are heard inside the head, thus interfering with the natural outside-the-head localization of the surrounding space (Shinn-Cunningham and Shilling 2002). Headphone-induced spectral coloration can be reduced by carefully following product design criteria and ad hoc equalization algorithms with the aim of minimizing artifacts in the auralization (see entry "▶ Virtual Reality: Headphone Acoustics" for further details).
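A common ingredient of such equalization algorithms is regularized inversion of a measured headphone impulse response (HpIR). The sketch below is illustrative only: the HpIR is synthetic, and the regularization constant is an assumed value, not a recommendation from any specific design procedure.

```python
import numpy as np

def regularized_inverse(hpir, n_fft=1024, beta=0.005):
    """Frequency-domain regularized inversion of a headphone impulse
    response (HpIR): conj(H) / (|H|^2 + beta). The constant beta keeps
    the filter from over-boosting deep spectral notches."""
    H = np.fft.rfft(hpir, n_fft)
    H_inv = np.conj(H) / (np.abs(H) ** 2 + beta)
    h_inv = np.fft.irfft(H_inv, n_fft)
    # Shift by half the filter length to obtain a causal, realizable filter
    return np.roll(h_inv, n_fft // 2)

# Synthetic, illustrative HpIR: direct sound plus one weak delayed echo
hpir = np.zeros(256)
hpir[0], hpir[40] = 1.0, 0.35

eq = regularized_inverse(hpir)
# Cascading HpIR and eq yields an approximately flat magnitude response
equalized = np.convolve(hpir, eq)
```

Without the regularization term, frequencies where |H| approaches zero would receive arbitrarily large gains, producing audible artifacts instead of removing them.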
Auralization with headphones requires head-related transfer function (HRTF) encoding and interpolation (see entry "▶ Virtual Reality: User Acoustics" for further details) in a suitable functional basis, like the spherical harmonic (SH) domain. An HRTF spatial grid of 4–5° spacing in both azimuth and elevation, with decreasing density above the head, leads to a perceptually optimal representation for HRTF filters, which can be convolved using partitioned-block algorithms, providing a compromise between computational efficiency and latency (Välimäki et al. 2016).

Loudspeaker Reproduction
In a loudspeaker reproduction, the rendered sound field is completely extra-aural, thus naturally incorporating user acoustics in the so-called sweet spot, i.e., the optimal listening area, which varies in size among techniques. Two main approaches can be considered (see Spors et al. (2013) for a detailed review on this topic):

1. Panning with level differences and/or time delays
2. Sound-field synthesis

The most common example of the first approach is two-channel stereophony, where the so-called panning law determines level/time differences between two loudspeakers in order to render virtual sources. The works by Pulkki formalize vector-based amplitude panning in a multiple-loudspeaker setup (Pulkki et al. 2018).

Sound-field synthesis employs a large number of loudspeakers to synthesize a virtual sound field within a listening area. These multiple channels are usually arranged at ear level as linear, circular, or spherical arrays, defining the distribution of monopole sources used to compute the virtual secondary sources and the pressure field on an open-area surface with a desired spatial sampling, i.e., aliasing frequency. Examples of this approach are wave-field synthesis, higher-order ambisonics, and their derivatives. It is relevant to note that human perception plays a critical role also in the design of the loudspeaker configuration and reproduction algorithms in order to guarantee a satisfactory perceived audio quality.

A critical aspect of loudspeaker reproduction lies in the acoustical properties of the playback environment, which require room response equalization (RRE) algorithms able to remove echoes and spectral distortions in the virtual sound field. Knowledge of the room impulse response (RIR) at single or multiple points inside the listening area is necessary to determine the most appropriate inverse filtering design able to compensate reflections and resonances of the environment (see Bharitkar and Kyriakakis (2006) for a review of this topic). Adaptive equalization is thus performed by optimizing signal processing approaches able to take into account the time-varying nature of the system, e.g., least-squares solutions, frequency warping, and multiple-input/multiple-output (MIMO) inverse theorem methods, to name but a few.

Auralization

The common and simple approach to provide an approximation of an acoustical space is by using a static RIR, which can be convolved with an original dry signal; unfortunately, this method lacks flexibility and is inappropriate for VR. Rendering sound propagation for VR requires the spatialization of the directional RIR into the spatial room impulse response (SRIR).

Interactive auralization forces algorithms to cover most of the psychoacoustic effects in localization and timbre distortion due to dynamic changes of the active listening experience, thus defining memory and computational requirements (see Fig. 2 for a schematic block diagram). Given an encoded representation of the sound field, interactive VR latency requires the computation of the SRIR in a convenient way: for instance, Schissler et al. (2016) performed the convolution in the spherical harmonics domain for HRTFs, sharing aspects with technologies for multichannel surround systems such as higher-order ambisonics, wave-field synthesis, and directional audio coding.
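Both the static-RIR rendering just described and the partitioned-block HRTF filtering mentioned under headphone reproduction reduce to block-wise convolution with overlap-add of the partial results. A minimal sketch, with synthetic placeholder impulse responses and np.convolve standing in for the per-block FFT filtering used in practice:

```python
import numpy as np

def partitioned_convolve(x, h, block=256):
    """Partition the long impulse response h into blocks of `block`
    samples, convolve the input with each block separately, and add
    the partial results at their respective delays (overlap-add).
    The result equals the full linear convolution x * h."""
    y = np.zeros(len(x) + len(h) - 1)
    for start in range(0, len(h), block):
        part = h[start:start + block]
        seg = np.convolve(x, part)       # in practice done with FFTs
        y[start:start + len(seg)] += seg
    return y

def binaural_render(dry, ir_left, ir_right):
    """Static binaural auralization: convolve a dry signal with a
    left/right impulse-response pair (e.g., a BRIR)."""
    return (partitioned_convolve(dry, ir_left),
            partitioned_convolve(dry, ir_right))

# Synthetic, illustrative BRIR pair: exponentially decaying noise tails
rng = np.random.default_rng(0)
decay = np.exp(-np.arange(1024) / 200.0)
brir_l = decay * rng.standard_normal(1024)
brir_r = decay * rng.standard_normal(1024)
dry = rng.standard_normal(2048)
out_l, out_r = binaural_render(dry, brir_l, brir_r)
```

Partitioning lets the first blocks of the impulse response produce output after only `block` input samples, which is what keeps latency low in real-time auralization.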
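The panning law for two-channel stereophony mentioned above can be sketched with the widely used constant-power (sine/cosine) law; the function name and the pan-range convention here are illustrative choices, not a fixed standard:

```python
import numpy as np

def constant_power_pan(signal, pan):
    """Constant-power stereo panning law.
    pan in [-1, 1]: -1 = hard left, 0 = center, +1 = hard right.
    Left/right gains follow cos/sin of the pan angle so that
    gL**2 + gR**2 = 1 at every position (constant radiated power)."""
    theta = (pan + 1.0) * np.pi / 4.0   # map [-1, 1] to [0, pi/2]
    g_left, g_right = np.cos(theta), np.sin(theta)
    return g_left * signal, g_right * signal

# A centered source is attenuated by about 3 dB per channel
mono = np.ones(8)
left, right = constant_power_pan(mono, 0.0)
```

Because the squared gains always sum to one, perceived loudness stays roughly constant as the virtual source moves across the stereo image, which is the point of a panning law.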
Sound Spatialization, Fig. 2 Block diagram of a typical system for sound spatialization and auralization. Both headphone and loudspeaker reproduction are displayed in the same diagram: the VR scene description (3D geometrical models, materials, air absorption) feeds sound propagation with complexity reduction (clustering, perceptual metrics, diffraction components); source, listener, and environment characterizations drive convolution/delay lines with time-varying filters, HRTF interpolation and equalization of HRTF filters, and artificial reverberation, followed by headphone equalization or room equalization for loudspeakers
Sound propagation simulates the acoustics of the space, either a closed space such as a room or an open space surrounding the listeners (a complete survey for interactive virtual environments is provided in Välimäki et al. 2016). One of the main challenges is the accurate modeling of sound propagation, which is computationally intensive due to the multitude of reverberation paths from virtual sound sources to the listener's ears/listening area.

Perceptually motivated algorithms provide control of computational resources, i.e., CPU and GPU processing, for SRIR computation, allowing a flexible scaling of aural complexity. For this reason, algorithms for dynamic spatialization of room acoustics allow immersion and externalization in multiple-sound-source scenarios with configurable geometric complexity and implemented acoustic effects (i.e., order of reflections, diffraction, and scattering). Diffracted occlusion and sound propagation of early reflections should be coherently rendered in order to implement a perceptually plausible auralization.

Sound propagation modeling can be classified into three main approaches:

• Geometric methods, involving a high-frequency approximation of sound propagating in the form of rays
• Wave-based methods, solving the underlying physical equations
• Hybrid methods, a mixture of the two approaches

Geometric Methods
Geometric techniques precompute spatial subdivision and beam-tree data structures in order to provide real-time acoustic auralization of static sound sources. Modern algorithms support moving sources by relying on efficient ray tracing, which is shared with computer graphics (CG) (Cao et al. 2016); finding intersections between rays and geometric primitives in a 3D space allows the optimization of supporting data structures in computational cost and memory for a perceptually coherent auralization.

Wave-Based Methods
Wave-based methods are computationally expensive, requiring efficient solutions for real-time VR rendering. Savioja (2010) proposed a wave-based simulation combining finite difference methods and computation on the GPU.
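The finite-difference idea can be sketched in one dimension; real room solvers work on large 3-D grids with boundary and dispersion handling, but the leapfrog update has the same structure. Grid size, step count, and the traveling-pulse excitation below are illustrative choices:

```python
import numpy as np

def fdtd_1d(n_cells=200, n_steps=50, courant=1.0):
    """Minimal 1-D finite-difference time-domain (FDTD) scheme for the
    wave equation p_tt = c**2 * p_xx, the building block of wave-based
    room acoustic solvers. courant = c*dt/dx must be <= 1 for stability;
    at exactly 1 the 1-D scheme propagates pulses without dispersion."""
    p_prev = np.zeros(n_cells)   # pressure at time step n-1
    p = np.zeros(n_cells)        # pressure at time step n
    mid = n_cells // 2
    p[mid] = 1.0                 # rightward-traveling unit pulse:
    p_prev[mid - 1] = 1.0        # the same pulse one cell earlier in time
    c2 = courant ** 2
    for _ in range(n_steps):
        lap = np.zeros(n_cells)
        lap[1:-1] = p[2:] - 2.0 * p[1:-1] + p[:-2]  # discrete Laplacian
        p_next = 2.0 * p - p_prev + c2 * lap        # leapfrog update
        p_prev, p = p, p_next
    return p

p = fdtd_1d()
# With courant = 1, the unit pulse moves exactly one cell per step
```

Three-dimensional solvers apply the same update over millions of grid cells per time step, which is why GPU implementations such as the one cited above are attractive.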
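Returning briefly to the geometric family described above, the classic image-source construction mirrors the source across each wall to enumerate specular reflection paths. This sketch covers only first-order reflections in an idealized shoebox room; the function names and room dimensions are illustrative:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, at roughly room temperature

def first_order_image_sources(src, room):
    """First-order image-source method for a shoebox room of size
    [Lx, Ly, Lz] with one corner at the origin: mirroring the source
    across each of the six walls yields one virtual (image) source per
    wall; geometric methods build such structures to trace reflections."""
    images = []
    for axis in range(3):
        lo = src.copy()
        lo[axis] = -src[axis]                   # wall at coordinate 0
        hi = src.copy()
        hi[axis] = 2.0 * room[axis] - src[axis] # wall at coordinate L
        images += [lo, hi]
    return np.array(images)

def reflection_delays(src, receiver, room):
    """Propagation delays (seconds) of the direct path followed by the
    six first-order specular reflections."""
    paths = np.vstack([src, first_order_image_sources(src, room)])
    dists = np.linalg.norm(paths - receiver, axis=1)
    return dists / SPEED_OF_SOUND

room = np.array([5.0, 4.0, 3.0])   # illustrative shoebox room (meters)
src = np.array([1.0, 2.0, 1.5])
rcv = np.array([4.0, 2.0, 1.5])
delays = reflection_delays(src, rcv, room)  # direct path comes first
```

Higher-order reflections are obtained by mirroring the images recursively; beam tracing and ray tracing generalize the same idea to arbitrary geometry.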
Mehra et al. (2015) described an interactive sound propagation system based on the equivalent source method (ESM) for realistic outdoor sounds. This method extends previous research where pressure fields are precomputed from an elementary spherical harmonic (SH) description of the wave-based sound propagation in the frequency domain; the high-dimensional acoustic field is compressed and stored in memory to support dynamic changes of source/listener positions and source directivity.

Hybrid Methods
Many hybrid solutions combining wave-based and geometric techniques exist. A valuable example is the work by Yeh et al. (2013), which proposed precomputing the pressure field in near-object regions with numerical wave-based techniques and modeling sound propagation in the far field with geometric propagation techniques.

Conclusions

Culling of inaudible reflections, limiting, clustering, and projecting reflections in the egocentric listener frame are required in order to increase the perceptual plausibility of the sound spatialization while keeping memory and computational cost under control (Hacihabiboglu et al. 2017).

The encoding of a virtual sound field requires the parametrization of the perceptual fields in descriptors such as propagation delays, loudness, and decay times, ready to be quantized and compressed; the final aim of auralization techniques should be to find practical solutions for large, complex interactive scenes with millions of polygons (Raghuvanshi and Snyder 2014).

Finally, it has to be noted that the auralization of virtual outdoor environments is a challenging issue. Efficient algorithms rely on a digital waveguide web connecting scattering junctions at nodes that represent discrete reflection points in the environment (Stevens et al. 2017). These algorithms extend the scattering delay network approach, which was developed with particular attention to computer games (Hacihabiboglu et al. 2017).

Cross-References

▶ Audio: Overview of Virtual Ambisonic Systems
▶ Virtual Reality: Immersive Auralization Using Headphones
▶ Virtual Reality: Sonic Interaction in Virtual Environments
▶ Virtual Reality: Spatial Perception in Virtual Environments
▶ Virtual Reality: Spatial Skill Training with Virtual Reality/Augmented Reality
▶ Virtual Reality: User Acoustics with Head-Related Transfer Functions
▶ Virtual Reality: Presence and Immersion

References

Bharitkar, S., Kyriakakis, C.: Immersive Audio Signal Processing. Springer, New York (2006)

Cao, C., Ren, Z., Schissler, C., Manocha, D., Zhou, K.: Interactive sound propagation with bidirectional path tracing. ACM Trans. Graph. 35(6), 1–11 (2016). https://doi.org/10.1145/2980179.2982431

Funkhouser, T., Carlbom, I., Elko, G., Pingali, G., Sondhi, M., West, J.: A beam tracing approach to acoustic modeling for interactive virtual environments. In: Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, pp. 21–32. ACM, New York (1998)

Hacihabiboglu, H., De Sena, E., Cvetkovic, Z., Johnston, J., Smith III, J.O.: Perceptual spatial audio recording, simulation, and rendering: an overview of spatial-audio techniques based on psychoacoustics. IEEE Signal Process. Mag. 34(3), 36–54 (2017). https://doi.org/10.1109/MSP.2017.2666081

Mehra, R., Rungta, A., Golas, A., Lin, M., Manocha, D.: WAVE: interactive wave-based sound propagation for virtual environments. IEEE Trans. Vis. Comput. Graph. 21(4), 434–442 (2015)

Pulkki, V., Delikaris-Manias, S., Politis, A. (eds.): Parametric Time-Frequency Domain Spatial Audio. Wiley, Hoboken (2018)

Raghuvanshi, N., Snyder, J.: Parametric wave field coding for precomputed sound propagation. ACM Trans. Graph. 33(4), 1–11 (2014). https://doi.org/10.1145/2601097.2601184

Savioja, L.: Real-time 3D finite-difference time-domain simulation of low- and mid-frequency room acoustics. In: 13th International Conference on Digital Audio Effects, vol. 1, p. 75 (2010)

Savioja, L., Huopaniemi, J., Lokki, T., Väänänen, R.: Creating interactive virtual acoustic environments. J. Audio Eng. Soc. 47(9), 675–705 (1999)

Schissler, C., Nicholls, A., Mehra, R.: Efficient HRTF-based spatial audio for area and volumetric sources. IEEE Trans. Vis. Comput. Graph. 22(4), 1356–1366 (2016)

Shinn-Cunningham, B., Shilling, R.: Virtual Auditory Displays, pp. 65–92. Lawrence Erlbaum Associates Publishers, Mahwah (2002)

Spors, S., Wierstorf, H., Raake, A., Melchior, F., Frank, M., Zotter, F.: Spatial sound with loudspeakers and its perception: a review of the current state. Proc. IEEE 101(9), 1920–1938 (2013). https://doi.org/10.1109/JPROC.2013.2264784

Stevens, F., Murphy, D.T., Savioja, L., Välimäki, V.: Modeling sparsely reflecting outdoor acoustic scenes using the waveguide web. IEEE/ACM Trans. Audio Speech Lang. Process. 25, 1566 (2017)

Takala, T., Hahn, J.: Sound rendering. In: ACM SIGGRAPH Computer Graphics, vol. 26, pp. 211–220. ACM, New York (1992)

Välimäki, V., Parker, J., Savioja, L., Smith, J.O., Abel, J.: More than 50 years of artificial reverberation. In: Audio Engineering Society 60th International Conference: DREAMS (Dereverberation and Reverberation of Audio, Music, and Speech). Audio Engineering Society (2016)

Van Den Doel, K., Kry, P.G., Pai, D.K.: FoleyAutomatic: physically-based sound effects for interactive simulation and animation. In: Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, pp. 537–544. ACM, New York (2001)

Yeh, H., Mehra, R., Ren, Z., Antani, L., Manocha, D., Lin, M.: Wave-ray coupling for interactive sound propagation in large complex scenes. ACM Trans. Graph. 32(6), 165 (2013)