
Hindawi Publishing Corporation

EURASIP Journal on Advances in Signal Processing


Volume 2007, Article ID 70540, 19 pages
doi:10.1155/2007/70540

Research Article
Virtual Reality System with Integrated Sound Field
Simulation and Reproduction

Tobias Lentz,1 Dirk Schröder,1 Michael Vorländer,1 and Ingo Assenmacher2


1 Institute of Technical Acoustics, RWTH Aachen University, Neustrasse 50, 52066 Aachen, Germany
2 Virtual Reality Group, RWTH Aachen University, Seffenter Weg 23, 52074 Aachen, Germany

Received 1 May 2006; Revised 2 January 2007; Accepted 3 January 2007

Recommended by Tapio Lokki

A real-time audio rendering system is introduced which combines a full room-specific simulation, dynamic crosstalk cancellation,
and multitrack binaural synthesis for virtual acoustical imaging. The system is applicable for any room shape (normal, long, flat,
coupled), independent of the a priori assumption of a diffuse sound field. This provides the possibility of simulating indoor or
outdoor spatially distributed, freely movable sources and a moving listener in virtual environments. In addition to that, near-to-
head sources can be simulated by using measured near-field HRTFs. The reproduction component consists of a headphone-free
reproduction by dynamic crosstalk cancellation. The focus of the project is mainly on the integration and interaction of all involved
subsystems. It is demonstrated that the system is capable of real-time room simulation and reproduction and, thus, can be used as
a reliable platform for further research on VR applications.

Copyright © 2007 Tobias Lentz et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

Virtual reality (VR) is an environment generated in the computer with which the user can operate and interact in real time. One characteristic of VR is a three-dimensional and multimodal interface between a computer and a human being. In the fields of science, engineering, and entertainment, these tools are well established in several applications. Visualization in VR is usually the technology of primary interest. Acoustics in VR (auralization, sonification) is not present to the same extent and is often just added as an effect, without any plausible reference to the virtual scene. The method of auralization with real-time performance can be integrated into the technology of “virtual reality.”

The process of generating the cues for the respective senses (3D image, 3D audio, etc.) is called “rendering.” Apparently simple scenes of interaction, for instance, when a person is leaving a room and closes a door, require complex models of room acoustics and sound insulation. Otherwise, it is likely that coloration, loudness, and timbre of sound within and between the rooms are not sufficiently represented. Another example is the interactive movement of a sounding object behind a barrier or inside an opening of a structure, so that the object is no longer visible but can be heard by diffraction.

1.1. Sound field modeling

The task of producing a realistic acoustic perception, localization, and identification is a big challenge. In contrast to the visual representation, acoustics deals with a frequency range involving three orders of magnitude (20 Hz to 20 kHz, and wavelengths from about 20 m to 2 cm). Neither approximations of small wavelengths nor of large wavelengths can be assumed with general validity. Different physical laws, that is, diffraction at low frequencies, scattering at high frequencies, and specular reflections, have to be applied to generate a physically based sound field model. Hence, from the physical point of view (this means, not to mention the challenge of implementation), the question of modeling and simulating an exact virtual sound field is by orders of magnitude more difficult than the task of creating visual images. This might be the reason for the delayed implementation of acoustic components in virtual environments.

At present, personal computers are just capable of simulating plausible acoustical effects in real time. To reach this goal, numerous approximations will still have to be made. The ultimate aim for the resulting sound is not to be physically absolutely correct, but perceptually plausible. Knowledge about human sound perception is, therefore, a very important prerequisite for evaluating auralized sounds.
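The three-orders-of-magnitude claim can be checked directly from the relation λ = c/f. A small sketch (illustrative Python; the speed of sound of 343 m/s at room temperature is our assumption, not a value from the paper):

```python
# Wavelengths at the edges of the audible range, λ = c / f.
C_AIR = 343.0  # speed of sound in air, m/s (assumed, ~20 °C)

def wavelength(frequency_hz: float) -> float:
    """Wavelength in metres of a plane wave in air at frequency f."""
    return C_AIR / frequency_hz

print(wavelength(20.0))     # lowest audible frequency: about 17 m
print(wavelength(20000.0))  # highest audible frequency: about 1.7 cm
```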

Cognition of the environment itself, external events, and, very importantly, a feedback of one’s own actions are supported by the hearing event. Especially in VR environments, the user’s immersion into the computer-generated scenery is a very important aspect. In that sense, immersion can be defined as addressing all human sensory subsystems in a natural way. As recipients, humans evaluate the diverse characteristics of the total sound segregated into the individual objects. Furthermore, they evaluate the environment itself, its size, and the mean absorption (state of furniture or fitting). In the case of an acoustic scene in a room, which is probably typical for the majority of VR applications, a physically adequate representation of all these subjective impressions must, therefore, be simulated, auralized, and reproduced. Plausibility can, however, only be defined for specific environments. Therefore, a general approach of sound field modeling requires a physical basis and applicability in a wide range of rooms, buildings, or outdoor environments.

[Figure 1 shows a block diagram: the VR application (position management, visualization) feeds a room acoustics stage, in which image sources compute early specular reflections and ray tracing computes diffuse/late specular reflections; an auralization server (filter processing, low-latency convolution) drives the reproduction stage (crosstalk cancellation).]

Figure 1: System components.

1.2. Reproduction

The aural component additionally reinforces the user’s immersive experience due to the comprehension of the environment through a spatial representation [1, 2]. Besides the sound field modeling itself, an adequate reproduction of the signals is very important. The goal is to transport all spatial cues contained in the signal in an aurally correct way to the ears of a listener. As mentioned above, coloration, loudness, and timbre are essential, but also the direction of a sound and its reflections are required for an at least plausible scene representation. The directional information in a spatial signal is very important to represent a room in its full complexity. In addition, this is supported by a dynamically adapted binaural rendering which enables the listener to move and turn within the generated virtual world.

1.3. System

In this contribution, we describe the physical and algorithmic approach of sound field modeling and 3D sound reproduction of the VR systems installed at RWTH Aachen University (see Figure 1). The system is implemented in a first version. It is open to any extended physical sound field modeling in real time, and is independent of any particular visual VR display technology, for example, CAVE-like displays [3] or desktop-based solutions. Our 3D audio system named VirKopf has been implemented at the Institute of Technical Acoustics (ITA), RWTH Aachen University, as a distributed architecture. For any room acoustical simulation, VirKopf uses the software RAVEN (room acoustics for virtual environments) as a networked service (see Section 2.1). It is obvious that video and audio processing take a lot of computing resources for each subsystem, and by today’s standards, it is unrealistic to do all processing on a single machine. For that reason, the audio system realizes the computation of video and audio data on dedicated machines that are interconnected by a network. This idea is obvious and has already been successfully implemented by [4] or [5]. There are even commercially available solutions, which have been realized by dedicated hardware that can be used via a network interface, for example, the Lake HURON machine [6]. Other examples of acoustic rendering components that are bound by a networked interface can be found in connection with the DIVA project [7, 8] or Funkhouser’s beam tracing approach [9]. Other approaches such as [2] or [10] have not been implemented as a networked client-server architecture but rely on a special hardware setup.

The VirKopf system differs from these approaches in some respects. A major difference is the focus of the VirKopf system, offering the possibility of a binaural sound experience for a moving listener without any need for headphones in immersive VR environments. Secondly, it is not implemented on top of any constrained hardware requirements such as the presence of specific DSP technology for audio processing. The VirKopf system realizes a software-only approach and can be used on off-the-shelf custom PC hardware. In addition to that, the system does not depend on specially positioned loudspeakers or a large number of loudspeakers. Four loudspeakers are sufficient to create a surrounding acoustic virtual environment for a single user using the binaural approach.

2. ROOM ACOUSTICAL SIMULATION

Due to several reasons, which cannot be explained in all details here, geometrical acoustics is the most important model used for auralization in room acoustics [11]. Wave models would be more exact, but only the approximations of geometrical acoustics and the corresponding algorithms provide a chance to simulate room impulse responses in real-time applications. In this interpretation, delay line models, radiosity, or others are considered as basically geometric as well, since wave propagation is reduced to the time-domain approach of energy transition from wall to wall. In geometrical acoustics, deterministic and stochastic methods are available. All deterministic simulation models used today are based on the physical model of image sources [12, 13]. They differ in the way sound paths are identified, by using forward (ray) tracing or reverse construction. Variants of this type of algorithm are hybrid ray tracing, beam tracing, pyramid tracing, and so forth [14–20].
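The image source construction underlying these deterministic models can be illustrated in a few lines: a source is mirrored across a wall plane, and the reflected path length follows directly. This is a minimal sketch (illustrative Python; the function names and the example geometry are ours, not RAVEN’s):

```python
import math

def mirror_source(source, plane_point, plane_normal):
    """First-order image source: reflect a point source across a wall plane."""
    norm = math.sqrt(sum(c * c for c in plane_normal))
    n = [c / norm for c in plane_normal]
    # Signed distance from the source to the plane along the unit normal.
    d = sum((s - p) * c for s, p, c in zip(source, plane_point, n))
    return [s - 2.0 * d * c for s, c in zip(source, n)]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Source 1 m in front of a wall lying in the plane x = 0:
img = mirror_source([1.0, 2.0, 1.5], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0])
# The image sits 1 m behind the wall: [-1.0, 2.0, 1.5].
receiver = [3.0, 0.0, 1.5]
path = distance(img, receiver)  # length of the specular reflection path
delay = path / 343.0            # arrival delay, assuming c = 343 m/s
```

The distance from the image source to the receiver equals the length of the actual reflected path, which is what makes the model attractive for building impulse responses.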

Impulse responses from image-like models consist of filtered Dirac pulses arranged according to their delay and amplitude and are sampled with a certain temporal resolution. In intercomparisons of simulation programs [21, 22], it soon became clear that pure image source modeling would create too rough an approximation of physical sound fields in rooms, since a very important aspect of room acoustics, surface and obstacle scattering, is neglected.

[Figure 2 plots specularly and diffusely reflected energy over reflection order 0 to 11.]

Figure 2: Conversion of specularly into diffusely reflected sound energy, illustrated by an example (after Kuttruff [23]).

It can be shown that, from reflections of order two or three, scattering becomes a dominant effect in the temporal development of the room impulse response [23], even in rooms with rather smooth surfaces (see Figure 2). Fortunately, the particular directional distribution of scattered sound is irrelevant after the second or third reflection order and can well be assumed as Lambert scattering. However, in special cases of rooms with high absorption such as recording studios, where directional diffusion coefficients are relevant, different scattering models have to be used. Solutions for the problem of surface scattering are given by either stochastic ray tracing or radiosity [14, 18, 24–27]. Furthermore, the fact that image sources are a good approximation only for perfectly reflecting or low-absorption surfaces is often forgotten. The approximation of images, however, is valid in large rooms, at least for large distances between the source, wall, and receiver [28]. Another effect of wave physics, diffraction, can be introduced into geometrical acoustics [29, 30], but so far the online simulation has been restricted to stationary sound sources. Major problems arise, however, when extending diffraction models to higher orders. Apart from outdoor applications, diffraction has not yet been implemented in applications such as room acoustics. It should, however, be mentioned that numerous algorithmic details have already been published in the field of sound field rendering. New algorithmic schemes such as those presented by [31] have not yet been implemented. It should be kept in mind here that the two basic physical methods, deterministic sound images and stochastic scattering, should be taken into account in a sound field model with a certain performance of realistic physical behavior. Sound transmission as well as diffraction must be implemented in the cases of coupled rooms, in corridors, or cases where sound is transmitted through apertures.

2.1. Real-time capable implementation

Any room acoustical simulation should take into account the above-mentioned physical aspects of sounds in rooms. Typically, software is available for calculating room impulse responses of a static source and a listener’s position within a few seconds or minutes. However, unrestricted movement of the receiver and the sound sources within the geometrical and physical boundaries is a basic demand for any interactive on-line auralization. Furthermore, any interaction with the scenery, for instance, opening a door to a neighboring room, and the on-line update of the change of the rooms’ modal structures should be provided by the simulation to produce a high believability of the virtual world [32].

At present, a room acoustical simulation software called RAVEN is being developed at our institute. The software aims at satisfying all of the above-mentioned criteria for a realistic simulation of the aural component while maintaining real-time capability. Special implementations offering the possibility of room acoustical simulation in real time will be described in the following sections. RAVEN is basically an upgrade and enhancement of the hybrid room acoustical simulation method by Vorländer [20], which was further extended by Heinz [25]. A very flexible and fast-to-access framework for processing an arbitrary number of rooms (see Section 2.2) has been incorporated to gain a high level of interactivity for the simulation and to achieve real-time capability for algorithms under certain constraints (see Section 5.2). Image sources are used for determining early reflections (see Section 2.3) in order to provide a most accurate localization of primary sound sources (precedence effect [33]) during the simulation. Scattering and reverberation are estimated on-line by means of an improved stochastic ray tracing method, which will be further described in Section 2.4.

2.2. Scene partitioning

The determination of the rooms’ sound reflections requires an enormous number of intersection tests between rays and the rooms’ geometry, since geometrical acoustics methods treat sound waves as “light” rays. To apply these methods in real time, data structures are required for an efficient representation and determination of spatial relationships between sound rays and the room geometry.

These data structures organize geometry hierarchically in some n-dimensional space and are usually of recursive nature, which remarkably accelerates operations such as culling, intersection tests, or collision detection [34, 35].

Our auralization framework contains a preprocessing phase which transforms every single room geometry into a flexible data structure by using binary space partitioning (BSP) trees [36] for fast intersection tests during the simulation. Furthermore, the concept of scene graphs [37], which is basically a logical layer on top of the single-room data structures, is used to make this framework applicable to an arbitrary number of rooms and to acquire a high level of interactivity for the room acoustical simulation.

Figure 3: The scenery is split into three rooms, which are represented by the nodes of the scene graph (denoted through hexagons). The rooms are connected to their neighboring rooms by 2 portals (room0/room1 and room1/room2, denoted through the dotted lines).

2.2.1. Scene graph architecture

To achieve efficient data handling for an arbitrary number of rooms, the concept of scene graphs has been used. A scene graph is a collection of nodes which are linked according to room adjacencies.

A node contains the logical and spatial representation of the corresponding subscene. Every node is linked to its neighbors by so-called portals, which represent entities connecting the respective rooms, for example, a door or a window (see Figure 3). It should be noted that the number of portals for a single node is not restricted; hence, the scenery can be partitioned quite flexibly into subscenes. The great advantage of using portals is their binary nature, as two states can occur. The state “active” connects the two nodes defined by the portal, whereas the state “passive” cuts off the specific link. This provides a high level of interactivity for the room acoustical simulations, as room neighborhoods can be changed on-line; for instance, doors may be opened or closed. In addition, information about portal states can be exploited to speed up any required tests during the on-line room acoustical simulation by neglecting rooms which are acoustically not of interest, for example, rooms that are out of bounds for the current receiver’s position.

2.3. Image source method

The concept of the traditional image source (IS) method provides a quite flexible data structure, as, for instance, the on-line movement of primary sound sources and their corresponding image sources is supported and can be updated within milliseconds. Unfortunately, the method fails to simulate large sceneries, as the computational costs are dominated by the exponential growth of image sources with an increasing number of rooms, that is, polygons and reflection order. Applying the IS method to an arbitrary number of rooms would result in an explosion of IS to be processed, which would make a simulation of a large virtual environment impossible within real-time constraints due to the extreme number of IS to be tested on-line for audibility.

However, the scene graph data structure (see Section 2.2.1) provides the possibility of precomputing subsets of potentially audible IS according to the current portal configuration by sorting the entire set of IS dependent on the room(s) they originate from. This can easily be done by preprocessing the power set of the scene S, where S is a set of n rooms. The power set of S contains 2^n elements, and every subset, that is, family set of S, refers to an n-bit number, where the mth bit refers to activity or inactivity of the mth room of S. Then, all ISs are sorted into the respective family sets of S by gathering information about the room IDs of the planes they have been mirrored on. Figure 5 shows exemplarily the power set P of a scenery S containing the three rooms R2, R1, R0, and the linked subsets of IS, that is, P(S) = {{Primary Source}, {IS(R0)}, {IS(R1)}, {IS(R1, R0)}, {IS(R2)}, {IS(R2, R0)}, {IS(R2, R1)}, {IS(R2, R1, R0)}}.

During on-line auralization, a depth-first search [37] of the scene graph determines reachable room IDs for the current receiver’s position. This excludes both rooms that are out of bounds and rooms that are blocked by portals. This set of room IDs is encoded by the power set P to mark unreachable rooms as invalid, as they are acoustically not of interest. If, in the case of this example, room R2 becomes unreachable for the current receiver’s position, for example, because someone closed the door, only those IS family sets of P have to be processed for auralization that do not contain the room ID R2. As a consequence, the number of IS family sets to be tested for audibility drops from eight to four, that is, P(0), P(1), P(2), P(3), which obviously leads to a significant reduction of computation time.

During the simulation, it has to be checked whether every possibly audible image source, determined as described above, is actually audible for the current receiver’s position (see Figure 4(a)). Taking great advantage of the scene graph’s underlying BSP-tree structures and an efficient tree traversing strategy [38], the required IS audibility test can be done very fast (performance issues are discussed in more detail in Section 5.2.1). If an image source tests as audible for the current receiver’s position, all data required for filter calculation (position, intersection points, and hit material) are stored in the super-ordinated container “audible sources” (see Figure 4(a)).
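The family-set bookkeeping described above comes down to bit masking over room subsets. The sketch below (illustrative Python, with our own naming) reproduces the example: each subset of S = {R0, R1, R2} is an n-bit key as in Figure 5, and closing the door to R2 drops the candidate family sets from eight to four:

```python
# Bit m set <=> room Rm is involved in the family set (cf. Figure 5).
ROOMS = ["R0", "R1", "R2"]  # bit 0 = R0, bit 1 = R1, bit 2 = R2

def subset_id(rooms):
    """n-bit key of a family set, e.g. {R2, R0} -> 0b101 = 5."""
    return sum(1 << ROOMS.index(r) for r in rooms)

def audible_family_sets(reachable):
    """Keys of all family sets that involve only reachable rooms."""
    reachable_mask = subset_id(reachable)
    n = len(ROOMS)
    # Keep a subset s iff it has no bit outside the reachable mask.
    return [s for s in range(2 ** n) if s & ~reachable_mask == 0]

print(audible_family_sets({"R0", "R1", "R2"}))  # all eight family sets
print(audible_family_sets({"R0", "R1"}))        # door to R2 closed: four remain
```

With all rooms reachable, all 2^3 = 8 family sets must be checked; once R2 is unreachable, only P(0) through P(3) survive the mask test, exactly as in the text.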

[Figure 4 shows two flow charts on the room-acoustic server. (a) Inputs: scene graph and listener position. All possible image sources are checked for audibility (collision data, intersection finding); if audible, a source is stored in the “audible sources” container. (b) Inputs: center frequency and material map (absorption and scatter coefficients). Rays are fired and traced; energy is absorbed and scattered at walls; on detection-sphere hits, energy, time, and angles of impact are collected in the histogram; Dirac impulses (Poisson) are distributed to directivity groups, multiplied with the groups’ HRTFs, sorted into the impulse response, and transformed via IFFT.]

Figure 4: (a) Image source audibility test, (b) estimation of scattering and reverberation.

ID | R2 R1 R0 | IS subset
 7 |  1  1  1 | R2 R1 R0
 6 |  1  1  0 | R2 R1
 5 |  1  0  1 | R2 R0
 4 |  1  0  0 | R2
 3 |  0  1  1 | R1 R0
 2 |  0  1  0 | R1
 1 |  0  0  1 | R0
 0 |  0  0  0 | Primary source

Figure 5: IS/room-combination power set P(S) for a three-room situation. All IS are sorted into encapsulated containers depending on the room combination they have been generated from.

2.4. Ray tracing

The computation of the diffuse sound field is based on the stochastic ray tracing algorithm proposed by Heinz [39]. For building the binaural impulse response from the ray tracing data, Heinz assumed that the reverberation is ideally diffuse. This assumption is, however, too rough if the room geometry is extremely long or flat, or if it contains objects like columns or privacy screens. Room acoustical defects such as (flutter) echoes would remain undetected [40, 41]. For a more realistic room acoustical simulation, the algorithm has been changed in such a way that these effects are taken into account (see Figure 4(b)). This aspect is an innovation in real-time virtual acoustics, which is to be considered an important extension of the perceptive dimension.

The BSP-based ray tracing simulation starts by emitting a finite number of particles from each sound source at random angles, where each particle carries a source-directivity-dependent amount of energy. Every particle loses energy while propagating due to air absorption and due to reflections at walls, either specular or diffuse, and at other geometric objects inside the rooms, that is, a material-dependent absorption of sound. The particle is terminated as soon as its energy falls below a predefined threshold. Before a time t0, which represents the image source cut-off time, only particles are detected which have been reflected specularly but carry a diffuse history, in order to preserve a correct energy balance. After t0, all possible permutations of reflection types are processed (e.g., diffuse, specular, diffuse, diffuse, etc.).

The ray tracing is performed for each frequency band due to frequency-dependent absorption and scattering coefficients, which results in a three-dimensional data container called a histogram. This histogram is considered the temporal envelope of the energetic spatial impulse response. One single field of the histogram contains information about rays (their energy on arrival, time, and angles of impact) which hit the detection sphere during a time interval Δt for a discrete frequency interval f_b. At first, the mean energy of fields with different frequencies but the same time interval is calculated to obtain the short-time energy spectral density. This step is also used to create a ray directivity distribution over time for the respective rays: for each time slot, the detection sphere is divided into evenly distributed partitions, so-called directivity groups. If a ray hits the sphere, the ray’s remaining energy on impact is added to the corresponding sphere’s directivity group depending on its time and direction of arrival (see Figure 6).

This energy distribution is used to determine a ray probability for each directivity group and each time interval Δt. Then a Poisson process with a rate equal to the rate of reflections for the given room and the given time interval is created. Each impulse of the process is allotted to the respective directivity group depending on the determined ray probability distribution. In a final step, each directivity group which was hit by a Poisson impulse cluster is multiplied by its respective HRTF, superposed to a binaural signal, and weighted by the square root of the energy spectral density. After that, the signal is transformed into the time domain. This is done for every time step of the histogram and put together to form the complete binaural impulse response. The ray tracing algorithm is managed by the room acoustics server to provide the possibility of a dynamic update depth for determining the diffuse sound field component (see Section 3). Since this contribution focuses on the implementation and performance of the complete system, no further details are presented here. A detailed description of the fast implementation and test results can be found in [42].
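The particle life cycle of Section 2.4 (energy loss at each reflection and along each free path, termination below a threshold, detected energy binned into time slots) can be miniaturized as follows. This is an illustrative Python sketch, not RAVEN: geometry is collapsed to a constant mean free path, a single frequency band is used, and every reflection is counted as detected, which a real implementation would restrict to detection-sphere hits.

```python
import math
import random

random.seed(1)
C = 343.0              # speed of sound in air, m/s
MEAN_FREE_PATH = 10.0  # m, a room-dependent constant (assumed here)
ALPHA = 0.2            # mean wall absorption coefficient (assumed)
AIR = 0.01             # air attenuation per metre (assumed, one band)
THRESHOLD = 1e-6       # particle termination threshold
DT = 0.010             # width of one histogram time slot, s

def trace_particle(histogram):
    """Follow one particle, binning its remaining energy at each reflection."""
    t, energy = 0.0, 1.0
    while energy > THRESHOLD:
        free_path = random.expovariate(1.0 / MEAN_FREE_PATH)
        t += free_path / C                                    # travel time
        energy *= (1.0 - ALPHA) * math.exp(-AIR * free_path)  # wall + air loss
        slot = int(t / DT)
        histogram[slot] = histogram.get(slot, 0.0) + energy

histogram = {}
for _ in range(2000):
    trace_particle(histogram)

early = sum(e for s, e in histogram.items() if s < 20)       # 0-200 ms
late = sum(e for s, e in histogram.items() if 50 <= s < 70)  # 500-700 ms
print(early > late)  # the binned energy decays over time, like a decay curve
```

The resulting histogram falls off roughly exponentially in time, which is the envelope from which the binaural reverberation tail is then synthesized.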

[Figure 6 shows a 3D plot of energy over time slots (0 to 20) and frequency bands (1 to 10).]

Figure 6: Histogram example of a single directivity group.

3. FILTER PROCESSING

For a dynamic auralization where the listener is allowed to move, turn, and interact with the presented scenery, and where the sources can also be moved, the room impulse response has to be updated very fast. This becomes even more important in combination with congruent video images. Thus, the filter processing is a crucial part of the real-time process [8]. The whole filter construction is separated into two parts. The most important section of a binaural room impulse response is the first part, containing the direct sound and the early reflections of the room. These early reflections are represented by the calculated image sources and have to be updated at a rate sufficient for the binaural processing. For this reason, the operation interface between the room acoustics server and the auralization server is the list of the currently audible sources. The second part of the room impulse response is calculated on the room acoustics server (or cluster) to minimize the time required by the network transfer, because the amount of data required to calculate the room impulse response is significantly higher than the resulting filter itself.

3.1. Image sources

Every single fraction of the complete impulse response, either the direct sound or the sound reflected by one or more walls, runs through several filter elements, as shown in Figure 7. Elements such as directivity, wall, and air absorption are filters in a logarithmic frequency representation with a third-octave band scale with 31 values from 20 Hz to 20 kHz. These filters contain no phase information, so that only a single multiplication is needed. The drawback of using a logarithmic representation is the necessity of interpolation before multiplying the resulting filter with the HRTF. But this is still not as computationally expensive as using a linear representation for all elements, particularly if more wall filters have to be considered for the specific reflection.

So far, the wall absorption filters are independent of the angle of sound incidence, which is a common assumption for room acoustical models. This can be extended to consider angle-dependent data if necessary. Reflections calculated by using the image source model are attenuated by the fraction of energy that is transferred to the diffuse reflections. The diffuse reflections are handled by the ray tracing algorithm (see Section 3.2).

Another important influence on the sound in a room, especially in a large hall, is the directivity of the source. This is even more important for a dynamic auralization, where not only the listener is allowed to move and interact with the scenery but where the sources can also move or turn. The naturalness of the whole generated sound scene is improved by every dynamic aspect being taken into account. The program accepts external directivity databases of any spatial resolution; the internal database has a spatial resolution of 5 degrees for azimuth and elevation angles. This database contains the directivity of a singer and of several natural instruments. Furthermore, it is possible to generate a directivity manually. The air absorption filter is only distance-dependent and is also applied to the direct sound, which is essential for far distances between listener and source.

At the end of every filter pass, which up to this point represents a mono signal, an HRTF has to be used to generate a binaural head-related signal which contains all directional information. All HRTFs used by the VirKopf system were measured with the artificial head of the ITA for the full sphere due to the asymmetrical pinnae and head geometry. Nonsymmetrical pinnae lead to positive effects on the perceived externalization of the generated virtual sources [43]. A strong impulse component such as the direct sound carries the most important spatial information of a source in a room. In order to provide a better resolution, even at low frequencies, an HRTF of higher resolution is used for the direct sound. The FIR filter length is chosen to be 512 taps. Due to the fact that the filter processing is done in the frequency domain, the filter is represented by 257 complex frequency-domain values, corresponding to a linear resolution of 86 Hz.

Furthermore, the database does not only contain HRTFs measured at one specific distance but also near-field HRTFs. This provides the possibility of simulating near-to-head sources in a natural way. Tests showed that the increasing interaural level difference (ILD) becomes audible at a distance of 1.5 m or closer to the head.
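The interpolation step from the 31-value third-octave band scale onto the 257-bin linear grid of the HRTF can be sketched as follows. The paper does not specify the interpolation scheme, so this illustrative Python assumes piecewise-linear interpolation over log frequency; the ~86 Hz bin spacing corresponds to 512 taps at an assumed 44.1 kHz sampling rate.

```python
import math

# Third-octave band centres from 20 Hz upwards (31 bands up to ~20 kHz).
BAND_CENTERS = [20.0 * 2 ** (i / 3.0) for i in range(31)]

def to_linear_grid(band_gains, n_bins=257, df=86.13):
    """Resample 31 magnitude values onto a linear FFT-bin grid."""
    out = []
    for k in range(n_bins):
        f = min(max(k * df, BAND_CENTERS[0]), BAND_CENTERS[-1])
        # Find the band centres surrounding bin frequency f.
        j = max(i for i, fc in enumerate(BAND_CENTERS) if fc <= f)
        if j == len(BAND_CENTERS) - 1:
            out.append(band_gains[-1])
            continue
        f0, f1 = BAND_CENTERS[j], BAND_CENTERS[j + 1]
        w = (math.log(f) - math.log(f0)) / (math.log(f1) - math.log(f0))
        out.append((1 - w) * band_gains[j] + w * band_gains[j + 1])
    return out

flat = to_linear_grid([0.5] * 31)  # a flat third-octave filter stays flat
print(len(flat))                   # one gain per complex HRTF bin
```

The resulting 257 magnitude values can then be multiplied bin-by-bin with the complex HRTF spectrum, which is the single multiplication the text refers to.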
Tobias Lentz et al. 7

[Figure 7 diagram. Direct sound chain: directivity → air absorption → interpolation → HRTF (512 taps); single reflection chain: directivity → wall absorption (one stage per reflecting wall) → air absorption → interpolation → HRTF (128 taps). The directivity and absorption stages use the 1/3 octave band scale.]

Figure 7: Filter elements for direct sound and reflections.
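The chain in Figure 7 can be sketched in a few lines. The following is our own illustrative reconstruction (all function names and the example gain values are assumptions, not the actual implementation): the magnitude-only third-octave elements are multiplied band by band, interpolated onto the linear frequency grid of the HRTF, and applied with a single multiplication per bin.

```python
import numpy as np

def third_octave_centers(n_bands=31, f_start=20.0):
    # 31 third-octave band center frequencies from 20 Hz up to ~20 kHz
    return f_start * 2.0 ** (np.arange(n_bands) / 3.0)

def to_linear_grid(band_gains, n_bins=257, fs=44100.0):
    """Interpolate magnitude-only third-octave gains onto the linear
    frequency grid of an FFT-based filter (257 bins for a 512-tap FIR,
    i.e., ~86 Hz resolution at 44.1 kHz)."""
    centers = third_octave_centers(len(band_gains))
    f_lin = np.linspace(0.0, fs / 2.0, n_bins)
    # interpolate on a log-frequency axis; values outside the band
    # range are clamped to the first/last band gain
    return np.interp(np.log(np.maximum(f_lin, centers[0])),
                     np.log(centers), band_gains)

# combine directivity, wall, and air absorption (magnitudes multiply),
# then apply the result to the HRTF spectrum in a single pass
directivity = np.ones(31)                # example values, not measured data
wall = np.full(31, 0.9)
air = np.linspace(1.0, 0.7, 31)          # stronger attenuation at high f
combined = directivity * wall * air

hrtf = np.ones(257, dtype=complex)       # placeholder HRTF spectrum
filt = to_linear_grid(combined, 257) * hrtf
print(filt.shape)  # (257,)
```

Because the band filters are real magnitudes, the per-bin cost is a single multiplication once the interpolation onto the HRTF grid is done, which is the trade-off described above.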

where different near-field HRTFs have to be applied. The listeners were asked to compare signals from simulated HRTFs with those from correspondingly measured HRTFs on two criteria, namely, the perceived location of the source and any coloration of the signals. The simulated HRTFs were prepared from far-field HRTFs (measured at a distance of two meters) with a simple level correction applied likewise to both channels. All of the nine listeners reported differences with regard to lateral sound incidences at distances closer than 1.5 m. No difference with regard to frontal sound incidences was reported at distances closer than 0.6 m. These results are very similar to the results obtained by research carried out in other labs, for example, [44]. Hence, HRTFs were measured at distances of 0.2 m, 0.3 m, 0.4 m, 0.5 m, 0.75 m, 1.0 m, 1.5 m, and 2.0 m. The spatial resolution of the databases is 1 degree for azimuth and 5 degrees for elevation angles, for both the direct sound and the reflections.

The FIR filter length of 128 taps used for the contribution of image sources is lower than for the direct sound, but is still higher than the limits found in the literature. Investigations regarding the effects of a reduced filter length on localization can be found in [45]. As for the direct sound, the filter processing is done in the frequency domain with the corresponding filter representation of 65 complex values. Using 128 FIR coefficients leads to the same localization results, but brings about a considerable reduction of the processing time (see Table 3). This was confirmed in internal listening experiments and is also congruent with the findings of other labs, for example, [46]. The spatial representation of image sources is realized by using HRTFs measured at 2.0 m. In this case, this does not mean any simplification because the room acoustical simulation using image sources is not valid anyway at distances close (a few wavelengths) to a wall. A more detailed investigation relating to that topic can be found in [28, 47].

3.2. Ray tracing

As mentioned above, the calculation of the binaural impulse response of the ray tracing process is done on the ray tracing server in order to reduce the amount of data which has to be transferred via the network. To keep the filters up to date according to the importance of the filter segment, which is related to the time alignment, the auralization process can send interrupt commands to the simulation server. If a source or the listener is moving too fast to finish the calculation of the filter within an adequate time slot, the running ray tracing process will be stopped. This means that the update depth of the filter depends on the movements of the listener or the sources. In order to achieve an interruptible ray tracing process, it is necessary to divide the whole filter length into several parts. When a ray reaches the specified time stamp, the data necessary to restart the ray at this position is saved and the next ray is calculated. After finishing the calculation of all rays, the filter is processed up to the time for which the ray tracing updated the information in the histogram (this can also be a parallel process, if supported by the hardware). At this time, it is also possible to send the first updated filter section to the auralization server, which means that the earlier part of the changed impulse response can be taken into account before the complete ray tracing is finished. At this point, the ray tracing process decides, based on the interrupt flag, whether the calculation is restarted at the beginning of the filter or at the last time stamp. For slight or slow movements of the head or of the sources, the ray tracing process has enough time to run through a complete calculation cycle containing all filter time segments. This also means that the accuracy of the simulation rises with the duration the listener stays at approximately the same position while the sources do not move.

4. REPRODUCTION SYSTEM

The primary reproduction system of the room acoustical modeling described in this paper is a setup mounted in the CAVE-like environment, a five-sided projection system of rectangular shape installed at RWTH Aachen University. The special shape enables the use of the full resolution of 1600 by 1200 pixels of the LCD projectors on the walls and the floor as well as a 360-degree horizontal view. The dimensions of the projection volume are 3.60 × 2.70 × 2.70 m3, yielding a total projection screen area of 26.24 m2. Additionally, the use of passive stereo via circular polarization allows lightweight glasses. Head and interaction device tracking is realized by an optical tracking system. The setup of this display
8 EURASIP Journal on Advances in Signal Processing

[Figure 9 plot: channel separation in dB (0 to −80 dB) over frequency (0.2–10 kHz) for curves (a), (b), and (c), together with a sketch of the transfer paths H1L, H1R, H2L, H2R and the crosstalk terms.]

Figure 8: The CAVE-like environment at RWTH Aachen University. Four loudspeakers are mounted on the top rack of the system. The door, shown on the left, and a moveable wall, shown on the right, can be closed to allow a 360-degree view with no roof projection.

Figure 9: Measurement of the accessible channel separation using a filter length of 1024 taps. (a) = calculated, (b) = static solution, (c) = dynamic system.
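The channel separation measured in Figure 9 comes from filters that invert the 2 × 2 matrix of loudspeaker-to-ear transfer paths per frequency bin. A minimal frequency-domain sketch of this idea (a textbook least-squares formulation with a regularization term — our own illustration, not the VirKopf implementation):

```python
import numpy as np

def ctc_filters(H1L, H1R, H2L, H2R, beta=1e-3):
    """Crosstalk cancellation filters for a two-speaker setup:
    find C so that H @ C ~= I in every frequency bin, where H maps
    speaker signals to ear signals. beta is a small regularization
    term that tames ill-conditioned bins (e.g., at high frequencies,
    where the cancellation is most error-prone)."""
    n = len(H1L)
    C = np.zeros((n, 2, 2), dtype=complex)
    for k in range(n):
        # rows = ears (left, right), columns = speakers (1, 2)
        H = np.array([[H1L[k], H2L[k]],
                      [H1R[k], H2R[k]]])
        Hh = H.conj().T
        C[k] = np.linalg.solve(Hh @ H + beta * np.eye(2), Hh)
    return C

# toy check: with symmetric paths the canceller restores identity
n = 8
H1L = H2R = np.full(n, 1.0 + 0j)   # direct paths
H1R = H2L = np.full(n, 0.3 + 0j)   # crosstalk paths
C = ctc_filters(H1L, H1R, H2L, H2R, beta=0.0)
H0 = np.array([[1.0, 0.3], [0.3, 1.0]])
print(np.allclose(H0 @ C[0], np.eye(2)))  # True
```

In a dynamic system, the four transfer functions are picked from the HRTF database for the current head pose and the filters are recomputed at runtime, which is why the measured separation in curve (c) stays below the static reference cases.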

system is an improved implementation of the system [48] that was developed with the clear aim to minimize attachments and encumbrances in order to improve user acceptance. In that sense, much of the credibility that CAVE-like environments have earned in recent years has to be attributed to the fact that they try to be absolutely nonintrusive VR systems. As a consequence, a loudspeaker-based acoustical reproduction system seems to be the most desirable solution for acoustical imaging in CAVE-like environments. Users should be able to step into the virtual scenery without much preparation or calibration but still be immersed in a believable environment. For that reason, our CAVE-like environment depicted above was extended with a binaural reproduction system using loudspeakers.

4.1. Virtual headphone

To reproduce the binaural signal at the ears with a sufficient channel separation without using headphones, a crosstalk cancellation (CTC) system is needed [49–51]. Making CTC work in an environment where the user should be able to walk around and turn his head requires a dynamic CTC system which is able to adapt during the listener's movements [52, 53]. The dynamic solution overcomes the sweet-spot limitation of a normal static crosstalk cancellation. Figure 8 shows the four transfer paths from the loudspeakers to the ears of the listener (H1L = transfer function from loudspeaker 1 to the left ear). A correct binaural reproduction means that the complete transfer function from the left input to the left ear (reference point is the entrance of the ear canal), including the transfer function H1L, becomes a flat spectrum. The same applies to the right transfer path, accordingly. The crosstalk indicated by H1R and H2L has to be canceled by the system.

Since the user of a virtual environment is already tracked to generate the correct stereoscopic video images, it is possible to calculate the CTC filter online for the current position and orientation of the user. The calculation at runtime enhances the flexibility of the VirKopf system regarding the validity area and the flexibility of the loudspeaker setup, which can hardly be achieved with preprocessed filters. Thus, a database containing "all" possible HRTFs is required. The VirKopf system uses a database with a spatial resolution of one degree for both azimuth (ϕ) and elevation (ϑ). The HRTFs were measured in a frequency range of 100 Hz–20 kHz, allowing a cancellation in the same frequency range. It should be mentioned that a cancellation at higher frequencies is more prone to errors due to misalignments of the loudspeakers and also due to individual differences of the pinna. This is also shown by curve (c) in Figure 9. The distance between the loudspeaker and the head affects the time delay and the level of the signal. Using a database with HRTFs measured at a certain distance, these two parameters must be adjusted by modifying the filter group delay and the level according to the spherical wave attenuation for the actual distance.

To provide a full head rotation of the user, a two-loudspeaker setup is not sufficient, as the dynamic cancellation only works within the angle spanned by the loudspeakers. Thus, a dual CTC algorithm with a four-speaker setup has been developed, which is further described in [54]. With four loudspeakers, eight combinations of a normal two-channel CTC system are possible and a proper cancellation can be achieved for every orientation of the listener. An angle-dependent fading is used to change the active speakers within the overlapping validity areas of two configurations.

Each time the head-tracker information is updated in the system, the deviation of the head position and orientation from the information which caused the preceding filter change is calculated. Every degree of freedom is weighted with its own factor and then summed up. Thus, the threshold can be parameterized in six degrees of

freedom, with positional values (Δx, Δy, Δz) and rotational values (Δϕ, Δϑ, Δρ). A filter update is performed when the weighted sum is above 1. The lateral movement and the head rotation in the horizontal plane are the most critical, so Δx = Δy = 1 cm and Δϕ = 1.0 degree are chosen to dominate the filter update. The threshold always refers to the value where the limit was exceeded the last time. The resulting hysteresis prevents a permanent switching between two filters, as may occur when a fixed spacing determines the boundaries between two filters and the tracking data jitter slightly.

One of the fundamental requirements of the sound output device is that the channels work absolutely synchronously. Otherwise, the calculated crosstalk paths do not fit the given condition. On this account, the special audio protocol ASIO, designed by Steinberg for professional audio recording, was chosen to address the output device [55].

To classify the performance that could theoretically be reached by the dynamic system, measurements of a static system were made to have a realistic reference for the achievable channel separation. Under absolutely ideal circumstances, the HRTFs used to calculate the crosstalk cancellation filters are the same as during reproduction (individual HRTFs of the listener). In a first test, the crosstalk cancellation filters were processed with HRTFs of an artificial head in a fixed position. The windowing to a certain filter length and the smoothing give rise to a limitation of the channel separation. The internal filter calculation length is chosen as 2048 taps in order to take into account the time offsets caused by the distance to the speakers. The HRTFs were smoothed with a bandwidth of 1/6 octave to reduce the small dips which may cause problems when inverting the filters. After the calculation, the filter set is truncated to the final filter length of 1024 taps, the same length that the dynamic system works with. However, the time alignment among the single filters is not affected by the truncation. The calculated channel separation using this (truncated) filter set and the smoothed HRTFs as reference is plotted in Figure 9, curve (a). Thereafter, the achieved channel separation was measured at the ears of the artificial head, which had not been moved since the HRTF measurement (Figure 9, curve (b)).

In comparison to these ideal reference cases, Figure 9, curve (c) shows the achieved channel separation of the dynamic CTC system. The main difference between the static and the dynamic system is the set of HRTFs used for filter calculation. The dynamic system has to choose the appropriate HRTF from a database and has to adjust the delay and the level depending on the position data. All these adjustments cause minor deviations from the ideal HRTF measured directly at this point. For this reason, the channel separation of the dynamic system is not as high as the one that can be achieved by a system with direct HRTF measurement.

The theory of crosstalk cancellation is based on the assumption of a reproduction in an anechoic environment. However, the projection walls of CAVE-like environments consist of solid material causing reflections that decrease the performance of the CTC system. Listening tests with our system show [56] that the subjective localization performance is still remarkably good. Also, tests of other labs [57, 58] with different CTC systems indicate a better subjective performance than would be expected from measurements. One aspect explaining this phenomenon is the precedence effect, by which sound localization is primarily determined by the first arriving wavefront; the other aspect is head movement, which gives the user the ability to confirm the perceived direction of incidence. A more detailed investigation of the performance of our binaural rendering and reproduction system can be found in [59].

The latency of the audio reproduction system is the time elapsed between the update of a new position and orientation of the listener and the point in time at which the output signal is generated with the recalculated filters. The output block length of the convolution (overlap-save) is 256 taps, as is the chosen buffer length of the sound output device, resulting in a time between two buffer switches of 5.8 milliseconds at a 44.1 kHz sampling rate for the rendering of a single block. The calculation of a new CTC filter set (1024 taps) takes 3.5 milliseconds on our test system. In a worst-case scenario, the filter calculation finishes just after the sound output device has fetched the next block, so the updated filter becomes active at the output only after this block has been played. That causes an additional latency of one block. In such a case, the overall latency accumulates to 9.3 milliseconds.

4.2. Low-latency convolution

A part of the complete dynamic auralization system requiring a high amount of processing power is the convolution of the audio signal. A pure FIR filtering would cause no additional latency except for the delay of the first impulse of the filter, but it also causes the highest processing load. Impulse responses of 100 000 taps or more cannot be processed in real time on a PC system using FIR filters in the time domain. Block convolution is a method that reduces the computational cost to a minimum, but the latency increases in proportion to the filter length. The only way to minimize the latency of the convolution is a special conditioning of the complete impulse response into filter blocks. Basically, we use an algorithm which works in the frequency domain with small block sizes at the beginning of the filter and increasing sizes toward the end of the filter. More general details about these convolution techniques can be found in [60]. However, our algorithm does not operate on the commonly used segmentation which doubles the block length every other block. Our system provides a special block size conditioning with regard to specific PC hardware properties such as cache size or special processing structures such as SIMD (single instruction, multiple data). Hence, the optimal convolution adds a time delay of only the first block to the latency of the system, so it is recommended to use a block length as small as possible. The required processing power is not linear in the overall filter length and is also constrained by the chosen start block length. Due to this, measurements were done to determine the processor load for different modes of operation (see Table 1).

Table 1: CPU load of the low-latency convolution algorithm.

    Impulse response    Number of sources (latency 256 taps)    Number of sources (latency 512 taps)
    length                3      10     15     20                 3      10     15     20
    0.5 s                 9%     30%    50%    76%                8%     22%    30%    50%
    1.0 s                14%     40%    66%    —                 11%     33%    53%    80%
    2.0 s                15%     50%    74%    —                 14%     42%    71%    —
    3.0 s                18%     62%    —      —                 16%     53%    —      —
    5.0 s                20%     68%    —      —                 18%     59%    —      —
    10.0 s               24%     —      —      —                 20%     68%    —      —
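The non-uniform segmentation described above — small blocks at the beginning of the impulse response, larger ones toward the end — can be illustrated offline. The schedule and function names below are our own simplification (the actual system tunes block sizes to cache and SIMD properties and runs the partitions in real time, rather than verifying them after the fact):

```python
import numpy as np

def partition_sizes(total_len, start_block=256, growth=2, max_block=4096):
    """One possible non-uniform segmentation: the first (latency-
    critical) block is small, later blocks grow up to a cap."""
    sizes, b = [], start_block
    while sum(sizes) < total_len:
        sizes.append(min(b, total_len - sum(sizes)))
        b = min(b * growth, max_block)
    return sizes

def partitioned_convolve(x, h, sizes):
    """Equivalent to the full convolution: each partition of h is
    convolved separately and added back at its delay offset."""
    y = np.zeros(len(x) + len(h) - 1)
    off = 0
    for s in sizes:
        seg = h[off:off + s]
        y[off:off + len(x) + len(seg) - 1] += np.convolve(x, seg)
        off += s
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
h = rng.standard_normal(6000)              # ~136 ms at 44.1 kHz
sizes = partition_sizes(len(h))
y = partitioned_convolve(x, h, sizes)
print(sizes)                               # [256, 512, 1024, 2048, 2160]
print(np.allclose(y, np.convolve(x, h)))   # True
```

Because only the first partition lies in the direct signal path, the added latency equals the first block length, while the long tail is processed in larger, cheaper blocks.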

5. SYSTEM INTEGRATION

The VirKopf system constitutes the binaural synthesis and reproduction system and the visual-acoustic coupling, and it is connected to the RAVEN system for room acoustical simulations. The complete system layout with all components is shown in Figure 10. As such, it describes the distributed system which is used for auralization in the CAVE-like environment at RWTH Aachen University, where user interaction is tracked by six cameras. As the visual VR machine, a dual Pentium 4 machine with 3 GHz CPU speed and 2 GB of RAM is used (cluster master). The host for the audio VR subsystem is a dual Opteron machine with 2 GHz CPU speed and 1 GB of RAM. The room acoustical simulations run on Athlon 3000+ machines with 2 GB of RAM. This hardware configuration is also used as the test system for all performance measurements. As audio hardware, an RME Hammerfall system is used which allows sound output streaming with a scalable buffer size and a minimum latency of 1.5 milliseconds. In our case, an output buffer size of 256 taps (5.8 milliseconds) is chosen. The network interconnection between all PCs is standard Gigabit Ethernet.

5.1. Real-time requirements

Central aspects of coupled real-time systems are the latency and the update rate of the communication. In order to get an objective criterion for the required update rates, it is mandatory to inspect typical behavior inside CAVE-like environments, with special respect to head movement types and the magnitude of position or velocity changes.

In general, user movements in CAVE-like environments can be classified into three categories [61]. One category is identified by the movement behavior of a user inspecting a fixed object by moving up and down and from one side to the other in order to accumulate information about its structural properties. A second category can be seen in the movements when the user is standing at one spot and uses head or body rotations to view different display surfaces of the CAVE. The third category of head movements can be observed when the user is doing both, walking and looking around in the CAVE-like environment. Mainly, the typical applications we employ can be classified as instances of the last two categories, although the exact user movement profiles can be individually different. Theoretical and empirical discussions about typical head movement in virtual environments are still a subject of research; for example, see [61–63] or [64].

As a field study, we recorded tracking data of users' head movements while they interacted in our virtual environment. From these data, we calculated the magnitude of the velocity of head rotation and translation in order to determine the requirements for the room acoustics simulation. Figure 11(a) shows a histogram of the evaluated data for the translational velocity. Following from the deviation of the data, the mean translational velocity is 15.4 cm/s, with a standard deviation of 15.8 cm/s and a data median of 10.2 cm/s, compare Figure 11(c). This indicates that the update rate of the room acoustical simulation can be rather low for translational movement, as the overall sound impression does not change much in the immediate vicinity (see [65] for further information). As an example, imagine a room acoustical simulation of a concert hall where the threshold for triggering a recalculation of a raw room impulse response is 25 cm (which is typically half a seat row's distance). With respect to the translational movement profile of a user, a recalculation has to be done approximately every 750 milliseconds to catch about 70% of the movements. If the system aims at calculating correct image sources for about 90% of the movements, this will have to be done every 550 milliseconds. A raw impulse response contains the raw data of the images, their amplitude and delay, but not their direction in listener's coordinates. The slowly updated dataset thus represents the room-related cloud of image sources. The transformation into 3D listener's coordinates and the convolution are updated much faster, certainly, in order to allow a direct and smooth responsiveness.

CAVE-like environments allow the user to move directly in the scene, for example, by walking inside the boundaries of the display surfaces and the tracking area. Additionally, indirect navigation enables the user to move through the scenery virtually without moving his body, by pointing metaphors when using hand sensors or joysticks. Indirect navigation is mandatory, for example, for architectural walkthroughs, as the virtual scenery is usually much larger than the space covered by the CAVE-like device itself. The maximum velocity for indirect navigation has to be limited in order to avoid artifacts or distortions in the acoustical rendering and perception. However, during indirect movement, users do

[Figure 10 diagram: the room acoustics server holds the geometric room model and runs the image source audibility test and the ray tracing (histogram calculation) with start/interrupt control; the auralization server performs binaural filter generation from the HRTF database, crosstalk cancellation filter calculation, segmented and block convolution of the audio streams, and ASIO output with buffer switching; the VR application supplies the high-detail room model, position data of listener and sources, and the interaction and graphic managers. Successive update thresholds (Δs = 1 m; Δs = 0.25 m with Δα = 10°; Δs = 0.05 m with Δα = 2°; Δs = 0.01 m with Δα = 1°) gate the pipeline from the coarse room simulation down to the filter calculation.]
Figure 10: The complete binaural auralization system.
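The threshold cascade indicated in Figure 10 follows the six-degrees-of-freedom update test of Section 4.1: weighted deviations from the pose at the last update are summed, and an update fires when the sum exceeds 1; resetting the reference pose on each update provides the hysteresis. A minimal sketch (the class and all weight values beyond the stated Δx = Δy = 1 cm and Δϕ = 1 degree are our own assumptions):

```python
import numpy as np

class UpdateGate:
    """Filter-update decision: per-DOF weighted deviations from the
    reference pose are summed; an update fires above 1.0 and resets
    the reference, which yields hysteresis against tracker jitter."""

    def __init__(self,
                 w_pos=(1 / 1.0, 1 / 1.0, 1 / 5.0),   # x, y dominate (1 cm)
                 w_rot=(1 / 1.0, 1 / 5.0, 1 / 5.0)):  # phi dominates (1 deg)
        self.w = np.array(w_pos + w_rot, float)
        self.ref = None  # pose at the last update

    def __call__(self, pose):
        pose = np.asarray(pose, float)  # (x, y, z in cm; phi, theta, rho in deg)
        if self.ref is None:
            self.ref = pose
            return True
        if np.sum(self.w * np.abs(pose - self.ref)) > 1.0:
            self.ref = pose             # reset reference -> hysteresis
            return True
        return False

gate = UpdateGate()
first = gate((0, 0, 0, 0, 0, 0))     # first pose always triggers
r1 = gate((0.4, 0, 0, 0.3, 0, 0))    # weighted sum 0.7 -> no update
r2 = gate((0.6, 0, 0, 0.5, 0, 0))    # weighted sum 1.1 -> update
r3 = gate((0.7, 0, 0, 0.5, 0, 0))    # 0.1 from the new reference -> no update
print(first, r1, r2, r3)             # True False True False
```

Measuring against the last-triggering pose rather than a fixed grid is what prevents the rapid filter toggling that jittering tracker data would otherwise cause at a boundary.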

[Figure 11, panels (a) and (b): histograms of quantity over velocity; (a) translational velocity, 0–80 cm/s; (b) rotational velocity, 0–100 degrees/s; a cumulative-percentage curve overlays each histogram.]

(c) Descriptive statistics:

              vt (cm/s)    vr (deg/s)
    mean       15.486        8.686
    σ          15.843       11.174
    median     10.236        5.239
    max        84.271      141.458

Figure 11: Histogram of translational (vt) and rotational (vr) velocities of the movements of a user acting in a CAVE-like environment. The blue line depicts the cumulative percentage of the measurements. In (b), we limited the upper bound to 100 degrees/s for better readability; (c) shows the descriptive statistics of the measurements.
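The update intervals derived from this movement profile follow a simple rule: for a given recalculation threshold, the interval that covers a chosen fraction of movements is the threshold divided by the corresponding velocity percentile. The sketch below uses synthetic lognormal data that only roughly mimics the measured distribution, so it reproduces the reasoning but not the exact 750/550 ms figures of the text:

```python
import numpy as np

# Synthetic stand-in for the recorded translational velocities; the
# real data are summarized in Figure 11(c) (median ~10.2 cm/s).
rng = np.random.default_rng(1)
v = rng.lognormal(mean=np.log(10.2), sigma=0.9, size=20000)  # cm/s

threshold_cm = 25.0  # recalculation threshold: half a seat row

def update_interval(cover):
    """Time budget so that a user moving at or below the `cover`
    percentile of velocity stays within the threshold."""
    v_p = np.percentile(v, 100.0 * cover)
    return threshold_cm / v_p  # seconds

print(f"70% coverage: recalculate every {update_interval(0.70):.2f} s")
print(f"90% coverage: recalculate every {update_interval(0.90):.2f} s")
```

Higher coverage means budgeting for faster users, so the 90% interval is always shorter than the 70% one — exactly the relation between the 750 ms and 550 ms figures above.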

[Figure 13 plot: computation time in milliseconds (0–300) over the number of polygons (0–200) for the BSP-based and brute-force IS audibility tests.]

Figure 12: Sliced polygon model of the concert hall of Aachen's Eurogress convention center.

not tend to move their head, and the overall sensation reduces the capability to evaluate the correctness of the simulation. Once the users stop, it takes about 750 milliseconds, as depicted above, to calculate the right filters for the current user position. Experience has shown that limiting the maximum velocity for indirect navigation to 100 cm/s yields good results and user acceptance.

In addition to the translational behavior, Figure 11(b) shows the rotational profile for head movements of a user. Peak angular velocities can be up to 140 degrees per second, although these are very seldom. The mean for rotational movement is 8.6 degrees/s, with a standard deviation of 11.1 degrees/s and a data median of 5.2 degrees/s, compare Figure 11(c). Data sets provided as standard material for research on system latency, for example, by [66] or [61], show comparable results.

The orientation of the user's head in the sound field is very critical, as reflections have to be calculated for the head-related impulse response in listener's coordinates. The changing ITD of the HRTFs during head rotation may cause a significant phase mismatch between two filters. In cross-fading from one room impulse response to the next, these differences should not be too big, as this might result in audible comb-filter effects. To reduce these differences, a filter change every 1-2 degrees is necessary here. In order to be precise for almost all possible rotational velocities, we consider a timing interval with a recalculation every 10–20 milliseconds as mandatory. As a consequence, the block size configured in the audio processing hardware should not be bigger than 512 samples, as this limits the minimal possible update time to 11.6 milliseconds at a 44.1 kHz sampling rate.

5.2. Performance of the room acoustical simulation

To evaluate the implementation and to determine its real-time capabilities, several experiments were carried out on the test system. For a realistic evaluation, a model of the concert hall of Aachen's Eurogress convention center (volume about 15 000 m3) was constructed, which is shown in Figure 12. All results presented in this contribution are based on this model.

The model is constructed of 105 polygons and 74 planes, respectively. Although it is kept quite simple, the model contains all indoor elements of the room which are acoustically of interest [67], for example, the stage, the skew wall elements, and the balustrade. Details of small elements are neglected and represented by equivalent scattering [68]. Surface properties, that is, absorption and scattering coefficients, are defined through standardized material data [69, 70].

5.2.1. Image source method performance

The computation time for the translational movement of primary sound sources and their respective image sources depends solely on the number of image sources. An average computation time of about 1 millisecond per 1000 image sources was measured. The main part of the computation time is needed for the audibility test.

To give a better idea of the speedup achieved by the use of BSP trees, a brute-force IS audibility test has been implemented for comparison purposes. This algorithm tests every polygon of the scene for intersection instead of testing only a few subpartitions of the room by means of a BSP-tree structure. Figure 13 shows a comparison of measured computation times for the IS audibility test up to second IS order for both approaches. As expected, the computation time of the brute-force method rises exponentially with the exponentially growing number of ISs, whereas the BSP-based approach shows an only roughly linearly growing computation time demand due to the drop of search complexity to O(log N), N being the number of polygons.

Figure 13: Comparison of required computation time for the IS audibility test up to second-order ISs for different Eurogress models which differ in their level of detail (see [38] for details). With the growing number of polygons for the model's different levels of detail, the number of ISs grows exponentially, which leads to an exponential growth of the computation time for the brute-force approach. The computation time demand of the BSP-based method grows only linearly due to the drop of search complexity to O(log N), N being the number of polygons.
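The image sources themselves are generated by the classic mirroring operation. The following minimal sketch (our own code, with no audibility or visibility culling) shows the reflection of a source across wall planes and how the candidate set grows with the reflection order before the audibility test prunes it:

```python
import numpy as np

def mirror(point, plane_point, plane_normal):
    """Reflect a source position across a wall plane - the basic
    operation of the image source (IS) model."""
    n = plane_normal / np.linalg.norm(plane_normal)
    d = np.dot(point - plane_point, n)
    return point - 2.0 * d * n

def image_sources(src, planes, order):
    """Enumerate candidate ISs up to the given order by repeated
    mirroring. No culling is done, so this yields the 'all possible'
    set that the audibility test must then prune."""
    current, result = [src], []
    for _ in range(order):
        nxt = []
        for p in current:
            for q, n in planes:
                nxt.append(mirror(p, q, n))
        result.extend(nxt)
        current = nxt
    return result

# toy box: two parallel walls at x = 0 and x = 4
planes = [(np.array([0.0, 0, 0]), np.array([1.0, 0, 0])),
          (np.array([4.0, 0, 0]), np.array([1.0, 0, 0]))]
iss = image_sources(np.array([1.0, 0, 0]), planes, order=2)
print(len(iss))                                # 6 candidates (2 + 4)
print(sorted(round(p[0], 1) for p in iss))     # [-7.0, -1.0, 1.0, 1.0, 7.0, 9.0]
```

With W planes, the candidate count grows roughly like W·(W−1)^(k−1) per order k, which is the exponential growth that makes a fast audibility test indispensable.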

Table 2: Comparison of the measurement results of the IS audibility test.

    IS order    All ISs    Audible ISs    BSP [ms]    Brute [ms]
    1                75        9             0.153        0.959
    2             4,827       32            10.46        61.27
    3           309,445      111           710.07     3,924

Table 3: Calculation time of several parts of the filter.

    Processing step                                        Time
    Direct sound (512 taps)                                300 μs
    Single reflection (aver.)                               50 μs
    Preparation for segmented convolution (6000 samples)   1.1 ms

[Figure 14 plot: computation time in seconds (0–6) over filter length in seconds (0–2.5), with curves for the BSP-based and brute-force ray tracers.]
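The real-time budget check that follows from Table 2 can be reduced to a few lines of bookkeeping: compare the per-order audibility-test times with the 750-millisecond slot derived in Section 5.1 (the values are hard-coded from the table; the loop itself is only our illustration):

```python
# BSP-based audibility-test times per IS order, in ms (Table 2)
bsp_ms = {1: 0.153, 2: 10.46, 3: 710.07}
budget_ms = 750.0  # simulation time slot from Section 5.1

feasible = 0
for order in sorted(bsp_ms):
    if bsp_ms[order] <= budget_ms:
        feasible = order

print(feasible)  # 3 -> the Eurogress model is real-time capable up to order 3
```

The same check against the brute-force column (3924 ms for order 3) fails the budget already at order 3, which is the practical payoff of the BSP acceleration.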
With the assigned time slot (see Section 5.1) of 750 milliseconds for the simulation process, real-time capability for a room acoustical simulation with all degrees of freedom, such as movable sound sources, a movable receiver, changing source directivities, and interaction with the scenery, is reached for about 320 000 ISs to be tested during runtime. Applying these constraints to the measurement results of the IS audibility test (see Table 2) makes the simulation of the Eurogress model real-time capable up to order 3.

Besides the performance of the room acoustical simulation, the processing time of the filter is very important. All time measurements of the calculation routines presented in this section were performed on our test system.

Calculating the image sources of the Eurogress model up to the third order, 111 audible image sources can be found in the first part of the impulse response of 6000 samples length, corresponding to 136 milliseconds. In this case, one source is placed on the stage, and the listener is located in the middle of the room. The complete filter processing (excluding the audibility test) is done in 6.95 milliseconds. Note that the filter processing has different entry points: the rotation of the listener or a source does not cause a recalculation of the audible sources; only the filter has to be processed.

5.2.2. Ray-tracing performance

For measuring the performance of the ray-tracing algorithm, all materials of the Eurogress model were replaced by a single one in order to avoid influences of different scattering and absorption coefficients on the results.

As in the previous section, a brute-force ray-tracing algorithm has been implemented to compare the results to the BSP-based method we use in our framework. While the brute-force approach has a linearly growing computation time, that is, a complexity of O(N), N being the number of polygons, the BSP-based algorithm grows only logarithmically due to the drop of search complexity to O(log N) (see Figure 14, t < 0.8 second). A ray gets terminated if a minimum energy threshold is reached. Thus, both approaches get faster with increasing time due to the growing number of reflections, that is, the growing rays' loss of energy and ray termination, respectively. As an example, the algorithm needs an average of about 2.6 seconds per 80 000 rays (10 000 rays per frequency band; the first two octave bands are skipped) for the determination of an impulse response with a length of 1 second. As the processing time of the ray-tracing algorithm increases linearly with the number of rays used, a comparison of these results is redundant. It is obvious that the algorithm is able to cope with the real-time requirements, especially when using small numbers of rays at first to get a low-resolution histogram. If the listener stays at one place for a longer period of time, the ray tracer can update the histogram with more rays to get a higher resolution and determine a longer impulse response, respectively.

Figure 14: Comparison of required computation times for the determination of impulse responses with increasing length using 80 000 rays for the simulation.

5.3. Network

With respect to the timing, the optical tracking system is capable of delivering spatial updates of the position and orientation of the user's head and an additional interaction device to the VR application in 18.91 milliseconds. This figure is a direct result of the sum of the time needed for the visual recognition of two tracking targets and the transmission time for the measured data over a network link. For applications that must have minimum latency and do not need wireless tracking, the usage of an electromagnetic tracking system can reduce the latency to ≈ 5 milliseconds.

However, the VirKopf system distinguishes between two types of update messages. One type deals with low-frequency state changes such as commands to play or stop a specific sound. The second type updates the spatial attributes of the sound source and the listener at a high frequency. For the first type, a reliable transport protocol is used (TCP), while the
14 EURASIP Journal on Advances in Signal Processing

latter is transmitted at a high frequency over a low-overhead but possibly unreliable protocol (UDP).

Table 4: Overview of performance measurements of the several subsystems.

Action | Time
Tracking | 18.90 ms
UDP transport | 0.26 ms
CTC filter generation | 3.50 ms
Audio buffer swap | 5.80 ms
IS audibility test | 710.00 ms
IS filter (2 × 6.95 ms) | 13.90 ms
Ray tracing, 500 ms impulse response length | 1600.00 ms
Ray tracing, 1 s impulse response length | 2600.00 ms
Ray tracing, 2 s impulse response length | 3000.00 ms

In order to get an estimate of the costs of network transport, the largest possible TCP and UDP messages produced by the VirKopf system were transmitted from the VR application to the VirKopf server many times and then sent back. The transmission time for this round trip was measured and halved to obtain a single-trip value. The worst-case times of the single trips are taken as the basis for estimating the overall cost introduced by the network communication. The mean time for transmitting a TCP command was 0.15 millisecond ± 0.02 millisecond; the worst-case transmission time on the TCP channel was close to 1.2 milliseconds. UDP communication was measured for 20 000 spatial update tables for 25 sound sources, resulting in a transmission time of 0.26 millisecond ± 0.01 millisecond per table. It may seem surprising that UDP communication is more expensive than TCP, but this results from the larger packet size of a spatial update (≈ 1 kB) in comparison to the small TCP command sizes (≈ 150 bytes).

5.4. Overall performance

Several aspects have to be taken into account to give an overview of the performance of the complete system: the performance of the several subsystems, the organization of parallel processing, and the network transport, but also the scenery, namely, the simulated room (dimension and complexity of the geometry), the velocity of the sources, and finally the user. Updating the room acoustical simulation is the most time-consuming part of the system and requires a strategy for achieving the best perceptual performance. Image sources and ray tracing are processed independently on different CPUs. The binaural filter of the ray-tracing process is calculated directly on the ray-tracing server. The auralization server has to calculate the image source filter and combine all filter segments of the ray-tracing process. Figure 15 describes one possible segmentation of the ray tracing and the combination with the image source filter. It should be mentioned that the length of the specular part is room dependent. The ray-tracing interrupt point is adjusted based on the movement velocity of the listener and the sources. This means that the audio signal is filtered with the updated first part of the room impulse response while the generation of the late part by ray tracing is still in progress. The filter segment to be updated is cut out of the complete filter with a short ramp of 32 samples ≈ 0.72 millisecond, and the new segment is blended in with the same ramp to avoid audible artifacts.

Due to the dependency on all these factors, update times cannot be estimated in general. For this reason, we give some detailed examples with respect to the performance measurements (see Tables 4 and 5) made in the several sections above. It should be noticed that the image source filter is updated any time the source or the head has moved more than 2 cm or turned more than 1 degree, respectively. The image source filter is calculated on the current list of audible sources (positions updated). The resulting filter only contains a few wrong reflections, which will be removed after the audibility test. Thus, the specular reflections in the first part of the impulse response become audible with the correct spatial representation already after 35 milliseconds (tracking + UDP transport + CTC filter generation + IS filter generation + audio buffer swap). This is also the time needed to react to a listener's head rotation (see Table 5).

6. SUMMARY

In this contribution, we introduced a quite complex system for the simulation and auralization of room acoustics in real time. The system is capable of simulating room acoustical sound fields in any kind of enclosure without the prerequisite of diffuse-field conditions. The room shape can hence be extremely long, flat, coupled, or of any other special property. The surface properties, too, can be freely chosen by using the amount of wave scattering according to standardized material data. Furthermore, the system includes a sound field reproduction for a single user based on dynamic crosstalk cancellation (virtual headphone). The software is implemented on standard PC hardware and requires no special processors. The performance (simulation processing time, filter update rates, tracker, and sound hardware latency) was evaluated and considered sufficient in the case of a concert hall of medium size.

Particular features of the system are the following.

(i) It is not based on any assumption of an ideal diffuse sound field but on a full room acoustic simulation in two parts. Specular and scattered components of the impulse response are treated separately. Any kind of room shape and volume can be processed, except small rooms at low frequencies.

(ii) The decision with regard to the amount of specular and diffuse reflections is purely room dependent and based on physical sound field aspects.

(iii) The user is involved only to create the room CAD model and the standard material data of absorption and scattering. Therefore, import functions of commercial non-real-time simulation software can be used. The fact that the auralization is performed in
Tobias Lentz et al. 15

[Figure 15 diagram: the combined binaural filter consists of the first part of the impulse response (specular reflections) generated by the IS method, the first part of the diffuse reflections generated by the ray-tracing method, and the late parts (specular and diffuse reflections; 100-200 ms, 200-500 ms, …) generated by the ray-tracing method.]

Figure 15: Combination of filter (or filter segments) for one ear generated by ray tracing and the first part of the impulse response generated by the image source model.
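The segment exchange behind Figure 15, in which an updated part of the impulse response is blended into the running filter with a short ramp of 32 samples, can be sketched as follows. This is a minimal illustration in plain Python under the assumption of a linear ramp; the function name and data layout are illustrative and not taken from the VirKopf implementation.

```python
def exchange_segment(filt, new_seg, start, ramp_len=32):
    """Replace one segment of an impulse-response filter by a freshly
    simulated one, cross-fading over ramp_len samples (32 samples is
    roughly 0.72 ms at 44.1 kHz) to avoid audible artifacts."""
    end = start + len(new_seg)
    out = list(filt)
    out[start:end] = new_seg              # drop the new segment in
    for i in range(ramp_len):
        w = i / (ramp_len - 1)            # linear ramp, 0 -> 1
        # leading edge: fade the old filter out and the new segment in
        out[start + i] = (1.0 - w) * filt[start + i] + w * new_seg[i]
        # trailing edge: fade the new segment out and the old filter back in
        j = end - ramp_len + i
        out[j] = (1.0 - w) * new_seg[len(new_seg) - ramp_len + i] + w * filt[j]
    return out
```

At both segment borders the output coincides exactly with the neighboring old filter values, so consecutive segment updates remain free of discontinuities.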

Table 5: Update intervals for different modes and conditions of head or source movements, based on the measurements shown in Table 4.

Action | Update rate | Filter content to be updated
Head rotation | 35 ms | Binaural processing in listener's coordinates
Translational head/source movement > 0.25 m | 710 ms | Binaural processing in listener's coordinates; specular impulse response (3D image source cloud)
Translational head/source movement > 1.0 m (complete impulse response update) | 3.0 s | Binaural processing in listener's coordinates; specular impulse response (3D image source cloud); scattering impulse response (3D scattering matrix)
Fast translational head/source movement > 1.0 m (update of the first 500 ms) | 1.6 s | Binaural processing in listener's coordinates; specular impulse response (3D image source cloud); scattering impulse response (3D scattering matrix)
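The update policy summarized in Tables 4 and 5 can be condensed into a small decision routine. The thresholds follow the tables and the text above; the function name and the returned task labels are illustrative assumptions, not the actual VirKopf interface.

```python
def plan_update(moved_m, rotated_deg, fast_movement=False):
    """Decide which parts of the binaural room impulse response have to
    be recomputed after a head or source movement (cf. Tables 4 and 5)."""
    tasks = []
    # any movement beyond 2 cm or 1 degree: redo the binaural
    # processing in listener's coordinates (~35 ms reaction time)
    if moved_m > 0.02 or rotated_deg > 1.0:
        tasks.append("binaural processing in listener's coordinates")
    # larger translations invalidate the specular part (image source cloud)
    if moved_m > 0.25:
        tasks.append("specular impulse response (image source cloud)")
    # movements beyond 1 m also invalidate the scattered part; for fast
    # movements only the first 500 ms are re-traced (1.6 s vs. 3.0 s)
    if moved_m > 1.0:
        if fast_movement:
            tasks.append("scattering impulse response (first 500 ms)")
        else:
            tasks.append("scattering impulse response (complete)")
    return tasks
```

A pure rotation therefore only triggers the cheap binaural reprocessing, while a large translation schedules the expensive ray-tracing update in the background.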

real time means that the user is not required to carry out any additional tasks. The system adjusts all relevant runtime parameters automatically and inherently, like the division into specular and scattered parts and the filter update rates.

(iv) The treatment of the components of the binaural impulse response is separated regarding the simulation itself, the update rate to the auralization server, and the convolution process.

(v) The decision regarding the update rate and the depth of the impulse response simulation is based on the interaction and the speed of movement of the user in the VR system.

(vi) The precision of details in the impulse response, its exactness of delays, and its exactness of the direction of sound incidence depend only on the relative arrival time in the impulse response. This is in agreement with the ability of the human hearing system regarding localization and echo delays. It should also be mentioned here that the system parameters of simulation depth and update rate are not controlled by the user but are inherently treated in the system. This

way of processing will create full complexity and exact auralization in the very early part of the direct sound and the first reflections. Gradually, the sound energy will be transferred into the scattered component of the impulse response. The precision and the update rates are reduced, motivated by the limits due to psychoacoustic masking effects. The system is open for further extension with respect to sound diffraction and sound insulation.

The real-time performance of the room acoustical simulation software was achieved by the introduction of a flexible framework for the interactive auralization of virtual environments. The concept of scene graphs for the efficient and flexible linkage of autonomously operating subscenes by means of so-called portals has been incorporated into the existing framework and combined with an underlying BSP-tree structure for processing geometry issues very fast. The use of this framework provides the possibility of a significant reduction of computation time for both applied algorithms (deterministic image sources and a stochastic ray tracer). Especially the image source method is improved by the introduction of spatial data structures, as portal states can be exploited so that the number of image sources to be processed can be reduced remarkably.

A fast low-latency engine ensures that impulse responses, regardless of their complete length, will be considered by the filtering of the mono audio material after 5.8 milliseconds (block length 256 samples). Optimizations concerning modern processor extensions enable the rendering of, for example, 10 sources with filters of 3-second length (132 000 taps) or 15 sources with filters of 2-second length.

The reproduction of the binaural audio signal is provided by a dynamic crosstalk cancellation system with no restrictions on user movements. This system acts as a virtual headphone, providing the channel separation without the need to wear physical headphones.

Gigabit Ethernet is used to connect the visual rendering system and the audio system. The visual VR system transmits the control commands as well as the spatial updates of the head and the sources. The control commands (e.g., start/stop) are considered in the audio server after 0.15 millisecond so that the changes are served with the next sound output block for a tight audio-video synchronism.

7. OUTLOOK

Despite the good performance of the whole system, there are many aspects that have to be investigated. To further enhance the quality of the room acoustical simulation, physical effects like sound insulation and diffraction are to be incorporated into the existing algorithms. In addition, the simulation of frequencies below the Schroeder frequency could be done by means of a fast and dynamic finite element method (FEM) solver. The existing framework is already open to take these phenomena into account; the respective algorithms only have to be implemented. At present, the simulation software is implemented in a first version as a self-contained stable base. Thus, optimizing the algorithms is necessary to further increase their performance, especially with a focus on computing processes in parallel. Position prediction could be a possibility of reducing the deviation between the position the filter was calculated for and the actual listener's position.

Preliminary listening tests showed that the generated virtual sources could be localized at a low error rate [59]. The room acoustical simulation was perceived as plausible and matching the generated visual image. In the future, more tests will be accomplished to evaluate the limitation of the update rates and the number of sources. Perception-based reduction as stated in, for example, [71, 72] is also an interesting method of reducing the processing costs and will be considered in the future.

ACKNOWLEDGMENTS

The authors would like to thank Frank Wefers, Hilmar Demuth, and Philipp Dross for their commitment during parts of the programming work, and also Torsten Kuhlen, Andreas Franck, and Mark-Oliver Güld for their support and discussion. Furthermore, thanks to the DFG for funding parts of the project (DFG Project "The Virtual Headphone," 2004-2006). Finally, the authors would like to thank the anonymous reviewers for their extensive work which helped a lot to improve this contribution.

REFERENCES

[1] D. R. Begault, "Challenges to the successful implementation of 3-D sound," Journal of the Audio Engineering Society, vol. 39, no. 11, pp. 864-870, 1991.
[2] M. Naef, O. Staadt, and M. Gross, "Spatialized audio rendering for immersive virtual environments," in Proceedings of the ACM Symposium on Virtual Reality Software and Technology (VRST '02), pp. 65-72, Hong Kong, November 2002.
[3] C. Cruz-Neira, D. J. Sandin, T. A. DeFanti, R. V. Kenyon, and J. C. Hart, "The CAVE: audio visual experience automatic virtual environment," Communications of the ACM, vol. 35, no. 6, pp. 65-72, 1992.
[4] D. A. Burgess and J. C. Verlinden, "An architecture for spatial audio servers," in Proceedings of Virtual Reality Systems Conference (Fall '93), New York, NY, USA, November 1993.
[5] J. D. Mulder and E. H. Dooijes, "Spatial audio in graphical applications," in Visualization in Scientific Computing, M. Göbel, H. Müller, and B. Urban, Eds., pp. 215-229, Springer, Wien, Austria, 1994.
[6] Lake Huron, 2005, http://www.lake.com.au/.
[7] L. Savioja, Modeling Techniques for Virtual Acoustics, Ph.D. thesis, Helsinki University of Technology, Helsinki, Finland, December 1999.
[8] L. Savioja, J. Huopaniemi, T. Lokki, and R. Väänänen, "Creating interactive virtual acoustic environments," Journal of the Audio Engineering Society, vol. 47, no. 9, pp. 675-705, 1999.
[9] T. Funkhouser, P. Min, and I. Carlbom, "Real-time acoustic modeling for distributed virtual environments," in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '99), pp. 365-374, Los Angeles, Calif, USA, August 1999.
[10] R. L. Storms, "Npsnet-3D Sound Server: An Effective Use of the Auditory Channel," 1995.

[11] H. Kuttruff, Room Acoustics, Elsevier Science Publisher, New York, NY, USA, 4th edition, 2000.
[12] J. B. Allen and D. A. Berkley, "Image method for efficiently simulating small-room acoustics," The Journal of the Acoustical Society of America, vol. 65, no. 4, pp. 943-950, 1979.
[13] J. Borish, "Extension of the image model to arbitrary polyhedra," The Journal of the Acoustical Society of America, vol. 75, no. 6, pp. 1827-1836, 1984.
[14] B.-I. L. Dalenbäck, "Room acoustic prediction based on a unified treatment of diffuse and specular reflection," The Journal of the Acoustical Society of America, vol. 100, no. 2, pp. 899-909, 1996.
[15] P.-A. Forsberg, "Fully discrete ray tracing," Applied Acoustics, vol. 18, no. 6, pp. 393-397, 1985.
[16] T. Funkhouser, N. Tsingos, I. Carlbom, et al., "A beam tracing method for interactive architectural acoustics," The Journal of the Acoustical Society of America, vol. 115, no. 2, pp. 739-756, 2004.
[17] G. M. Naylor, "ODEON—another hybrid room acoustical model," Applied Acoustics, vol. 38, no. 2-4, pp. 131-143, 1993.
[18] U. M. Stephenson, "Quantized pyramidal beam tracing—a new algorithm for room acoustics and noise immission prognosis," Acta Acustica United with Acustica, vol. 82, no. 3, pp. 517-525, 1996.
[19] D. van Maercke, "Simulation of sound fields in time and frequency domain using a geometrical model," in Proceedings of the 12th International Congress on Acoustics (ICA '86), vol. 2, Toronto, Ontario, Canada, July 1986, paper E11-7.
[20] M. Vorländer, "Simulation of the transient and steady state sound propagation in rooms using a new combined sound particle—image source algorithm," The Journal of the Acoustical Society of America, vol. 86, pp. 172-178, 1989.
[21] I. Bork, "A comparison of room simulation software—the 2nd round Robin on room acoustical computer simulation," Acta Acustica United with Acustica, vol. 86, no. 6, pp. 943-956, 2000.
[22] M. Vorländer, "International round Robin on room acoustical computer simulations," in Proceedings of the 15th International Congress on Acoustics (ICA '95), pp. 689-692, Trondheim, Norway, June 1995.
[23] H. Kuttruff, "A simple iteration scheme for the computation of decay constants in enclosures with diffusely reflecting boundaries," The Journal of the Acoustical Society of America, vol. 98, no. 1, pp. 288-293, 1995.
[24] C. L. Christensen and J. H. Rindel, "A new scattering method that combines roughness and diffraction effects," in Forum Acusticum, Budapest, Hungary, 2005.
[25] R. Heinz, "Binaural room simulation based on an image source model with addition of statistical methods to include the diffuse sound scattering of walls and to predict the reverberant tail," Applied Acoustics, vol. 38, no. 2-4, pp. 145-159, 1993.
[26] Y. W. Lam, "A comparison of three reflection modelling methods used in room acoustics computer models," The Journal of the Acoustical Society of America, vol. 100, no. 4, pp. 2181-2192, 1996.
[27] M. Vorländer, "Ein Strahlverfolgungsverfahren zur Berechnung von Schallfeldern in Räumen," Acustica, vol. 65, no. 3, pp. 138-148, 1988.
[28] J. S. Suh and P. A. Nelson, "Measurement of transient response of rooms and comparison with geometrical acoustic models," The Journal of the Acoustical Society of America, vol. 105, no. 4, pp. 2304-2317, 1999.
[29] U. P. Svensson, R. I. Fred, and J. Vanderkooy, "An analytic secondary source model of edge diffraction impulse responses," The Journal of the Acoustical Society of America, vol. 106, no. 5, pp. 2331-2344, 1999.
[30] N. Tsingos, T. Funkhouser, A. Ngan, and I. Carlbom, "Modeling acoustics in virtual environments using the uniform theory of diffraction," in Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '01), pp. 545-552, Los Angeles, Calif, USA, August 2001.
[31] U. M. Stephenson, Beugungssimulation ohne Rechenzeitexplosion: die Methode der quantisierten Pyramidenstrahlen; ein neues Berechnungsverfahren für Raumakustik und Lärmimmissionsprognose; Vergleiche, Ansätze, Lösungen, Ph.D. thesis, RWTH Aachen University, Aachen, Germany, 2004.
[32] M. Slater, A. Steed, and Y. Chrysanthou, Computer Graphics and Virtual Environments: From Realism to Real-Time, Addison Wesley, New York, NY, USA, 2001.
[33] L. Cremer and H. A. Müller, Die wissenschaftlichen Grundlagen der Raumakustik—Band 1, S. Hirzel, Stuttgart, Germany, 2nd edition, 1978.
[34] T. Akenine-Möller and E. Haines, Real-Time Rendering, A. K. Peters, Natick, Mass, USA, 2nd edition, 2002.
[35] J. D. Foley, A. van Dam, S. K. Feiner, and J. F. Hughes, Computer Graphics, Principles and Practice, Addison Wesley, Reading, Mass, USA, 2nd edition, 1996.
[36] R. Shumacker, R. Brand, M. Gilliland, and W. Sharp, "Study for applying computer-generated images to visual simulations," Report AFHRL-TR-69-14, U.S. Air Force Human Resources Laboratory, San Antonio, Tex, USA, 1969.
[37] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, MIT Press, Cambridge, Mass, USA, 2nd edition, 2001.
[38] D. Schröder and T. Lentz, "Real-time processing of image sources using binary space partitioning," Journal of the Audio Engineering Society, vol. 54, no. 7-8, pp. 604-619, 2006.
[39] R. Heinz, Entwicklung und Beurteilung von computergestützten Methoden zur binauralen Raumsimulation, Ph.D. thesis, RWTH Aachen University, Aachen, Germany, 1994.
[40] J. S. Bradley and G. A. Soulodre, "The influence of late arriving energy on spatial impression," The Journal of the Acoustical Society of America, vol. 97, no. 4, pp. 2263-2271, 1995.
[41] J. H. Rindel, "Evaluation of room acoustic qualities and defects by use of auralization," in Proceedings of the 148th Meeting of the Acoustical Society of America, San Diego, Calif, USA, November 2004.
[42] D. Schröder, P. Dross, and M. Vorländer, "A fast reverberation estimator for virtual environments," in Proceedings of the AES 30th International Conference, Saariselkä, Finland, March 2007.
[43] T. Brookes and C. Treble, "The effect of non-symmetrical left/right recording pinnae on the perceived externalisation of binaural recordings," in Proceedings of the 118th Audio Engineering Society Convention, Barcelona, Spain, May 2005.
[44] D. S. Brungart, W. M. Rabinowitz, and N. I. Durlach, "Auditory localization of a nearby point source," The Journal of the Acoustical Society of America, vol. 100, no. 4, p. 2593, 1996.

[45] A. Kulkarni and H. S. Colburn, "Role of spectral detail in sound-source localization," Nature, vol. 396, no. 6713, pp. 747-749, 1998.
[46] H. Lehnert and M. Richter, "Auditory virtual environment: simplified treatment of reflections," in Proceedings of the 15th International Congress on Acoustics (ICA '95), Trondheim, Norway, June 1995.
[47] G. Romanenko and M. Vorländer, "Employment of spherical wave reflection coefficient in room acoustics," in IoA Symposium Surface Acoustics, Salford, UK, 2003.
[48] C. Cruz-Neira, D. J. Sandin, and T. A. DeFanti, "Surround-screen projection-based virtual reality: the design and implementation of the CAVE," in Proceedings of the 20th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '93), pp. 135-142, ACM Press, Anaheim, Calif, USA, August 1993.
[49] B. B. Bauer, "Stereophonic earphones and binaural loudspeakers," Journal of the Audio Engineering Society, vol. 9, no. 2, pp. 148-151, 1961.
[50] O. Kirkeby, P. A. Nelson, and H. Hamada, "Local sound field reproduction using two closely spaced loudspeakers," The Journal of the Acoustical Society of America, vol. 104, no. 4, pp. 1973-1981, 1998.
[51] H. Møller, "Reproduction of artificial-head recordings through loudspeakers," Journal of the Audio Engineering Society, vol. 37, no. 1-2, pp. 30-33, 1989.
[52] W. G. Gardner, 3-D audio using loudspeakers, Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, Mass, USA, 1997.
[53] T. Lentz and O. Schmitz, "Realisation of an adaptive cross-talk cancellation system for a moving listener," in Proceedings of the 21st Audio Engineering Society Conference, St. Petersburg, Russia, June 2002.
[54] T. Lentz and G. K. Behler, "Dynamic cross-talk cancellation for binaural synthesis in virtual reality environments," in Proceedings of the 117th Audio Engineering Society Convention, San Francisco, Calif, USA, October 2004.
[55] Steinberg, "ASIO 2.0 Audio Streaming Input Output Development Kit," 2004.
[56] T. Lentz, "Dynamic crosstalk cancellation for binaural synthesis in virtual reality environments," Journal of the Audio Engineering Society, vol. 54, no. 4, pp. 283-294, 2006.
[57] T. Takeuchi, P. Nelson, O. Kirkeby, and H. Hamada, "The effects of reflections on the performance of virtual acoustic imaging systems," in Proceedings of the International Symposium on Active Control of Sound and Vibration (ACTIVE '97), pp. 955-966, Budapest, Hungary, August 1997.
[58] D. B. Ward, "On the performance of acoustic crosstalk cancellation in a reverberant environment," The Journal of the Acoustical Society of America, vol. 110, no. 2, pp. 1195-1198, 2001.
[59] T. Lentz, J. Sokoll, and I. Assenmacher, "Performance of spatial audio using dynamic cross-talk cancellation," in Proceedings of the 119th Audio Engineering Society Convention, New York, NY, USA, October 2005.
[60] W. G. Gardner, "Efficient convolution without input-output delay," Journal of the Audio Engineering Society, vol. 43, no. 3, pp. 127-136, 1995.
[61] J. J. La Viola Jr., "A testbed for studying and choosing predictive tracking algorithms in virtual environments," in Proceedings of the 7th International Immersive Projection Technologies Workshop, 9th Eurographics Workshop on Virtual Environments, pp. 189-198, Zurich, Switzerland, May 2003.
[62] R. Azuma and G. Bishop, "A frequency-domain analysis of head-motion prediction," in Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '95), pp. 401-408, ACM Press, Los Angeles, Calif, USA, August 1995.
[63] L. Chai, W. A. Hoff, and T. Vincent, "Three-dimensional motion and structure estimation using inertial sensors and computer vision for augmented reality," Presence: Teleoperators and Virtual Environments, vol. 11, no. 5, pp. 474-492, 2002.
[64] J.-R. Wu and M. Ouhyoung, "A 3D tracking experiment on latency and its compensation methods in virtual environments," in Proceedings of the 8th Annual ACM Symposium on User Interface and Software Technology (UIST '95), pp. 41-49, ACM Press, Pittsburgh, Pa, USA, November 1995.
[65] I. B. Witew, "Spatial variation of lateral measures in different concert halls," in Proceedings of the 18th International Congress on Acoustics (ICA '04), vol. 4, p. 2949, Kyoto, Japan, April 2004.
[66] R. Azuma and G. Bishop, "Improving static and dynamic registration in an optical see-through HMD," in Proceedings of the 21st Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '94), pp. 197-204, ACM Press, New York, NY, USA, July 1994.
[67] W. Pompetzki, Psychoakustische Verifikation von Computermodellen zur binauralen Raumsimulation, Ph.D. thesis, Ruhr-Universität Bochum, Bochum, Germany, 1993.
[68] M. Vorländer and E. Mommertz, "Definition and measurement of random-incidence scattering coefficients," Applied Acoustics, vol. 60, no. 2, pp. 187-199, 2000.
[69] ISO 354, "Acoustics, Measurement of sound absorption in a reverberant room," 2003.
[70] ISO/DIS 17497-1, "Acoustics, Measurement of the sound scattering properties of surfaces—part 1: measurement of the random-incidence scattering coefficient in a reverberation room".
[71] N. Tsingos, "Scalable perceptual mixing and filtering of audio signals using an augmented spectral representation," in Proceedings of the 8th International Conference on Digital Audio Effects (DAFx '05), Madrid, Spain, September 2005.
[72] N. Tsingos, E. Gallo, and G. Drettakis, "Perceptual audio rendering of complex virtual environments," in Proceedings of the 31st Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '04), pp. 249-258, Los Angeles, Calif, USA, August 2004.

Tobias Lentz was born in Mönchengladbach, Germany, in 1971. He studied electrical engineering at RWTH Aachen, Germany, from where he received a Dipl.-Ing. (M.Sc.) degree in 2001. Since 2001 he has been working as a Research Assistant and is currently a Ph.D. candidate at the Institute of Technical Acoustics, RWTH Aachen University. His main focus is on three-dimensional audio technologies, architectural acoustics, crosstalk cancellation, binaural technology, and real-time applications for virtual reality. Currently, he is finishing his Ph.D. thesis on "Binaural Technology for Virtual Reality." He is a Member of the Audio Engineering Society (AES) and the German Acoustical Association (DEGA).

Dirk Schröder was born in Cologne, Germany, in 1974. He studied electrical engineering and information technology at RWTH Aachen University, Germany, and received a degree of Dipl.-Ing. (M.Sc.) in 2004. He has been working at the Institute of Technical Acoustics, RWTH Aachen University, as a Research Assistant since 2005 and is currently a Ph.D. candidate at RWTH Aachen University. His main research field is room acoustic simulation with special focus on interactive real-time applications such as virtual reality. He is a Member of the Audio Engineering Society (AES) and the German Acoustical Association (DEGA).

Michael Vorländer is a Professor at RWTH Aachen University, Germany. After university education in physics and a doctor degree (Aachen, 1989, with a thesis in room acoustical computer simulation), he worked in various fields of acoustics at the PTB Braunschweig, the National Laboratory for Physics and Technology. In 1995 he finished the qualification as university lecturer (habilitation) with a thesis on reciprocity calibration of microphones. In 1996 he accepted an offer from RWTH Aachen University for a Chair and Director of the Institute of Technical Acoustics. He is President of the European Acoustics Association, EAA, in the term 2004-2007 and former Editor-in-Chief of the International Journal Acta Acustica united with Acustica (1998-2003). He is a Member of the German Acoustical Society, DEGA, of the German Physical Society, DPG, and a Fellow of the Acoustical Society of America, ASA.

Ingo Assenmacher was born in Düren, Germany, in 1974. He studied computer science at RWTH Aachen University, Aachen, and received a Dipl.-Inform. (M.Sc.) degree in 2002. He is currently working at the Center for Computation and Communication, RWTH Aachen University, as a Research Assistant and is a Ph.D. candidate at RWTH Aachen University. His main research fields are interaction in immersive virtual environments, software methods for real-time environments, and virtual-reality-based data visualization and exploration.
