3D Sound Source Localization Using A Spherical Mic
All content following this page was uploaded by Manuel Brandner on 02 July 2018.
Abstract
Directional detection of sound sources under defined ambient conditions using a spherical microphone array
(Eigenmike) is examined. The spatial detection algorithm correlates synthesized spherical wave spectra derived
from theory with a set of spherical wave spectra calculated from measured impulse responses. For this purpose,
measurement signals were recorded with the 32-capsule Eigenmike microphone array, and two measurement sets were
created for spatial sampling positions on an enclosing spherical surface at different radial distances. To
simulate free-field conditions and to compare the applicability of the proposed algorithm under real conditions,
the derived impulse responses are windowed accordingly. The spherical wave spectra for specific source positions
are calculated from the Fourier transform of the windowed responses. Under free-field conditions, the synthesized
spherical wave spectra for the various spatial positions depend only on the structural properties of the
microphone array and the position of the measured omnidirectional sound source. Correlating measured and
synthesized spherical wave spectra yields a data set whose maximum value marks the sought direction of the sound
source. A further aim is to understand the relationship between the size of the synthesized data set, which
serves as a lookup table, and the directional accuracy. The contribution provides information about the
functionality and the limits of directional detection with the Eigenmike spherical microphone array under defined
spatial conditions. As expected, the results show a limited frequency resolution due to the arrangement of the
microphones on the sphere. The measurement is carried out at 612 equiangular source positions for the first
measurement set and at 480 source positions for the second. For the ideal case and a synthesis matrix that
includes all measured source positions, the algorithm yields a vanishingly small median deviation error over the
full frequency range from 172 Hz up to the aliasing frequency (f_alias = 5.2 kHz). The accuracy is directly
connected to the spatial sampling (microphone capsule spacing) and the size of the synthesis matrix.
Proceedings of ICSA 2014 in Erlangen --- ISBN 978-3-98 12830-4-4
search of the correlation function obtained via correlation of a set of analytically derived wave spectra with measured wave spectra.

calculation of the coefficients $c_{nm}$ (cf. (7)). The derivation

$$\frac{i}{\rho_0 c}\left[ j_n'(k r_m)\, b_{nm} + h_n'(k r_m)\, c_{nm} \right] = 0 \tag{7}$$
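As a sketch only, equation (7) can be rearranged for the coefficients $c_{nm}$. This assumes $h_n'(k r_m) \neq 0$ and reads the primes in (7) as derivatives with respect to the argument; it is not the full derivation of the paper.

```latex
c_{nm} = -\frac{j_n'(k r_m)}{h_n'(k r_m)}\, b_{nm}
```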
$$A_{nm} = Y_{nm}^{-1}\, p \tag{15}$$

To prevent the matrix inversion from yielding a singular solution, the $Y_{nm}$ matrix can be inverted via a pseudo-inverse:

$$A_{nm} = (Y^{T} Y)^{-1}\, Y^{T} p \tag{16}$$

Nevertheless, adequate sampling is necessary to keep the condition number as small as possible. The configuration of the capsules on the sphere of the Eigenmike yields such a small condition number. Equation (16) results in a vector of length $(N+1)^2$, which corresponds to the spherical wave spectrum calculated from the measured sound pressure distribution.

2.3. Matching profile of Spherical Wave Spectra

The feasibility of the algorithm is examined by the autocorrelation of the measured wave spectra. For this case the angular deviation is zero. Further investigations examine the correlation of a synthesized set of wave spectra (calculated analytically) with a measured spectrum for a specific source position. To obtain a matching profile (correlation vector), we first have to calculate the correlation coefficient of the $A_{nm}$ vector and the $A_{nm}$ matrix [5]. This synthesis matrix ($A_{nm}$ matrix) holds all angle-dependent source positions, so the correlation coefficient has to be calculated for all angle pairs $\{\varphi_l, \vartheta_l\}_{l=1 \ldots L}$. The result of each correlation is stated in (17), where $A_{nm,s}$ is an (N x M) matrix (N source positions and M sampling points) and denotes the synthesized spectrum. $A_{nm}$ is a vector of length M and denotes the measured spectrum. For a successful source localization, the highest correlation value of the $A_{nm\mathrm{MATCH}}$ vector marks the sought angle pair (see Figure 1).

$$A_{nm\mathrm{MATCH}} = \frac{A_{nm,s}^{H}\, A_{nm}}{\sqrt{A_{nm,s}^{H} A_{nm,s}\; A_{nm}^{H} A_{nm}}} \tag{17}$$

[Figure 1 (plot): legend "N=4, complex" and "N=4, abs"; vertical axis: magnitude of the correlation vector]

2.4. Examining the ASL Algorithm

Now that an analytically feasible acoustic source localization algorithm for arbitrary source directions is found, the next step is to examine its feasibility for different scenarios. The first empirical evaluation of the algorithm is made with a measurement setup fulfilling the free field condition. Impulse responses are obtained from exponential sweep measurements, and a window with a 20-sample fade-in and an 80-sample fade-out suppresses spatial influences.

For further investigations, the behavior of the algorithm is examined when the synthesis matrix, that is, the grid of possible source locations, is reduced to a minimum. After minimizing the synthesis matrix, a vector interpolation approach is applied. In theory, the three strongest components of the matching profile should hold the most valuable directional information. Therefore, these components are weighted and summed to gain an improvement for small synthesis sets.

Another task is to examine the robustness of the algorithm for different window lengths, which is directly connected to a change of the D/R ratio (ratio of the direct sound field to the reverberant sound field) and is similar to changing the distance to the acoustic sound source.

Depending on the radial distance of a sound source to the microphone array, a radial filter is necessary to focus the localization algorithm onto the source. These dependencies are also of interest. The performance also depends on the structural characteristics of the microphone array, and therefore the limits of the aperture have to be taken into account.

3. Measurement Setup

Both measurements were carried out at the Institute of Electronic Music and Acoustics (IEM) in the "IEM-CUBE" in Graz. The CUBE is a 10.3 x 12 x 4.8 m room used as a lab, for concerts and also for lectures (reverberation time RT60 ≈ 0.7 s, rH = 1.9 m). For the first setup the task was to minimize the room influences and approximate free field conditions. Therefore a radius of rq = 0.7 m was chosen, and a loudspeaker with nearly spherical radiation, one acoustic center and a linear frequency response was used (see Figure 2, Figure 3 shows
Figure 2: Measurement setup, left: measurement sketch, right: image of tracking and positioning the loudspeaker

4. Performance Evaluation

Figure 4 shows the overall performance of the proposed ASL algorithm for the ideal case, in which the free field condition is fulfilled. In this ideal case the algorithm still works beyond the aliasing frequency, because only little dynamic is necessary for a correct localization and no interferences are predominant. The same phenomenon occurs at very low frequencies: the matching vector has a low dynamic, but still enough to separate the sought direction. For cases of low dynamics we included the phase term, which should improve the source localization. An improvement is desirable because, at low frequencies, the amplitude differences between adjacent capsules are minimal. The empirical evaluation shows that including the phase term also improves the algorithm over the rest of the frequency range of interest (cf. Figure 1 and Figure 7). To investigate harsher conditions, a second measurement set is evaluated. Its radial distance to the source positions is more than double that of the first arrangement. This enables a comparison of the ideal case with a case at a source radius with higher room influences and gives an overview of the performance of the algorithm (see Sec. 4.4).
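The effect of the phase term on the matching profile can be illustrated with a minimal sketch of the normalized correlation in (17). The data here are hypothetical: random complex numbers stand in for real spherical wave spectra, and the "abs" variant is assumed to mean that only the coefficient magnitudes are correlated. The function and variable names are illustrative and do not come from the paper.

```python
import numpy as np

def matching_profile(synth, measured, use_phase=True):
    """Magnitude of the normalized correlation, in the form of (17), between
    each synthesized spectrum (one row per candidate direction) and the
    measured spectrum."""
    if not use_phase:
        # Assumption: the "abs" variant keeps only coefficient magnitudes.
        synth, measured = np.abs(synth), np.abs(measured)
    num = synth.conj() @ measured                     # s_l^H a for every row l
    den = np.sqrt(np.sum(np.abs(synth) ** 2, axis=1)  # s_l^H s_l
                  * np.sum(np.abs(measured) ** 2))    # a^H a
    return np.abs(num) / den

# Hypothetical stand-in data: 25 candidate directions, (N + 1)^2 = 25
# coefficients for N = 4. Random numbers replace real wave spectra.
rng = np.random.default_rng(0)
synth = rng.standard_normal((25, 25)) + 1j * rng.standard_normal((25, 25))
true_idx = 7
measured = 0.5 * np.exp(0.3j) * synth[true_idx]       # scaled, phase-shifted copy

profile_complex = matching_profile(synth, measured, use_phase=True)
profile_abs = matching_profile(synth, measured, use_phase=False)

best_complex = int(np.argmax(profile_complex))        # index of the best match
best_abs = int(np.argmax(profile_abs))

# Mean level of the profile away from the peak: lower means more "dynamic".
floor_complex = np.delete(profile_complex, true_idx).mean()
floor_abs = np.delete(profile_abs, true_idx).mean()
print(best_complex, best_abs, floor_complex < floor_abs)
```

In this toy setting both variants find the same lookup-table entry, but the magnitude-only profile sits much closer to its peak everywhere else, which is one way to read the lower dynamic described above.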
[Figure (plot): vertical axis: angular deviation [°]; f_alias marked]
almost lies at half of the beam width. Because the arrangement of the microphones does not correspond to a platonic solid, a deviation of the mean value away from half of the beam width can be expected. The deviation might be approximated via the ratio of the order to the number of actual microphone capsules ($\mu_{\mathrm{approx}} = 14.6^\circ$).

$$\gamma_{\mathrm{beam}} = \frac{187^\circ}{N+1} = 37.4^\circ \tag{21}$$

If the synthesis matrix is thinned out to 49 points ($\gamma_{\mathrm{beam}} = 26.7^\circ$), the median deviation is reduced to 11° (cf. Figure 6). For the ideal case the algorithm is feasible beyond the aliasing frequency as long as the magnitudes of the spatial aliasing beams are smaller than the one in the sought direction. We can state that a reduction of the synthesis grid is feasible and also reduces the computational costs. Nevertheless, the intrinsic error increases as the lookup table is reduced.

[Figure 5 (plot): vertical axis: angular deviation [°]; horizontal axis: frequency in Hz (172 to 4996); f_alias marked]
Figure 5: boxplot of the angular deviation for all source positions over frequency for 25pt synthesis matrix

4.2. Dependence of Spherical Harmonics Order and Phase Information

As we can see in Figure 1, including the phase information of the recorded signals yields a much higher dynamic. The phase information lies in the time difference of arrival at the capsules. The impulse responses are always calculated relative to each other. For the matching profile at 4 kHz we see a dynamic four times higher than without the complex matching approach.

Regarding the spherical harmonics order, the matching profile shows a reduction of ambiguity for higher-order calculations. This can be seen not only as an enhancement of dynamics, but also as an improvement for robust vector interpolation (cf. Figure 7). As already stated, the accuracy at low frequencies increases as well. The higher the order, the smaller the side lobes of the spherical beam, which also means fewer ambiguities.

[Figure 7 (plot): legend "N=4, complex" and "N=1, complex"; vertical axis: magnitude of the correlation vector; horizontal axis: source directions (phi, theta)]
Figure 7: Matching profile $A_{nm\mathrm{MATCH}}$ for 4 kHz over all source positions - red: N=1, blue dashed: N=4, sought direction at phi=0°, theta=90°
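The beam widths quoted with Equation (21) in Section 4.1 can be cross-checked with a few lines of code. This is a minimal sketch: the relation between the number of synthesis points and the order, points = (N + 1)^2, is an assumption inferred from the two quoted values (37.4° for N = 4 and 26.7° for 49 points) rather than something the text states explicitly, and the function names are illustrative.

```python
import math

def beam_width_deg(order):
    """Beam width from (21): 187 degrees divided by (N + 1)."""
    return 187.0 / (order + 1)

def order_from_points(points):
    """Order N for a synthesis set of (N + 1)^2 points (assumed relation)."""
    root = math.isqrt(points)
    if root * root != points:
        raise ValueError("points must be a perfect square under this assumption")
    return root - 1

for points in (25, 49):
    n = order_from_points(points)
    print(points, n, round(beam_width_deg(n), 1))
```

Under this assumption, 25 points correspond to N = 4 and 49 points to N = 6, which reproduces the 37.4° and 26.7° values given in the text.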
[Figure 8 (plot): legend "median(single loc value)", "median(vec interp)", "median(vector interp norm)"; horizontal axis: frequency in Hz (172 to 8613); f_alias marked]
Figure 8: median angular deviation for all source positions over frequency for the 25 point synthesis set, comparing two vector interpolation approaches. The normalized approach is only a vector summation of the three highest correlation values, whereas the second interpolation approach uses a corresponding correlation-dependent weighting.

[Figure 9 (plot): legend "deviation angle measurement 1", "deviation angle measurement 2", "D/R-ratio measurement 1", "D/R-ratio measurement 2"; horizontal axis: window length in samples (128 to 16384); second vertical axis: D/R-ratio in dB]
Figure 9: direct sound to reverberant sound ratio versus the window length. For the first measurement set a radial distance of rq = 0.7 m to the acoustic sources was chosen. The course of the D/R ratio for the first set shows only a 16 dB range due to the measurement arrangement. Because the algorithm is examined in a free field situation, the room influences are minimal. For the second measurement set less damping material is used, and the distance of the radially arranged sources is almost the critical distance, rH = 1.9 m. Longer windows are directly related to a higher amount of room influences, which is still visible in the plot.

4.4. D/R-Ratio - Direct-to-Reverberant Sound

To get an impression of a real performance situation and its impact on the accuracy of the algorithm, different window lengths were considered. The ratio of the direct sound field to the diffuse sound field was chosen as an adequate parameter (cf. (22)) to evaluate every source position based on the predominant sound field and not only in relation to the source distance. The window length of the direct sound was set to 128 samples, based on an evaluation of the impulse responses. Figure 9 shows that the longer the overall window, the higher the deviation error and the lower the D/R ratio. This is similar to changing the distance to the source position: if we increase the distance between the microphone array and the acoustic source, the disturbing influences increase and therefore the D/R ratio decreases. The evaluation was made for both measurement sets (measurement set 1, rq = 0.7 m, and measurement set 2, rq = 1.5 m). As we can see in Figure 9, the increase of the distance yields a larger angle deviation for longer windows. Only for short windows (128 and 256 samples) can very high accuracy be achieved with the ASL algorithm. Because the source radius of the second measurement set was almost at the critical distance, the angle error increases much faster. The influence of less damping material on the floor for the second measurement cannot be neglected.

$$D/R = 10 \log_{10}\!\left( \frac{\sum s_{\mathrm{dir}}^{2}}{\sum s_{\mathrm{rev}}^{2}} \right) \tag{22}$$

4.5. Dependence of the Radial Filter

In Section 2.1 a method to calculate a synthesized spectrum was treated. Equation (12) consists of two factors, the second of which is the spherical harmonics at a specific source position. The first, fractional term is the radial filter, which focuses the spectrum at a specific source distance. The higher the order of the spherical harmonics, the higher the amplification of the associated frequency component (cf. Figure 10). If we consider different focusing (including wrong focusing) for one and the same source position, it is possible to examine the impact on the source localization. The error angle is directly related to focusing on a source position via the radial filter (cf. Figure 11). Problems occur if the algorithm focuses on a source distance closer than the actual radius. If the focus is further away, no increase in the angular deviation is observed. As a limitation on focusing on a source, the amplification of the radial filter has to be considered. It is not feasible to amplify a signal by more than 60 dB due to the dynamic range of the microphone array. Therefore, a maximum distance with regard to the radial filter limits the focus.

5. Conclusion

First of all, the evaluation showed the feasibility of the algorithm for free field conditions. For these specific conditions the algorithm yields excellent results in the whole frequency range of interest (up to 5.2 kHz). In particular, including the phase term in the source localization greatly enhances the dynamic range. The approach of using a higher order ambisonics system yields much more information about the source and direction than a conventional approach. Second, it is shown that it is possible to reduce the synthesis grid in order to cut back computational costs. Nevertheless, an increase of the intrinsic deviation error is the result. If the synthesized matrix is cropped to a minimum, a vector interpolation yields a decent enhancement of up to 10 degrees, depending on the frequency. Of course the best result in respect of the deviation error angle yields the source
[Figure (plot): legend N=0 to N=4 (rq=0.7) and N=0, N=1 (rq=1.5), remaining entries missing; vertical axis: magnitude in dB]

[Figure (plot): horizontal axis: frequency in Hz (172 to 5599); f_alias marked]

References

[1] Tervo, S.: Direction estimation based on sound intensity vectors. 17th European Signal Processing Conference, Scotland (2009), 700-704