DOA
$ N7 S, g- O7 D 声源定位方法一般可分为三类,一种是基于TDOA的两步算法(two-stage algorithm),一种是基于空间谱估计如MUSIC等,还有就是基于beamforming的方法,也就是这里要介绍的可控波束响应(steered-response power)," ^ K8 n) m4 m% \3 M% |0 K; J
% X' O% M. g, M1 X" e1 y* }
steered-response power% p' G( s" O' r ^
可控波束响应是利用波束形成(beamforming)的方法,对空间不同方向的声音进行增强,得到声音信号最强的方向就被认为是声源的方向。 ) C' h& ^ ~) C2 H% o, E
上一篇中简单介绍了麦克风阵列的背景知识,最简单的SRP就是利用延时-累加(delay-and-sum)的方法,寻找输出能量最大的方向。 9 V# T4 d9 D% P7 }: S0 x2 A7 O O
其中,语音信号为宽带信号,因此需要做宽带波束形成,这里我们在频域实现$ E; ~4 Z# L$ }, _. ]3 d
. g9 o7 j3 {2 M: i5 |+ Q" ?+ j( ^
频域宽带波束形成9 k8 ^3 q+ J8 {$ v# @$ t% J' @" s
频域宽带波束形成可以归类为DFT波束形成器,结构如下图
# S8 Z5 l) F2 @3 a0 T. R & j* s9 }" W3 [* h, E% G5 z* J4 |
8 a3 _+ N; \2 y频域处理也可以看做是子带处理(subband),DFT和IDFT的系数分别对应子带处理中的分析综合滤波器组,关于这一种解释,可参考《传感器阵列波束优化设计与应用》第六章。 频域宽带延时累加波束形成的基本过程就是信号分帧加窗->DFT->各频点相位补偿->IDFT 4 v: o' b! Z" }' o
代码实现如下 0 s* F0 w3 Q4 w& E3 M# Q
nction [ DS, x1] = DelaySumURA( x,fs,N,frameLength,inc,r,angle)
: [ L) P c, v; Z X; L%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%' ]3 N7 {) i, {: D" R
%frequency-domain delay-sum beamformer using circular array2 |# U1 k4 w: ^/ }" ~' ^$ U
% $ ~( u: H6 Z" n7 H4 D
% input :
! s& N# v1 K( w# [" _% x : input signal ,samples * channel
6 m- K( F; P( v$ j# b% fs: sample rate
6 B& r e$ c8 Y7 }9 I) i1 T/ D0 a% N : fft length,frequency bin number$ L' ?, l& G0 \
%frameLength : frame length,usually same as N
$ b6 B [- M4 T3 @; K% inc : step increment- y O) a Y" t+ j! B& }$ h# t
% r : array element radius
3 n' S3 l) w5 Z' v) T, z5 V% angle : incident angle
" u1 B/ z. ~$ a& h# ^& D, c4 {%6 W6 ]$ O' q4 Q- v
% output :
6 |" `; D; Z6 ~( a5 ]/ E% DS : delay-sum output
1 L7 W; M& _0 _3 W% x1 : presteered signal,same size as x# }) A8 C j* o m. _6 F
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2 t( N5 u4 T( h7 `3 V
! B3 u- H( z9 F# B2 C- f; E5 wc = 340;0 k# K2 W; U3 F$ P
Nele = size(x,2);" ]' H0 P+ I+ P8 R2 a, m( L8 `
omega = zeros(frameLength,1);7 [+ _9 t& F( c+ o
H = ones(N/2+1,Nele);
7 v8 T( y, \/ N( T2 H
; m0 Y, K4 D( T! N& U$ q4 Itheta = 90*pi/180; %固定一个俯仰角
6 F1 ^+ K! w% o. B$ ]3 \3 ^gamma = [30 90 150 210 270 330]*pi/180;%麦克风位置1 w4 \+ l8 F2 [% l4 G! L
tao = r*sin(theta)*cos(angle(1)-gamma)/c; %方位角 0 < angle <360
6 z z' \2 u0 N2 a$ q' k* I4 n# gyds = zeros(length(x(:,1)),1);
- T9 X, }& Y* n$ jx1 = zeros(size(x));
e0 i$ Z0 N: `7 G7 |' r# Q& D6 W
6 L* o& w7 W4 u5 E: A8 ^6 A% frequency bin weights
8 b. b8 @- }+ {; X; U+ u% for k = 2:1:N/2+1
3 D- b$ X/ C4 ^4 Z( I& q8 T/ mfor k = 1:1:5000*N/fs
& T- @8 R0 p/ V& S omega(k) = 2*pi*(k-1)*fs/N; ' @: [6 z% z h, h& D
% steering vector
" W7 O- I. i2 i1 N! u9 r H(k, = exp(-1j*omega(k)*tao);
6 [% c2 f: u- ^ ^, c" y; X7 R; y( x/ cend
( J. R: L4 c* l: R: \0 W% e0 g! }/ T0 Z& `/ } f8 ^
for i = 1:inc:length(x(:,1))-frameLength8 b: h% s) n8 g1 Q( S, T$ w, N: H9 _1 g5 o
# Z$ B( t$ M3 L, K
d = fft(bsxfun(@times, x(i:i+frameLength-1, ,hamming(frameLength)));0 ]$ u' Z1 z) D3 l: A
+ ^7 z% K# H$ K4 E& m- A2 \
x_fft=bsxfun(@times, d(1:N/2+1, ,H);
4 K# I# K z/ g% C" {2 }
# d7 @, Z5 |4 u( j % phase transformed9 X, N% j# s7 d
%x_fft = bsxfun(@rdivide, x_fft,abs(d(1:N/2+1, ));
7 P1 G" N/ }! o+ Z0 h* O yf = sum(x_fft,2);
" p1 e& n" V4 _, \; U( O6 O Cf = [yf;conj(flipud(yf(2:N/2)))];- ^. @( f7 b' P- n6 g
$ F7 j6 r3 V: I: H/ M6 m `1 O6 W& R
% 恢复延时累加的信号
7 A1 \& m) f# c1 Q yds(i:i+frameLength-1) = yds(i:i+frameLength-1)+(ifft(Cf));
8 e& _/ `3 A8 e) G/ P6 ^4 _
w9 c: |" D u# j ~* L % 恢复各路对齐后的信号5 P9 o8 u: X2 w+ a
xf = [x_fft;conj(flipud(x_fft(2:N/2, ))];; b1 j9 r' V% W) P4 p
x1(i:i+frameLength-1, = x1(i:i+frameLength-1, +(ifft(xf));) S/ r2 `( V0 w4 j
end
6 ~4 R* ]( Q6 N2 g4 c, EDS = yds/Nele; 9 f; ?1 @7 O. u- S
) G+ ~ L7 f w* N, Z, M1 M
end
( U( i' @4 m; q) G" {然后遍历各个角度重复调用这个函数,测试实际录音数据,代码如下2 p, F+ C& o& f$ U* I5 f
# \( Z: h8 m4 X( p5 Z7 }8 `
%% SRP Estimate of Direction of Arrival at Microphone Array, ~: D' C( Z/ L5 H7 u. ]2 g) L) r
% Frequency-domain delay-and-sum test, ]: Z1 Q' l1 D) P M% H3 H% P
%
$ n0 X6 i" ^0 Z%%
! g+ H5 ?9 b$ R% V6 T' _: L( y# N' T
% x = filter(Num,1,x0);
' ~1 d2 Z* `/ ]# O- r5 yc = 340.0;
8 ]8 n! P4 H% J6 I, C" w& b
5 L* m; x O- b8 s9 G% XMOS circular microphone array radius
2 ]( n% a( L" J& r e: kd = 0.0420;. `& k. ]. } n# S; ~9 z
% more test audio file in ../../TestAudio/ folder1 ~) l, p# X) ^+ `2 s3 B$ t7 d
path = '../../TestAudio/XMOS/room_mic5-2/';$ d( I7 r1 V; x) r% @
[s1,fs] = audioread([path,'音轨-2.wav']);
u; U. S0 |! \/ K/ ~s2 = audioread([path,'音轨-3.wav']);
! U; N' u( K$ F2 s, W, j" xs3 = audioread([path,'音轨-4.wav']);
8 L! p" m7 N6 ^. ?s4 = audioread([path,'音轨-5.wav']);7 d, D5 R: Y- q0 G( ^* N% w
s5 = audioread([path,'音轨-6.wav']);4 Q% m/ n! j7 L) {1 c+ _% m
s6 = audioread([path,'音轨-7.wav']);2 y t' L! q: Z2 U0 _
signal = [s1,s2,s3,s4,s5,s6];
) F: K( |; L* S5 Q- ~% j+ G- c0 KM = size(signal,2);' W" K% J3 W( a8 d
%%+ K: ?; p5 g4 `; K) n/ D: s
t = 0;% G3 C2 P# k( V' }1 Y
3 ]* F. E1 B& O5 G; @; x5 t \
% minimal searching grid
i3 [% ~2 {6 @! W+ Nstep = 1;
: {: E- |5 L. G( p1 G$ y
& D7 ?% v6 ]* T' pP = zeros(1,length(0:step:360-step)); V6 |, U+ }$ K# C/ H# D4 M
tic
$ p8 }7 s5 T: S; sh = waitbar(0,'Please wait...');4 \0 w' s* e7 g
for i = 0:step:360-step4 Y6 [' n2 p* Z) ]) w' Y e. }
% Delay-and-sum beamforming
: y" o4 B6 ]/ B, O& u' @1 ? [ DS, x1] = DelaySumURA(signal,fs,512,512,256,d,i/180*pi);2 K+ Y5 h+ i% j! ?8 E9 d {) c2 ^
t = t+1;
. f5 q" O* G% {/ K %beamformed output energy) K- U: |) n2 ]: {
P(t) = DS'*DS;
2 g0 v5 ~! |& b9 ^. Y- t, n. ?5 O waitbar(i / length(step:360-step))" d8 F2 V% {* l0 a6 ~; ]0 p8 d6 C
end& a5 N1 d# o- P, c( X% O
toc1 E' x# n* n1 A
close(h)
/ o* [" r5 }3 B[m,index] = max(P);* A* f T/ }: F: D+ x& `
figure,plot(0:step:360-step,P/max(P))0 v5 k4 r D! D9 k
ang = (index)*step
' ~: n. a* P! w- }
: h; n- n h4 x- q U9 l$ f程序中用的是圆阵,可以进行二维方向角扫描,不过这里为了简便就固定了俯仰角,只扫描方位角,结果如下
. E* \8 ?, ]; k 9 }6 n4 u6 r: i2 i8 F9 x6 T+ y* b
结果与预期相同 PHAT加权 与GCC-PHAT方法相同,这里也可以对幅度做归一化,只保留相位信息,使得到的峰值更明显,提高在噪声及混响环境下的性能 7 j* R4 }" }$ ]% \/ k5 O: N
上面代码中加上这一句
7 i7 I6 w- D- L, x% `7 h%x_fft = bsxfun(@rdivide, x_fft,abs(d(1:N/2+1, ));
# u2 J5 z# S0 j( N+ x测试同样的文件,结果如下
9 e' L- Y) [9 x6 Q$ S+ }( v; e4 g% n7 B 1 W/ B6 K! c* G% a$ r5 p
对比可以看到,PHAT加权的方法性能更好
5 C ^3 `3 U l参考3 |! _ q3 a4 M+ o" {
1.《SRP-PHAT-A High-Accuracy, Low-Latency Technique for Talker Localization in Reverberant Environments Using Microphone Arrays》 # {+ O) g! y1 ^. r! R3 w
2. 《传感器阵列波束优化设计与应用》
; s% t/ W! g0 n9 n8 X _; ~, G————————————————/ i8 I# ~! U5 _- J/ b
版权声明:本文为CSDN博主「373955482」的原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接及本声明。
$ a6 ~/ k! p+ ~ r Q3 D9 B原文链接:https://blog.csdn.net/u010592995/article/details/815865045 o5 T( b' c# B
& J4 o% s$ x3 i5 O- a
- m i1 Y/ M) J1 C8 c/ I/ r, `% C6 d% ~9 C* J
3 g& p( k7 z' `7 T/ y2 W0 {- _( ?/ `$ V) Q* ]+ s7 s( L- X
|