Paper title
Reverberant Sound Localization with a Robot Head Based on Direct-Path Relative Transfer Function
Authors
Abstract
This paper addresses the problem of sound-source localization (SSL) with a robot head, which remains a challenge in real-world environments. In particular, we are interested in locating speech sources, as they are of high interest for human-robot interaction. The microphone-pair response corresponding to the direct-path sound propagation is a function of the source direction. In practice, this response is contaminated by noise and reverberation. The direct-path relative transfer function (DP-RTF) is defined as the ratio between the direct-path acoustic transfer functions (ATFs) of the two microphones, and it is an important feature for SSL. We propose a method to estimate the DP-RTF from noisy and reverberant signals in the short-time Fourier transform (STFT) domain. First, the convolutive transfer function (CTF) approximation is adopted to accurately represent the impulse response of the microphone array; the first coefficient of the CTF is mainly composed of the direct-path ATF. At each frequency, the frame-wise speech auto- and cross-power spectral densities (PSDs) are obtained by spectral subtraction. Then, a set of linear equations is constructed from the speech auto- and cross-PSDs of multiple frames, in which the DP-RTF is an unknown variable that is estimated by solving the equations. Finally, the estimated DP-RTFs are concatenated across frequencies and used as a feature vector for SSL. Experiments with a robot placed in various reverberant environments show that the proposed method outperforms two state-of-the-art methods.
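The idea of treating the DP-RTF as an unknown in a set of linear equations can be illustrated with a minimal sketch at a single frequency bin. This is a simplified, noise-free illustration and not the paper's implementation: it works directly on STFT coefficients instead of spectrally subtracted PSD estimates, and it assumes the two channels follow a CTF model x_p = sum_q a_q s_{p-q}, y_p = sum_q b_q s_{p-q}. Under that assumption, y_p = sum_q (b_q/a_0) x_{p-q} - sum_{q>=1} (a_q/a_0) y_{p-q}, so the first unknown is the ratio b_0/a_0. The function name `estimate_dp_rtf`, the CTF length `Q`, and all numeric values are hypothetical and chosen only for the example.

```python
import numpy as np


def estimate_dp_rtf(x, y, Q):
    """Least-squares estimate of b_0/a_0 at one frequency bin.

    x, y : complex STFT coefficients of the two microphones over frames.
    Q    : assumed CTF length in frames (a modelling assumption).
    """
    rows, targets = [], []
    for p in range(Q - 1, len(x)):
        x_part = x[p - Q + 1:p + 1][::-1]  # x_p, x_{p-1}, ..., x_{p-Q+1}
        y_part = y[p - Q + 1:p][::-1]      # y_{p-1}, ..., y_{p-Q+1}
        rows.append(np.concatenate([x_part, y_part]))
        targets.append(y[p])
    Z = np.array(rows)
    t = np.array(targets)
    # Unknown vector g = [b_0/a_0, ..., b_{Q-1}/a_0, -a_1/a_0, ..., -a_{Q-1}/a_0]
    g, *_ = np.linalg.lstsq(Z, t, rcond=None)
    return g[0]


# Synthetic demonstration with made-up CTF coefficients (assumed values).
rng = np.random.default_rng(0)
P = 200                                         # number of frames
s = rng.standard_normal(P) + 1j * rng.standard_normal(P)  # source STFT
a = np.array([1.0 + 0.0j, 0.4 - 0.2j, 0.1 + 0.1j])        # CTF, microphone 1
b = np.array([0.8 * np.exp(0.6j), 0.3 + 0.1j, -0.1 + 0.2j])  # CTF, microphone 2

x = np.convolve(s, a)[:P]
y = np.convolve(s, b)[:P]

dp_rtf_est = estimate_dp_rtf(x, y, Q=len(a))
dp_rtf_true = b[0] / a[0]
print(abs(dp_rtf_est - dp_rtf_true))
```

With noisy signals the direct least-squares fit above would be biased, which is why the abstract's method builds the equations from auto- and cross-PSDs after spectral subtraction. In a full pipeline the per-frequency estimates would then be concatenated into the feature vector used for localization.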