Paper Title

SELD-TCN: Sound Event Localization & Detection via Temporal Convolutional Networks

Authors

Guirguis, Karim, Schorn, Christoph, Guntoro, Andre, Abdulatif, Sherif, Yang, Bin

Abstract

The understanding of the surrounding environment plays a critical role in autonomous robotic systems, such as self-driving cars. Extensive research has been carried out concerning visual perception. Yet, to obtain a more complete perception of the environment, autonomous systems of the future should also take acoustic information into account. Recent sound event localization and detection (SELD) frameworks utilize convolutional recurrent neural networks (CRNNs). However, considering the recurrent nature of CRNNs, it becomes challenging to implement them efficiently on embedded hardware. Not only are their computations strenuous to parallelize, but they also require high memory bandwidth and large memory buffers. In this work, we develop a more robust and hardware-friendly novel architecture based on a temporal convolutional network (TCN). The proposed framework (SELD-TCN) outperforms the state-of-the-art SELDnet performance on four different datasets. Moreover, SELD-TCN achieves 4x faster training time per epoch and 40x faster inference time on an ordinary graphics processing unit (GPU).
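The abstract's core claim is that replacing recurrent layers with temporal convolutions removes the sequential dependency that makes CRNNs hard to parallelize. A minimal sketch of the TCN building block, a dilated causal 1-D convolution, illustrates this: every output step depends only on past inputs, and all steps can be computed independently. This is a generic illustration in plain Python, not the paper's actual SELD-TCN architecture; the function name, filter values, and shapes are assumptions.

```python
def dilated_causal_conv(x, kernel, dilation=1):
    """Dilated causal 1-D convolution (illustrative, not the paper's code).

    x:      input sequence, a list of floats
    kernel: filter taps [w0, ..., w_{k-1}]; w0 multiplies the oldest sample
    """
    k = len(kernel)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            # Tap i looks back (k - 1 - i) * dilation steps; out-of-range
            # indices act as zero padding, keeping the convolution causal.
            j = t - (k - 1 - i) * dilation
            if j >= 0:
                acc += w * x[j]
        # out[t] depends only on x[0..t], so every t can run in parallel.
        out.append(acc)
    return out

# Stacking layers with dilations 1, 2, 4, ... grows the receptive field
# exponentially, giving long temporal context without any recurrence.
impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
layer1 = dilated_causal_conv(impulse, [0.5, 0.5], dilation=1)
layer2 = dilated_causal_conv(layer1, [0.5, 0.5], dilation=2)
```

With two size-2 filters at dilations 1 and 2, the receptive field already spans 4 time steps, which is why the impulse response of `layer2` is nonzero over its first four entries; a recurrent layer would have to process those steps one after another.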
