Paper Title

BiFSMN: Binary Neural Network for Keyword Spotting

Authors

Haotong Qin, Xudong Ma, Yifu Ding, Xiaoyang Li, Yang Zhang, Yao Tian, Zejun Ma, Jie Luo, Xianglong Liu

Abstract

Deep neural networks, such as Deep-FSMN, have been widely studied for keyword spotting (KWS) applications. However, computational resources for these networks are significantly constrained since they usually run on-call on edge devices. In this paper, we present BiFSMN, an accurate and extremely efficient binary neural network for KWS. We first construct a High-frequency Enhancement Distillation scheme for binarization-aware training, which emphasizes the high-frequency information in the full-precision network's representation that is more crucial for optimizing the binarized network. Then, to allow instant and adaptive accuracy-efficiency trade-offs at runtime, we also propose a Thinnable Binarization Architecture to further liberate the acceleration potential of the binarized network from the topology perspective. Moreover, we implement a Fast Bitwise Computation Kernel for BiFSMN on ARMv8 devices, which fully utilizes registers and increases instruction throughput to push the limit of deployment efficiency. Extensive experiments show that BiFSMN outperforms existing binarization methods by convincing margins on various datasets and is even comparable with its full-precision counterpart (e.g., less than a 3% drop on Speech Commands V1-12). We highlight that, benefiting from the thinnable architecture and the optimized 1-bit implementation, BiFSMN achieves an impressive 22.3x speedup and 15.5x storage saving on real-world edge hardware. Our code is released at https://github.com/htqin/BiFSMN.
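The speedup claimed above comes from the core trick behind binary networks: once weights and activations are constrained to {-1, +1}, a dot product reduces to XNOR plus popcount. The sketch below illustrates that equivalence in plain Python; it is a minimal conceptual example, not the authors' ARMv8 kernel, and the function names are our own.

```python
# Illustrative sketch of 1-bit computation (not the paper's ARMv8 kernel):
# a dot product of two {-1, +1} vectors equals XNOR + popcount on bitmasks.

def binarize(vec):
    """Map real values to {-1, +1} via the sign function, as in standard BNNs."""
    return [1 if v >= 0 else -1 for v in vec]

def pack_bits(signs):
    """Pack a {-1, +1} vector into an integer bitmask (+1 -> bit set)."""
    mask = 0
    for i, s in enumerate(signs):
        if s == 1:
            mask |= 1 << i
    return mask

def binary_dot(a_bits, b_bits, n):
    """Dot product via XNOR + popcount: matching bits give +1, differing -1."""
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)
    matches = bin(xnor).count("1")  # popcount
    return 2 * matches - n

# Verify against the ordinary sign-vector dot product.
a = binarize([0.3, -1.2, 0.7, -0.1])   # -> [ 1, -1,  1, -1]
b = binarize([0.5, 0.4, -0.9, -0.2])   # -> [ 1,  1, -1, -1]
n = len(a)
ref = sum(x * y for x, y in zip(a, b))
fast = binary_dot(pack_bits(a), pack_bits(b), n)
assert fast == ref
```

On real hardware this mapping is what yields the large speedups: 64 multiply-accumulates collapse into one XNOR and one popcount instruction per machine word.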
