Paper Title


Noisy Machines: Understanding Noisy Neural Networks and Enhancing Robustness to Analog Hardware Errors Using Distillation

Authors

Chuteng Zhou, Prad Kadambi, Matthew Mattina, Paul N. Whatmough

Abstract


The success of deep learning has brought forth a wave of interest in computer hardware design to better meet the high demands of neural network inference. In particular, analog computing hardware has been heavily motivated specifically for accelerating neural networks, based on either electronic, optical or photonic devices, which may well achieve lower power consumption than conventional digital electronics. However, these proposed analog accelerators suffer from the intrinsic noise generated by their physical components, which makes it challenging to achieve high accuracy on deep neural networks. Hence, for successful deployment on analog accelerators, it is essential to be able to train deep neural networks to be robust to random continuous noise in the network weights, which is a somewhat new challenge in machine learning. In this paper, we advance the understanding of noisy neural networks. We outline how a noisy neural network has reduced learning capacity as a result of loss of mutual information between its input and output. To combat this, we propose using knowledge distillation combined with noise injection during training to achieve more noise robust networks, which is demonstrated experimentally across different networks and datasets, including ImageNet. Our method achieves models with as much as two times greater noise tolerance compared with the previous best attempts, which is a significant step towards making analog hardware practical for deep learning.
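The training recipe the abstract describes combines two ingredients: injecting random continuous noise into the network weights during the forward pass (to model analog hardware errors) and distilling knowledge from a clean teacher network into the noisy student. The sketch below illustrates this combination in PyTorch under stated assumptions; the layer names (`NoisyLinear`), the noise scale, and the distillation hyperparameters `T` and `alpha` are illustrative choices, not values taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Linear):
    """Linear layer that perturbs its weights with additive Gaussian noise
    on every forward pass, modeling random analog hardware weight errors."""
    def __init__(self, in_features, out_features, noise_std=0.05):
        super().__init__(in_features, out_features)
        self.noise_std = noise_std  # assumed fixed scale; the paper's exact scaling may differ

    def forward(self, x):
        noise = torch.randn_like(self.weight) * self.noise_std
        return F.linear(x, self.weight + noise, self.bias)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style distillation: KL divergence to the teacher's softened
    outputs, blended with cross-entropy to the hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy training step: the teacher runs clean, the student sees noisy weights.
torch.manual_seed(0)
teacher = nn.Linear(8, 4)
student = NoisyLinear(8, 4, noise_std=0.05)
opt = torch.optim.SGD(student.parameters(), lr=0.1)

x = torch.randn(16, 8)
y = torch.randint(0, 4, (16,))
with torch.no_grad():
    t_logits = teacher(x)          # clean teacher targets
s_logits = student(x)              # noisy student forward pass
loss = distillation_loss(s_logits, t_logits, y)
loss.backward()
opt.step()
```

Because the noise is resampled at every forward pass, the student is pushed toward weight configurations whose outputs stay close to the teacher's across random perturbations, which is the intuition behind the improved noise tolerance reported in the abstract.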
