Title
Propagating Asymptotic-Estimated Gradients for Low Bitwidth Quantized Neural Networks
Authors
Abstract
Quantized neural networks (QNNs) can be useful for neural network acceleration and compression, but they pose a challenge during training: how to propagate the gradient of the loss function through a graph flow whose derivative is 0 almost everywhere. To handle this non-differentiable situation, we propose a novel Asymptotic-Quantized Estimator (AQE) to estimate the gradient. In particular, during back-propagation the graph that relates inputs to outputs remains smooth and differentiable. At the end of training, the weights and activations have been quantized to low precision because of the asymptotic behaviour of AQE. Meanwhile, we propose an M-bit Inputs and N-bit Weights Network (MINW-Net) trained by AQE, a quantized neural network with 1-3 bit weights and activations. In the inference phase, we can use XNOR or SHIFT operations instead of convolution operations to accelerate MINW-Net. Our experiments on the CIFAR datasets demonstrate that our AQE is well defined, and that QNNs with AQE perform better than those with the Straight-Through Estimator (STE). For example, for the same ConvNet with 1-bit weights and activations, our MINW-Net with AQE achieves a prediction accuracy 1.5\% higher than a Binarized Neural Network (BNN) with STE. MINW-Net, trained from scratch by AQE, achieves classification accuracy comparable to its 32-bit counterparts on the CIFAR test sets. Extensive experimental results on the ImageNet dataset show the superiority of the proposed AQE, and our MINW-Net achieves results comparable to other state-of-the-art QNNs.
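The core idea of an asymptotic estimator — a forward map that stays smooth and differentiable at every point during training, yet converges to a hard quantizer so the final weights and activations end up quantized — can be sketched with a tanh-based soft binarizer. This is an illustrative stand-in, not the paper's exact AQE formulation; the temperature schedule `t` is a hypothetical parameter.

```python
import numpy as np

def soft_binarize(x, t):
    """Smooth surrogate for sign(x): tanh(t * x) approaches sign(x)
    as the temperature t grows, but is differentiable at every finite t.
    Illustrative only; the paper's AQE may use a different family of maps.
    """
    return np.tanh(t * x)

def soft_binarize_grad(x, t):
    """Analytic gradient of the surrogate w.r.t. x: t * (1 - tanh(t*x)^2).
    Unlike the hard sign, this is nonzero, so loss gradients can propagate."""
    y = np.tanh(t * x)
    return t * (1.0 - y ** 2)

x = np.array([-0.8, -0.1, 0.05, 0.9])
for t in (1.0, 10.0, 1000.0):
    print(t, soft_binarize(x, t))
# As t grows, the outputs converge to sign(x) = [-1, -1, 1, 1],
# so the network is effectively binarized by the end of training.
```

Contrast this with STE, which uses the hard quantizer in the forward pass and simply pretends its derivative is 1 in the backward pass; the asymptotic estimator instead keeps forward and backward passes consistent throughout training.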
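To see why 1-bit weights and activations permit XNOR-based inference, here is a minimal sketch of a dot product between two {-1, +1} vectors computed with XNOR and popcount instead of multiply-accumulates. The bit encoding and helper names are hypothetical, not the paper's implementation.

```python
def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1,+1} vectors packed into integers
    (bit = 1 encodes +1, bit = 0 encodes -1), via XNOR + popcount."""
    mask = (1 << n) - 1
    # XNOR marks positions where the two vectors agree (product = +1).
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")
    # dot = (#matches) - (#mismatches) = 2 * matches - n
    return 2 * matches - n

def pack(v):
    """Pack a {-1,+1} list into an integer bit vector (hypothetical helper)."""
    return sum(1 << i for i, x in enumerate(v) if x == 1)

a = [1, -1, -1, 1]
b = [1, 1, -1, -1]
assert binary_dot(pack(a), pack(b), 4) == sum(x * y for x, y in zip(a, b))
```

On hardware, the XNOR and popcount each process a whole machine word of 1-bit values at once, which is the source of the speedup over 32-bit convolutions.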