Title
Propagating Asymptotic-Estimated Gradients for Low Bitwidth Quantized Neural Networks
Authors
Abstract
Quantized neural networks (QNNs) can be useful for neural network acceleration and compression, but they pose a challenge during training: how to propagate the gradient of the loss function through a graph flow whose derivative is 0 almost everywhere. To handle this non-differentiable situation, we propose a novel Asymptotic-Quantized Estimator (AQE) to estimate the gradient. In particular, during back-propagation the graph that relates inputs to outputs remains smooth and differentiable. At the end of training, the weights and activations have been quantized to low precision because of the asymptotic behaviour of AQE. Meanwhile, we propose an M-bit Inputs and N-bit Weights Network (MINW-Net) trained by AQE, a quantized neural network with 1-3 bit weights and activations. In the inference phase, we can use XNOR or SHIFT operations instead of convolution operations to accelerate MINW-Net. Our experiments on the CIFAR datasets demonstrate that our AQE is well defined, and that QNNs with AQE perform better than those with the Straight-Through Estimator (STE). For example, for the same ConvNet with 1-bit weights and activations, our MINW-Net with AQE achieves a prediction accuracy 1.5\% higher than a Binarized Neural Network (BNN) with STE. MINW-Net, trained from scratch by AQE, achieves classification accuracy comparable to its 32-bit counterparts on the CIFAR test sets. Extensive experimental results on the ImageNet dataset show the superiority of the proposed AQE, and our MINW-Net achieves results comparable to other state-of-the-art QNNs.
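The core idea of an asymptotic estimator — a forward map that stays smooth and differentiable at every point during training, yet converges to a hard quantizer so the final weights and activations end up quantized — can be sketched with a tanh-based soft binarizer. This is an illustrative stand-in, not the paper's exact AQE formulation; the temperature schedule `t` is a hypothetical parameter.

```python
import numpy as np

def soft_binarize(x, t):
    """Smooth surrogate for sign(x): tanh(t * x) approaches sign(x)
    as the temperature t grows, but is differentiable at every finite t.
    Illustrative only; the paper's AQE may use a different family of maps.
    """
    return np.tanh(t * x)

def soft_binarize_grad(x, t):
    """Analytic gradient of the surrogate w.r.t. x: t * (1 - tanh(t*x)^2).
    Unlike the hard sign, this is nonzero, so loss gradients can propagate."""
    y = np.tanh(t * x)
    return t * (1.0 - y ** 2)

x = np.array([-0.8, -0.1, 0.05, 0.9])
for t in (1.0, 10.0, 1000.0):
    print(t, soft_binarize(x, t))
# As t grows, the outputs converge to sign(x) = [-1, -1, 1, 1],
# so the network is effectively binarized by the end of training.
```

Contrast this with STE, which uses the hard quantizer in the forward pass and simply pretends its derivative is 1 in the backward pass; the asymptotic estimator instead keeps forward and backward passes consistent throughout training.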
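To see why 1-bit weights and activations permit XNOR-based inference, here is a minimal sketch of a dot product between two {-1, +1} vectors computed with XNOR and popcount instead of multiply-accumulates. The bit encoding and helper names are hypothetical, not the paper's implementation.

```python
def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1,+1} vectors packed into integers
    (bit = 1 encodes +1, bit = 0 encodes -1), via XNOR + popcount."""
    mask = (1 << n) - 1
    # XNOR marks positions where the two vectors agree (product = +1).
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")
    # dot = (#matches) - (#mismatches) = 2 * matches - n
    return 2 * matches - n

def pack(v):
    """Pack a {-1,+1} list into an integer bit vector (hypothetical helper)."""
    return sum(1 << i for i, x in enumerate(v) if x == 1)

a = [1, -1, -1, 1]
b = [1, 1, -1, -1]
assert binary_dot(pack(a), pack(b), 4) == sum(x * y for x, y in zip(a, b))
```

On hardware, the XNOR and popcount each process a whole machine word of 1-bit values at once, which is the source of the speedup over 32-bit convolutions.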