Paper Title
Error-aware Quantization through Noise Tempering
Paper Authors
Paper Abstract
Quantization has become a predominant approach for model compression, enabling deployment of large models trained on GPUs onto smaller form-factor devices for inference. Quantization-aware training (QAT) optimizes model parameters with respect to the end task while simulating quantization error, leading to better performance than post-training quantization. Approximation of gradients through the non-differentiable quantization operator is typically achieved using the straight-through estimator (STE) or additive noise. However, STE-based methods suffer from instability due to biased gradients, whereas existing noise-based methods cannot reduce the resulting variance. In this work, we incorporate exponentially decaying quantization-error-aware noise together with a learnable scale of task loss gradient to approximate the effect of a quantization operator. We show this method combines gradient scale and quantization noise in a better optimized way, providing finer-grained estimation of gradients at each weight and activation layer's quantizer bin size. Our controlled noise also contains an implicit curvature term that could encourage flatter minima, which we show is indeed the case in our experiments. Experiments training ResNet architectures on the CIFAR-10, CIFAR-100 and ImageNet benchmarks show that our method obtains state-of-the-art top-1 classification accuracy for uniform (non mixed-precision) quantization, out-performing previous methods by 0.5-1.2% absolute.
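The two gradient-approximation strategies contrasted in the abstract, the straight-through estimator (STE) versus additive quantization-error-aware noise that decays over training, can be illustrated with a minimal PyTorch-style sketch. The function names, the exponential decay schedule, and the uniform in-bin noise below are illustrative assumptions, not the authors' implementation.

import torch

def uniform_quantize(x, step):
    # Round to the nearest multiple of the quantizer bin size `step`.
    return torch.round(x / step) * step

def ste_quantize(x, step):
    # (a) STE: quantized value in the forward pass, identity gradient in the
    # backward pass (biased, a known source of training instability).
    return x + (uniform_quantize(x, step) - x).detach()

def noise_tempered_quantize(x, step, t, total_steps, decay_rate=5.0):
    # (b) Noise-based surrogate: perturb with noise whose scale tracks the
    # quantization error and decays exponentially with training step `t`,
    # annealing toward hard quantization. The schedule and noise shape here
    # are illustrative assumptions, not the paper's exact formulation.
    decay = torch.exp(torch.tensor(-decay_rate * t / total_steps))
    q_error = (uniform_quantize(x, step) - x).detach()
    in_bin_noise = (torch.rand_like(x) - 0.5) * step  # uniform noise within one bin
    return x + (1.0 - decay) * q_error + decay * in_bin_noise

w = torch.randn(8, requires_grad=True)
loss = noise_tempered_quantize(w, step=0.1, t=200, total_steps=1000).pow(2).sum()
loss.backward()  # gradients flow to the full-precision weights

As the decay factor shrinks, the surrogate in (b) interpolates from pure noise injection toward the deterministic quantized forward pass, which is the general behavior the abstract describes for tempering the quantization-error noise during training.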