Binarizing by Classification: Is soft function really necessary?

Paper title

Binarizing by Classification: Is soft function really necessary?

Authors

He, Yefei; Zhang, Luoming; Wu, Weijia; Zhou, Hong

Abstract

Binary neural networks (BNNs) use the $\mathrm{sign}$ function to binarize weights and activations. Because $\mathrm{sign}$ is non-differentiable, training requires a gradient estimator, which inevitably introduces gradient errors during backpropagation. Although many hand-designed soft functions have been proposed as gradient estimators to better approximate the gradient, why they work remains unclear, and large performance gaps persist between binary models and their full-precision counterparts. To address these issues and reduce gradient error, we propose to treat network binarization as a binary classification problem and use a multi-layer perceptron (MLP) both as the classifier in the forward pass and as the gradient estimator in the backward pass. Because an MLP can in theory fit any continuous function, it can be learned adaptively to binarize the network and backpropagate gradients without any prior knowledge of soft functions. From this perspective, we further show empirically that even a simple linear function can outperform previous complex soft functions. Extensive experiments show that the proposed method performs strongly on both image classification and human pose estimation. Specifically, ResNet-34 reaches $65.7\%$ top-1 accuracy on the ImageNet dataset, an absolute improvement of $2.6\%$. Moreover, we adopt binarization as a lightweighting approach for pose estimation models and propose two carefully designed binary pose estimation networks, SBPN and BHRNet. On the challenging Microsoft COCO keypoint dataset, the proposed method enables binary networks, for the first time, to reach an mAP of up to $60.6$. Experiments on real platforms show that BNNs achieve a better balance between performance and computational complexity, especially when computational resources are extremely limited.
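
The abstract contrasts hand-designed gradient estimators for $\mathrm{sign}$ with a learned classifier. The plain-Python sketch below illustrates that contrast under our own assumptions; it is not the authors' implementation. `ClippedSTE` is the common clipped straight-through estimator, standing in for a hand-designed soft function. `MLPBinarizer` is a hypothetical reading of the proposed idea: a tiny one-hidden-layer MLP $f$ whose output is binarized in the forward pass, $\mathrm{sign}(f(x))$, while the backward pass treats the final $\mathrm{sign}$ as the identity, so the input gradient is estimated as $f'(x)$ and the parameters of $f$ are trained by the same upstream gradient. `LinearBinarizer` takes $f(x) = kx + c$ as one possible reading of "a simple linear function". All class names, the architecture, the initialization, and the update rule are assumptions made for illustration.

```python
import math
import random


def sign(x: float) -> float:
    """Binarize to {-1, +1}; 0 maps to +1 (a convention assumed here)."""
    return 1.0 if x >= 0.0 else -1.0


class ClippedSTE:
    """Hand-designed baseline: forward is sign(x); backward passes the
    upstream gradient only where |x| <= 1 (clipped straight-through)."""

    def forward(self, x: float) -> float:
        self.x = x
        return sign(x)

    def backward(self, grad_out: float) -> float:
        return grad_out if abs(self.x) <= 1.0 else 0.0


class LinearBinarizer:
    """Hypothetical linear classifier f(x) = k*x + c: forward is sign(f(x)),
    backward returns the constant gradient grad_out * k."""

    def __init__(self, k: float = 1.0, c: float = 0.0):
        self.k, self.c = k, c

    def forward(self, x: float) -> float:
        self.x = x
        return sign(self.k * x + self.c)

    def backward(self, grad_out: float) -> float:
        return grad_out * self.k


class MLPBinarizer:
    """Hypothetical learned binarizer: a 1-in/1-out MLP f with one tanh hidden
    layer. Forward is sign(f(x)); backward treats sign as the identity, so
    dL/dx is estimated as grad_out * f'(x) and f's parameters get grad_out * df/dtheta."""

    def __init__(self, hidden: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.w = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
        self.b = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
        self.v = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
        self.c = 0.0

    def f(self, x: float) -> float:
        return sum(v * math.tanh(w * x + b)
                   for w, b, v in zip(self.w, self.b, self.v)) + self.c

    def forward(self, x: float) -> float:
        self.x = x
        return sign(self.f(x))

    def backward(self, grad_out: float, lr: float = 0.0) -> float:
        x = self.x
        hs = [math.tanh(w * x + b) for w, b in zip(self.w, self.b)]
        # Estimated input gradient: grad_out * f'(x), using tanh' = 1 - tanh^2
        grad_x = grad_out * sum(v * w * (1.0 - h * h)
                                for w, v, h in zip(self.w, self.v, hs))
        if lr > 0.0:
            # One plain SGD step on the MLP's own parameters
            for j, h in enumerate(hs):
                d_pre = grad_out * self.v[j] * (1.0 - h * h)  # uses v before its update
                self.v[j] -= lr * grad_out * h
                self.w[j] -= lr * d_pre * x
                self.b[j] -= lr * d_pre
            self.c -= lr * grad_out
        return grad_x


if __name__ == "__main__":
    ste = ClippedSTE()
    print(ste.forward(0.3), ste.backward(1.0))    # 1.0 1.0
    print(ste.forward(-2.0), ste.backward(1.0))   # -1.0 0.0
    lin = LinearBinarizer(k=0.5)
    print(lin.forward(-0.2), lin.backward(2.0))   # -1.0 1.0
    mlp = MLPBinarizer()
    print(mlp.forward(0.37), mlp.backward(1.0))   # values depend on the random init
```

The design choice worth noting is that the clipped STE fixes the shape of the backward pass by hand, whereas in the MLP version that shape is $f'(x)$ and changes as $f$ is trained; the linear case reduces it to a constant slope $k$.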