Paper Title


Improving Network Slimming with Nonconvex Regularization

Authors

Kevin Bui, Fredrick Park, Shuai Zhang, Yingyong Qi, Jack Xin

Abstract


Convolutional neural networks (CNNs) have developed to become powerful models for various computer vision tasks ranging from object detection to semantic segmentation. However, most of the state-of-the-art CNNs cannot be deployed directly on edge devices such as smartphones and drones, which need low latency under limited power and memory bandwidth. One popular, straightforward approach to compressing CNNs is network slimming, which imposes $\ell_1$ regularization on the channel-associated scaling factors via the batch normalization layers during training. Network slimming thereby identifies insignificant channels that can be pruned for inference. In this paper, we propose replacing the $\ell_1$ penalty with an alternative nonconvex, sparsity-inducing penalty in order to yield a more compressed and/or accurate CNN architecture. We investigate $\ell_p (0 < p < 1)$, transformed $\ell_1$ (T$\ell_1$), minimax concave penalty (MCP), and smoothly clipped absolute deviation (SCAD) due to their recent successes and popularity in solving sparse optimization problems, such as compressed sensing and variable selection. We demonstrate the effectiveness of network slimming with nonconvex penalties on three neural network architectures -- VGG-19, DenseNet-40, and ResNet-164 -- on standard image classification datasets. Based on the numerical experiments, T$\ell_1$ preserves model accuracy against channel pruning, $\ell_{1/2, 3/4}$ yield better compressed models with similar accuracies after retraining as $\ell_1$, and MCP and SCAD provide more accurate models after retraining with similar compression as $\ell_1$. Network slimming with T$\ell_1$ regularization also outperforms the latest Bayesian modification of network slimming in compressing a CNN architecture in terms of memory storage while preserving its model accuracy after channel pruning.
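The four penalties the abstract compares against $\ell_1$ all have closed forms. As a rough illustration only (this is not the authors' code, and the parameter values $\lambda$, $a$, $\gamma$ below are illustrative defaults, not those used in the paper), here is a NumPy sketch that evaluates each penalty elementwise on a vector of batch-normalization scaling factors; in network slimming, the sum of such a penalty over all scaling factors is added to the training loss.

```python
import numpy as np

def lp(x, p=0.5):
    # ell_p "norm" penalty, 0 < p < 1: |x|^p
    return np.abs(x) ** p

def tl1(x, a=1.0):
    # transformed ell_1: (a + 1)|x| / (a + |x|); bounded above by a + 1
    ax = np.abs(x)
    return (a + 1.0) * ax / (a + ax)

def mcp(x, lam=1.0, gamma=2.0):
    # minimax concave penalty: quadratic taper for |x| <= gamma*lam,
    # then constant at gamma*lam^2/2
    ax = np.abs(x)
    return np.where(ax <= gamma * lam,
                    lam * ax - ax ** 2 / (2.0 * gamma),
                    0.5 * gamma * lam ** 2)

def scad(x, lam=1.0, a=3.7):
    # smoothly clipped absolute deviation: ell_1 near zero, quadratic
    # transition on (lam, a*lam], constant lam^2*(a + 1)/2 beyond a*lam
    ax = np.abs(x)
    mid = (2.0 * a * lam * ax - ax ** 2 - lam ** 2) / (2.0 * (a - 1.0))
    hi = 0.5 * lam ** 2 * (a + 1.0)
    return np.where(ax <= lam, lam * ax,
                    np.where(ax <= a * lam, mid, hi))

# Example: total regularization over a (hypothetical) vector of BN
# scaling factors; this scalar is what gets added to the training loss.
gamma_bn = np.array([0.0, 0.01, 0.1, 1.0, 2.5])
total_tl1 = tl1(gamma_bn).sum()
```

Unlike $\ell_1$, each of these penalties levels off for large arguments, so large scaling factors incur a bounded cost while small ones are still pushed toward zero, which is the intuition behind the compression/accuracy trade-offs reported in the abstract.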
