Paper Title
Activation Density driven Energy-Efficient Pruning in Training
Paper Authors
Paper Abstract
Neural network pruning with suitable retraining can yield networks with considerably fewer parameters than the original, at comparable accuracy. Typical pruning methods require a large, fully trained network as a starting point, from which they perform a time-intensive iterative pruning and retraining procedure to regain the original accuracy. We propose a novel pruning method that prunes a network in real time during training, reducing the overall training time needed to obtain an efficient compressed network. We introduce an activation density based analysis to identify the optimal relative sizing, or compression, for each layer of the network. Our method is architecture agnostic, allowing it to be employed on a wide variety of systems. For VGG-19 and ResNet18 on CIFAR-10, CIFAR-100, and TinyImageNet, we obtain exceedingly sparse networks (up to $200\times$ reduction in parameters and over $60\times$ reduction in inference compute operations in the best case) with accuracy comparable to the baseline network. By reducing the network size periodically during training, we achieve total training times that are shorter than those of previously proposed pruning methods. Furthermore, training compressed networks at different epochs with our proposed method yields a considerable reduction in training compute complexity ($1.6\times$ to $3.2\times$ lower) at near iso-accuracy compared to a baseline network trained entirely from scratch.
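As an illustration of the activation density analysis described in the abstract, the sketch below estimates the average fraction of non-zero post-ReLU activations in each layer and maps it to a candidate compressed layer width. This is a minimal sketch assuming a PyTorch model with ReLU activations; the function names, the forward-hook strategy, and the proportional width-scaling rule are our assumptions for illustration, not the authors' exact procedure.

```python
import torch
import torch.nn as nn


def measure_activation_densities(model, data_loader, device="cpu", max_batches=10):
    """Estimate the average fraction of non-zero post-ReLU activations per layer."""
    running, seen, hooks = {}, {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            # Density = fraction of activations that survive the ReLU.
            running[name] = running.get(name, 0.0) + (output != 0).float().mean().item()
            seen[name] = seen.get(name, 0) + 1
        return hook

    # Hook every ReLU so its output sparsity can be observed.
    for name, module in model.named_modules():
        if isinstance(module, nn.ReLU):
            hooks.append(module.register_forward_hook(make_hook(name)))

    model.eval()
    with torch.no_grad():
        for i, (images, _) in enumerate(data_loader):
            if i >= max_batches:
                break
            model(images.to(device))

    for h in hooks:
        h.remove()
    return {name: running[name] / seen[name] for name in running}


def compressed_width(original_width, density, min_width=8):
    """Illustrative sizing rule (an assumption, not the paper's exact formula):
    shrink a layer's width in proportion to its measured activation density."""
    return max(min_width, int(round(original_width * density)))
```

In such a scheme, densities would be re-measured periodically during training and the layer widths reduced accordingly, which is consistent with the abstract's claim of pruning in real time rather than after full convergence.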