Paper Title
Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep Neural Network, a Survey
Paper Authors
Paper Abstract
State-of-the-art deep learning models have parameter counts reaching into the billions. Training, storing, and transferring such models is energy- and time-consuming, and thus costly. A large part of these costs is caused by training the network. Model compression lowers storage and transfer costs, and can further make training more efficient by decreasing the number of computations in the forward and/or backward pass. Compressing networks at training time while maintaining high performance is therefore an important research topic. This work is a survey of methods that reduce the number of trained weights in deep learning models throughout training. Most of the presented methods set network parameters to zero, which is called pruning. The presented pruning approaches are categorized into pruning at initialization, lottery tickets, and dynamic sparse training. Moreover, we discuss methods that freeze parts of a network at their random initialization. Freezing weights shrinks the number of trainable parameters, which reduces gradient computations and the dimensionality of the model's optimization space. In this survey, we first propose dimensionality reduced training as an underlying mathematical model that covers both pruning and freezing during training. Afterwards, we present and discuss different dimensionality reduced training methods.
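To make the two mechanisms named in the abstract concrete, the following is a minimal illustrative sketch, not taken from the paper: it assumes PyTorch and shows (a) magnitude-based pruning at initialization via a binary mask and (b) freezing a parameter at its random initialization by excluding it from gradient computation. The layer sizes, the `keep_ratio` value, and the choice of magnitude as the pruning criterion are assumptions for illustration only.

```python
# Hypothetical sketch of pruning and freezing a single layer (not the paper's method).
import torch
import torch.nn as nn

layer = nn.Linear(100, 100)

# Pruning at initialization: zero out the smallest-magnitude weights and keep a
# binary mask so the pruned weights stay zero during training.
keep_ratio = 0.1  # assumed fraction of weights kept trainable
scores = layer.weight.detach().abs()
threshold = torch.quantile(scores.flatten(), 1.0 - keep_ratio)
mask = (scores >= threshold).float()
with torch.no_grad():
    layer.weight.mul_(mask)

# Freezing: keep the bias at its random initialization; it no longer receives
# gradients, which shrinks the trainable parameter count.
layer.bias.requires_grad = False

# Only parameters that still require gradients are handed to the optimizer.
optimizer = torch.optim.SGD(
    [p for p in layer.parameters() if p.requires_grad], lr=0.1
)

# One training step on random data to show the mask being enforced.
x, y = torch.randn(8, 100), torch.randn(8, 100)
loss = nn.functional.mse_loss(layer(x), y)
loss.backward()
with torch.no_grad():
    layer.weight.grad.mul_(mask)  # pruned weights receive no update
optimizer.step()
```

In this sketch the mask is applied to the gradient before the update so pruned positions remain exactly zero; surveyed methods differ mainly in how the mask is chosen (at initialization, via lottery tickets, or dynamically during training) and in which parameters are frozen.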