Paper Title
Iterative Activation-based Structured Pruning
Paper Authors
Paper Abstract
Deploying complex deep learning models on edge devices is challenging because they have substantial compute and memory resource requirements, whereas edge devices' resource budgets are limited. To solve this problem, extensive pruning techniques have been proposed for compressing networks. Recent advances based on the Lottery Ticket Hypothesis (LTH) show that iterative model pruning tends to produce smaller and more accurate models. However, LTH research focuses on unstructured pruning, which is hardware-inefficient and difficult to accelerate on hardware platforms. In this paper, we investigate iterative pruning in the context of structured pruning, because structurally pruned models map well onto commodity hardware. We find that directly applying a structured weight-based pruning technique iteratively, called Iterative L1-norm based Pruning (ILP), does not produce accurate pruned models. To solve this problem, we propose two activation-based pruning methods: Iterative Activation-based Pruning (IAP) and Adaptive Iterative Activation-based Pruning (AIAP). We observe that, with only 1% accuracy loss, IAP and AIAP achieve 7.75X and 15.88X compression on LeNet-5, and 1.25X and 1.71X compression on ResNet-50, whereas ILP achieves 4.77X and 1.13X, respectively.
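
The contrast the abstract draws, between ranking filters by the L1 norm of their weights (ILP-style) and ranking them by their activation magnitudes (IAP-style), can be made concrete with a short sketch. The code below is an illustrative assumption, not the authors' implementation: the function names, the per-round `fraction` parameter, and the 5-round schedule are hypothetical, and a real structured pruner would remove the pruned channels (and the next layer's matching input channels) rather than merely zeroing them.

import torch
import torch.nn as nn

def l1_filter_scores(conv: nn.Conv2d) -> torch.Tensor:
    # Weight-based criterion (ILP-style): L1 norm of each output filter.
    # conv.weight has shape (out_channels, in_channels, kH, kW).
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))

def activation_filter_scores(conv: nn.Conv2d, calib: torch.Tensor) -> torch.Tensor:
    # Activation-based criterion (IAP-style): mean post-ReLU response of
    # each filter on a calibration batch.
    with torch.no_grad():
        feats = torch.relu(conv(calib))   # (N, out_channels, H, W)
    return feats.mean(dim=(0, 2, 3))      # one score per filter

def prune_lowest(conv: nn.Conv2d, scores: torch.Tensor, fraction: float) -> None:
    # Zero the weakest `fraction` of filters; a full structured pruner would
    # instead shrink the tensors so the model actually gets smaller.
    k = int(fraction * scores.numel())
    if k == 0:
        return
    weakest = torch.argsort(scores)[:k]
    with torch.no_grad():
        conv.weight[weakest] = 0.0
        if conv.bias is not None:
            conv.bias[weakest] = 0.0

def iterative_prune(conv, calib, rounds=5, fraction=0.1, use_activations=True):
    # Iterative schedule: prune a small fraction, fine-tune, and repeat,
    # rather than pruning to the target sparsity in one shot.
    for _ in range(rounds):
        scores = (activation_filter_scores(conv, calib) if use_activations
                  else l1_filter_scores(conv))
        prune_lowest(conv, scores, fraction)
        # fine-tuning on the training set would go here in a full pipeline

# Example: prune a single conv layer using random calibration data.
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
iterative_prune(conv, torch.randn(8, 3, 32, 32))

For the weight-based criterion alone, PyTorch ships a built-in one-shot variant, torch.nn.utils.prune.ln_structured(module, name="weight", amount=..., n=1, dim=0); the iterative methods the abstract compares apply such a criterion in small steps with retraining in between, which is the LTH-style schedule the paper builds on.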