Title

Dynamic ConvNets on Tiny Devices via Nested Sparsity

Authors

Matteo Grimaldi, Luca Mocerino, Antonio Cipolletta, Andrea Calimera

Abstract

This work introduces a new training and compression pipeline to build Nested Sparse ConvNets, a class of dynamic Convolutional Neural Networks (ConvNets) suited for inference tasks deployed on resource-constrained devices at the edge of the Internet-of-Things. A Nested Sparse ConvNet consists of a single ConvNet architecture containing N sparse sub-networks with nested weights subsets, like a Matryoshka doll, and can trade accuracy for latency at run time, using the model sparsity as a dynamic knob. To attain high accuracy at training time, we propose a gradient masking technique that optimally routes the learning signals across the nested weights subsets. To minimize the storage footprint and efficiently process the obtained models at inference time, we introduce a new sparse matrix compression format with dedicated compute kernels that fruitfully exploit the characteristic of the nested weights subsets. Tested on image classification and object detection tasks on an off-the-shelf ARM-M7 Micro Controller Unit (MCU), Nested Sparse ConvNets outperform variable-latency solutions naively built assembling single sparse models trained as stand-alone instances, achieving (i) comparable accuracy, (ii) remarkable storage savings, and (iii) high performance. Moreover, when compared to state-of-the-art dynamic strategies, like dynamic pruning and layer width scaling, Nested Sparse ConvNets turn out to be Pareto optimal in the accuracy vs. latency space.
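The abstract mentions a gradient masking technique that routes the learning signal across the nested weight subsets, but gives no implementation details. Below is a minimal PyTorch sketch of the general idea, assuming magnitude-based nested masks and one training pass per subnetwork; the names (`make_nested_masks`, the density list) are illustrative placeholders, not the paper's API, and the uniform masking rule stands in for whatever routing the paper actually proposes.

```python
import torch
import torch.nn.functional as F

def make_nested_masks(weight, densities):
    """Build nested binary masks by weight magnitude: each sparser mask
    keeps a subset of the weights kept by the denser one, so the
    subnetworks nest like a Matryoshka doll."""
    order = torch.argsort(weight.abs().flatten(), descending=True)
    masks = []
    for d in densities:                      # e.g. [0.5, 0.25, 0.1], dense -> sparse
        keep = torch.zeros(weight.numel(), dtype=torch.bool)
        keep[order[: int(d * weight.numel())]] = True
        masks.append(keep.view_as(weight))
    return masks

# Toy layer: one shared weight tensor trained across all nested subnetworks.
weight = torch.randn(64, 128, requires_grad=True)
masks = make_nested_masks(weight.detach(), [0.5, 0.25, 0.1])
opt = torch.optim.SGD([weight], lr=1e-2)

x, target = torch.randn(32, 128), torch.randn(32, 64)
for mask in masks:                           # one update per nested subnetwork
    out = F.linear(x, weight * mask)         # forward with the active subset only
    loss = F.mse_loss(out, target)
    opt.zero_grad()
    loss.backward()
    weight.grad.mul_(mask)                   # gradient masking: confine the update
                                             # to the active subset (already zero
                                             # outside it here, made explicit)
    opt.step()
```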
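The nesting also suggests why a dedicated compression format saves storage: the sparsest subnetwork's nonzeros can be stored first, each denser subnetwork appends only its incremental nonzeros, and selecting a subnetwork at run time amounts to reading a prefix of the stored segments. The NumPy sketch below illustrates that storage-sharing idea only; the paper's actual format and its MCU compute kernels are not described in the abstract, so everything here (the `NestedSparseTensor` class, its layout) is an assumption.

```python
import numpy as np

class NestedSparseTensor:
    """Toy incremental storage for nested sparse weights: segment k holds
    only the nonzeros that level k adds on top of levels 0..k-1."""

    def __init__(self, weight, masks):
        # masks: nested boolean arrays, ordered sparsest -> densest
        self.shape = weight.shape
        self.segments = []
        covered = np.zeros(weight.size, dtype=bool)
        for m in masks:
            new = m.flatten() & ~covered     # nonzeros this level adds
            idx = np.flatnonzero(new).astype(np.int32)
            self.segments.append((idx, weight.flatten()[idx].astype(np.float32)))
            covered |= new

    def materialize(self, level):
        """Rebuild the dense weights of nested subnetwork `level` by
        replaying segments 0..level (a prefix of the stored data)."""
        w = np.zeros(int(np.prod(self.shape)), dtype=np.float32)
        for idx, vals in self.segments[: level + 1]:
            w[idx] = vals
        return w.reshape(self.shape)

# Usage: masks sorted sparsest -> densest (opposite of the training sketch).
w = np.random.randn(64, 128).astype(np.float32)
order = np.argsort(-np.abs(w).ravel())
masks = []
for d in (0.1, 0.25, 0.5):
    m = np.zeros(w.size, dtype=bool)
    m[order[: int(d * w.size)]] = True
    masks.append(m.reshape(w.shape))
nst = NestedSparseTensor(w, masks)
w_mid = nst.materialize(1)                   # the 25%-density subnetwork
```

Because each denser level reuses all shallower segments, N subnetworks cost roughly the storage of the densest one rather than N separate sparse models, which matches the storage savings the abstract claims over independently trained sparse models.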
