Title

Dynamic ConvNets on Tiny Devices via Nested Sparsity

Authors

Matteo Grimaldi, Luca Mocerino, Antonio Cipolletta, Andrea Calimera

Abstract

This work introduces a new training and compression pipeline to build Nested Sparse ConvNets, a class of dynamic Convolutional Neural Networks (ConvNets) suited for inference tasks deployed on resource-constrained devices at the edge of the Internet-of-Things. A Nested Sparse ConvNet consists of a single ConvNet architecture containing N sparse sub-networks with nested weights subsets, like a Matryoshka doll, and can trade accuracy for latency at run time, using the model sparsity as a dynamic knob. To attain high accuracy at training time, we propose a gradient masking technique that optimally routes the learning signals across the nested weights subsets. To minimize the storage footprint and efficiently process the obtained models at inference time, we introduce a new sparse matrix compression format with dedicated compute kernels that fruitfully exploit the characteristic of the nested weights subsets. Tested on image classification and object detection tasks on an off-the-shelf ARM-M7 Micro Controller Unit (MCU), Nested Sparse ConvNets outperform variable-latency solutions naively built assembling single sparse models trained as stand-alone instances, achieving (i) comparable accuracy, (ii) remarkable storage savings, and (iii) high performance. Moreover, when compared to state-of-the-art dynamic strategies, like dynamic pruning and layer width scaling, Nested Sparse ConvNets turn out to be Pareto optimal in the accuracy vs. latency space.
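The abstract mentions a gradient masking technique that routes the learning signal across the nested weight subsets, but gives no implementation details. Below is a minimal PyTorch sketch of the general idea, assuming magnitude-based nested masks and one training pass per subnetwork; the names (`make_nested_masks`, the density list) are illustrative placeholders, not the paper's API, and the uniform masking rule stands in for whatever routing the paper actually proposes.

```python
import torch
import torch.nn.functional as F

def make_nested_masks(weight, densities):
    """Build nested binary masks by weight magnitude: each sparser mask
    keeps a subset of the weights kept by the denser one, so the
    subnetworks nest like a Matryoshka doll."""
    order = torch.argsort(weight.abs().flatten(), descending=True)
    masks = []
    for d in densities:                      # e.g. [0.5, 0.25, 0.1], dense -> sparse
        keep = torch.zeros(weight.numel(), dtype=torch.bool)
        keep[order[: int(d * weight.numel())]] = True
        masks.append(keep.view_as(weight))
    return masks

# Toy layer: one shared weight tensor trained across all nested subnetworks.
weight = torch.randn(64, 128, requires_grad=True)
masks = make_nested_masks(weight.detach(), [0.5, 0.25, 0.1])
opt = torch.optim.SGD([weight], lr=1e-2)

x, target = torch.randn(32, 128), torch.randn(32, 64)
for mask in masks:                           # one update per nested subnetwork
    out = F.linear(x, weight * mask)         # forward with the active subset only
    loss = F.mse_loss(out, target)
    opt.zero_grad()
    loss.backward()
    weight.grad.mul_(mask)                   # gradient masking: confine the update
                                             # to the active subset (already zero
                                             # outside it here, made explicit)
    opt.step()
```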
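The nesting also suggests why a dedicated compression format saves storage: the sparsest subnetwork's nonzeros can be stored first, each denser subnetwork appends only its incremental nonzeros, and selecting a subnetwork at run time amounts to reading a prefix of the stored segments. The NumPy sketch below illustrates that storage-sharing idea only; the paper's actual format and its MCU compute kernels are not described in the abstract, so everything here (the `NestedSparseTensor` class, its layout) is an assumption.

```python
import numpy as np

class NestedSparseTensor:
    """Toy incremental storage for nested sparse weights: segment k holds
    only the nonzeros that level k adds on top of levels 0..k-1."""

    def __init__(self, weight, masks):
        # masks: nested boolean arrays, ordered sparsest -> densest
        self.shape = weight.shape
        self.segments = []
        covered = np.zeros(weight.size, dtype=bool)
        for m in masks:
            new = m.flatten() & ~covered     # nonzeros this level adds
            idx = np.flatnonzero(new).astype(np.int32)
            self.segments.append((idx, weight.flatten()[idx].astype(np.float32)))
            covered |= new

    def materialize(self, level):
        """Rebuild the dense weights of nested subnetwork `level` by
        replaying segments 0..level (a prefix of the stored data)."""
        w = np.zeros(int(np.prod(self.shape)), dtype=np.float32)
        for idx, vals in self.segments[: level + 1]:
            w[idx] = vals
        return w.reshape(self.shape)

# Usage: masks sorted sparsest -> densest (opposite of the training sketch).
w = np.random.randn(64, 128).astype(np.float32)
order = np.argsort(-np.abs(w).ravel())
masks = []
for d in (0.1, 0.25, 0.5):
    m = np.zeros(w.size, dtype=bool)
    m[order[: int(d * w.size)]] = True
    masks.append(m.reshape(w.shape))
nst = NestedSparseTensor(w, masks)
w_mid = nst.materialize(1)                   # the 25%-density subnetwork
```

Because each denser level reuses all shallower segments, N subnetworks cost roughly the storage of the densest one rather than N separate sparse models, which matches the storage savings the abstract claims over independently trained sparse models.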
