Paper Title

Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware

Paper Authors

Sudharsan, Bharath; Sundaram, Dineshkumar; Patel, Pankesh; Breslin, John G.; Ali, Muhammad Intizar; Dustdar, Schahram; Zomaya, Albert; Ranjan, Rajiv

Paper Abstract

The majority of IoT devices like smartwatches, smart plugs, HVAC controllers, etc., are powered by hardware with a constrained specification (low memory, clock speed and processor) which is insufficient to accommodate and execute large, high-quality models. On such resource-constrained devices, manufacturers still manage to provide attractive functionalities (to boost sales) by following the traditional approach of programming IoT devices/products to collect and transmit data (image, audio, sensor readings, etc.) to their cloud-based ML analytics platforms. For decades, this online approach has been facing issues such as compromised data streams, non-real-time analytics due to latency, bandwidth constraints, costly subscriptions, and recent privacy concerns raised by users and the GDPR guidelines. In this paper, to enable ultra-fast and accurate AI-based offline analytics on resource-constrained IoT devices, we present an end-to-end multi-component model optimization sequence and open-source its implementation. Researchers and developers can use our optimization sequence to optimize high-memory, computation-demanding models in multiple aspects in order to produce small-size, low-latency, low-power-consuming models that can comfortably fit and execute on resource-constrained hardware. The experimental results show that our optimization components can produce models that are: (i) 12.06× compressed; (ii) 0.13% to 0.27% more accurate; (iii) orders of magnitude faster, with a unit inference time of 0.06 ms. Our optimization sequence is generic and can be applied to any state-of-the-art models trained for anomaly detection, predictive maintenance, robotics, voice recognition, and machine vision.
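The paper's full multi-component optimization sequence is described in the text and its open-source release. As an illustration of what one such component might look like in practice, below is a minimal sketch of post-training full-integer quantization with TensorFlow Lite, a common way to obtain the kind of compression and latency gains discussed above. The model path, input shape, and representative dataset here are hypothetical placeholders, not the authors' actual pipeline.

```python
# Minimal sketch of one optimization component: post-training full-integer
# quantization with TensorFlow Lite. The model file and input shape are
# hypothetical placeholders, not the authors' actual pipeline.
import numpy as np
import tensorflow as tf

# Load a previously trained Keras model (hypothetical path).
model = tf.keras.models.load_model("trained_model.h5")

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Representative dataset used to calibrate activation ranges for
# full-integer quantization (hypothetical 28x28x1 input, e.g. a small
# vision model); real data from the target domain should be used instead.
def representative_data_gen():
    for _ in range(100):
        yield [np.random.rand(1, 28, 28, 1).astype(np.float32)]

converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

# Convert and write out the quantized model.
tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```

The resulting .tflite file can then be deployed on a resource-constrained device, for example by embedding it as a C array (e.g. via `xxd -i model_int8.tflite`) and executing it with an on-device interpreter such as TensorFlow Lite for Microcontrollers.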
