Paper Title

Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware

Paper Authors

Sudharsan, Bharath; Sundaram, Dineshkumar; Patel, Pankesh; Breslin, John G.; Ali, Muhammad Intizar; Dustdar, Schahram; Zomaya, Albert; Ranjan, Rajiv

Paper Abstract

The majority of IoT devices like smartwatches, smart plugs, HVAC controllers, etc., are powered by hardware with a constrained specification (low memory, clock speed and processor) which is insufficient to accommodate and execute large, high-quality models. On such resource-constrained devices, manufacturers still manage to provide attractive functionalities (to boost sales) by following the traditional approach of programming IoT devices/products to collect and transmit data (image, audio, sensor readings, etc.) to their cloud-based ML analytics platforms. For decades, this online approach has been facing issues such as compromised data streams, non-real-time analytics due to latency, bandwidth constraints, costly subscriptions, and recent privacy concerns raised by users and the GDPR guidelines. In this paper, to enable ultra-fast and accurate AI-based offline analytics on resource-constrained IoT devices, we present an end-to-end multi-component model optimization sequence and open-source its implementation. Researchers and developers can use our optimization sequence to optimize high-memory, computation-demanding models in multiple aspects in order to produce small-size, low-latency, low-power-consuming models that can comfortably fit and execute on resource-constrained hardware. The experimental results show that our optimization components can produce models that are: (i) 12.06× compressed; (ii) 0.13% to 0.27% more accurate; (iii) orders of magnitude faster, with a unit inference time of 0.06 ms. Our optimization sequence is generic and can be applied to any state-of-the-art models trained for anomaly detection, predictive maintenance, robotics, voice recognition, and machine vision.
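The paper's full multi-component optimization sequence is described in the text and its open-source release. As an illustration of what one such component might look like in practice, below is a minimal sketch of post-training full-integer quantization with TensorFlow Lite, a common way to obtain the kind of compression and latency gains discussed above. The model path, input shape, and representative dataset here are hypothetical placeholders, not the authors' actual pipeline.

```python
# Minimal sketch of one optimization component: post-training full-integer
# quantization with TensorFlow Lite. The model file and input shape are
# hypothetical placeholders, not the authors' actual pipeline.
import numpy as np
import tensorflow as tf

# Load a previously trained Keras model (hypothetical path).
model = tf.keras.models.load_model("trained_model.h5")

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Representative dataset used to calibrate activation ranges for
# full-integer quantization (hypothetical 28x28x1 input, e.g. a small
# vision model); real data from the target domain should be used instead.
def representative_data_gen():
    for _ in range(100):
        yield [np.random.rand(1, 28, 28, 1).astype(np.float32)]

converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

# Convert and write out the quantized model.
tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```

The resulting .tflite file can then be deployed on a resource-constrained device, for example by embedding it as a C array (e.g. via `xxd -i model_int8.tflite`) and executing it with an on-device interpreter such as TensorFlow Lite for Microcontrollers.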
