Paper Title
P2M: A Processing-in-Pixel-in-Memory Paradigm for Resource-Constrained TinyML Applications
Paper Authors
Paper Abstract
The demand to process vast amounts of data generated by state-of-the-art high-resolution cameras has motivated novel energy-efficient on-device AI solutions. Visual data in such cameras are usually captured as analog voltages by a sensor pixel array and then converted to the digital domain by analog-to-digital converters (ADCs) for subsequent AI processing. Recent research has tried to take advantage of massively parallel, low-power analog/digital computing in the form of near- and in-sensor processing, in which the AI computation is performed partly in the periphery of the pixel array and partly in a separate on-board CPU/accelerator. Unfortunately, high-resolution input images still need to be streamed between the camera and the AI processing unit frame by frame, causing energy, bandwidth, and security bottlenecks. To mitigate this problem, we propose a novel Processing-in-Pixel-in-Memory (P2M) paradigm that customizes the pixel array by adding support for analog multi-channel, multi-bit convolution, batch normalization, and ReLU (Rectified Linear Unit) activations. Our solution includes a holistic algorithm-circuit co-design approach, and the resulting P2M paradigm can be used as a drop-in replacement for embedding the memory-intensive first few layers of convolutional neural network (CNN) models within foundry-manufacturable CMOS image sensor platforms. Our experimental results indicate that P2M reduces the data transfer bandwidth from sensors and the analog-to-digital conversions by ~21x, and reduces the energy-delay product (EDP) incurred in processing a MobileNetV2 model on a TinyML use case, the Visual Wake Words (VWW) dataset, by up to ~11x compared to standard near-processing or in-sensor implementations, without any significant drop in test accuracy.
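To make the abstract's description concrete, the sketch below is a plain digital reference for the three operations P2M is said to perform inside the pixel array: a multi-channel convolution, batch normalization, and ReLU. It also includes a simple bit-count estimate of how much readout traffic shrinks when the sensor emits that layer's output instead of the raw image. This is an illustration under stated assumptions, not the authors' circuit or method. The function names `conv_bn_relu` and `bandwidth_reduction` are hypothetical. The code does not model analog computation, weight encoding, or ADC behavior. Folding batch normalization into a per-channel scale and shift is a common technique; the abstract does not say whether P2M does it this way.

```python
# Hypothetical digital reference of a first CNN layer (conv -> BN -> ReLU),
# i.e. the operations the abstract says P2M moves into the pixel array.
# This does NOT model the analog P2M circuit; shapes and bit widths are assumptions.
import math
from typing import List, Sequence

Image = List[List[List[float]]]          # [channel][row][col]
Kernel = List[List[List[List[float]]]]   # [out_ch][in_ch][kernel_row][kernel_col]


def conv_bn_relu(x: Image, w: Kernel,
                 gamma: Sequence[float], beta: Sequence[float],
                 mean: Sequence[float], var: Sequence[float],
                 stride: int = 1, eps: float = 1e-5) -> Image:
    """Valid (unpadded) convolution, then batch norm, then ReLU, per output channel."""
    in_ch, h, wd = len(x), len(x[0]), len(x[0][0])
    kh, kw = len(w[0][0]), len(w[0][0][0])
    out_h = (h - kh) // stride + 1
    out_w = (wd - kw) // stride + 1
    y: Image = []
    for o in range(len(w)):
        # Fold BN, gamma * (z - mean) / sqrt(var + eps) + beta, into scale * z + shift.
        scale = gamma[o] / math.sqrt(var[o] + eps)
        shift = beta[o] - scale * mean[o]
        plane = []
        for r in range(out_h):
            row = []
            for c in range(out_w):
                acc = 0.0  # multiply-accumulate over all input channels and kernel taps
                for i in range(in_ch):
                    for dr in range(kh):
                        for dc in range(kw):
                            acc += w[o][i][dr][dc] * x[i][r * stride + dr][c * stride + dc]
                row.append(max(0.0, scale * acc + shift))  # ReLU
            plane.append(row)
        y.append(plane)
    return y


def bandwidth_reduction(h: int, w: int, in_ch: int, in_bits: int,
                        out_h: int, out_w: int, out_ch: int, out_bits: int) -> float:
    """Ratio of bits read out for the raw image vs. the first-layer output (bit count only)."""
    return (h * w * in_ch * in_bits) / (out_h * out_w * out_ch * out_bits)


if __name__ == "__main__":
    img = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]  # 1 channel, 3x3
    k = [[[[1.0, 1.0], [1.0, 1.0]]]]                            # 1 output channel, 2x2
    print(conv_bn_relu(img, k, [1.0], [0.0], [20.0], [1.0], eps=0.0))
    print(bandwidth_reduction(4, 4, 3, 8, 2, 2, 1, 8))
```

The ratio function only counts output bits; it ignores ADC energy, timing, and protocol overhead. The example dimensions are arbitrary and are not the configuration behind the ~21x figure reported above.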