Paper Title


Design-Technology Co-Optimization for NVM-based Neuromorphic Processing Elements

Authors

Shihao Song, Adarsha Balaji, Anup Das, Nagarajan Kandasamy

Abstract


Neuromorphic hardware platforms can significantly lower the energy overhead of a machine learning inference task. We present a design-technology tradeoff analysis to implement such inference tasks on the processing elements (PEs) of a Non-Volatile Memory (NVM)-based neuromorphic hardware. Through detailed circuit-level simulations at scaled process technology nodes, we show the negative impact of technology scaling on the information-processing latency, which impacts the quality-of-service (QoS) of an embedded ML system. At a finer granularity, the latency inside a PE depends on 1) the delay introduced by parasitic components on its current paths, and 2) the varying delay to sense different resistance states of its NVM cells. Based on these two observations, we make the following three contributions. First, on the technology front, we propose an optimization scheme where the NVM resistance state that takes the longest time to sense is set on current paths having the least delay, and vice versa, reducing the average PE latency, which improves the QoS. Second, on the architecture front, we introduce isolation transistors within each PE to partition it into regions that can be individually power-gated, reducing both latency and energy. Finally, on the system-software front, we propose a mechanism to leverage the proposed technological and architectural enhancements when implementing a machine-learning inference task on neuromorphic PEs of the hardware. Evaluations with a recent neuromorphic hardware architecture show that our proposed design-technology co-optimization approach improves both performance and energy efficiency of machine-learning inference tasks without incurring high cost-per-bit.
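The first contribution above, assigning the slowest-to-sense NVM resistance states to the current paths with the least parasitic delay, can be illustrated with a small greedy pairing sketch. This is not the paper's actual algorithm or delay model; it is only an analogy under a simplified additive-delay assumption, and the function name and all latency values below are hypothetical. Sorting one sequence in descending order and the other in ascending order before pairing minimizes the worst-case state-plus-path latency:

```python
# Illustrative sketch only: pair NVM resistance states with crossbar current
# paths so that slower-to-sense states land on lower-delay paths.
# All latency numbers are hypothetical; the paper's circuit-level delay model
# is more detailed than this additive approximation.

def latency_aware_mapping(sense_times, path_delays):
    """Pair each resistance state with a current path.

    sense_times: per-state sense latency in ns (hypothetical)
    path_delays: per-path parasitic delay in ns (hypothetical)
    Returns a list of (state_index, path_index, combined_latency).
    """
    # Slowest-to-sense states first ...
    states = sorted(range(len(sense_times)), key=lambda i: -sense_times[i])
    # ... matched against the fastest current paths first.
    paths = sorted(range(len(path_delays)), key=lambda j: path_delays[j])
    return [(s, p, sense_times[s] + path_delays[p])
            for s, p in zip(states, paths)]

sense = [4.0, 1.0, 2.5, 3.0]   # hypothetical per-state sense times (ns)
delay = [2.0, 0.5, 1.5, 1.0]   # hypothetical per-path parasitic delays (ns)

mapping = latency_aware_mapping(sense, delay)
worst_optimized = max(t for _, _, t in mapping)

# Naive identity mapping (state i on path i) for comparison:
worst_naive = max(s + d for s, d in zip(sense, delay))
```

With these numbers, the naive mapping places the slowest state (4.0 ns) on the slowest path (2.0 ns) for a 6.0 ns worst case, while the opposite-order pairing caps the worst case at 4.5 ns, which is the classic rearrangement-style argument behind pairing opposite sorted orders.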
