Paper Title

RAPID: AppRoximAte Pipelined Soft Multipliers and Dividers for High-Throughput and Energy-Efficiency

Authors

Zahra Ebrahimi, Muhammad Zaid, Mark Wijtvliet, Akash Kumar

Abstract

The rapid updates in error-resilient applications, along with their quest for high throughput, have motivated the design of fast approximate functional units for Field-Programmable Gate Arrays (FPGAs). Studies that proposed imprecise functional techniques suffer from three shortcomings. First, most inexact multipliers and dividers are specialized for Application-Specific Integrated Circuit (ASIC) platforms. Second, state-of-the-art (SoA) approximate units are mostly substituted in a single kernel of a multi-kernel application; moreover, the end-to-end assessment covers the Quality of Results (QoR) but not the overall performance gained. Finally, existing imprecise components are not designed to support a pipelined approach, which could boost the operating frequency/throughput of, e.g., applications that include division. In this paper, we propose RAPID, the first pipelined approximate multiplier and divider architecture customized for FPGAs. The proposed units efficiently utilize 6-input Look-up Tables (6-LUTs) and fast carry chains to implement Mitchell's approximate algorithms. Our novel error-refinement scheme not only has negligible overhead over the baseline Mitchell's approach but also boosts its accuracy to 99.4% for arbitrary sizes of multiplication and division. Experimental results demonstrate the efficiency of the proposed pipelined and non-pipelined RAPID multipliers and dividers over accurate counterparts. Moreover, the end-to-end evaluations of RAPID, deployed in three multi-kernel applications in the domains of bio-signal processing, image processing, and moving object tracking for Unmanned Aerial Vehicles (UAVs), indicate up to 45% improvements in area, latency, and Area-Delay-Product (ADP), respectively, over accurate kernels, with negligible loss in QoR.
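
The abstract names Mitchell's approximate algorithms as the basis of the RAPID units. As a rough illustration only, the sketch below shows the textbook Mitchell logarithmic multiplication and division in fixed-point Python: each operand N = 2^k (1 + x) is mapped to the approximation log2(N) ≈ k + x, the logarithms are added or subtracted, and the result is mapped back with the same linear approximation. It does not model RAPID's error-refinement scheme, its pipelining, or its mapping onto 6-LUTs and carry chains. The function names and the `frac_bits` parameter are hypothetical and chosen for this example.

```python
def mitchell_log2(n: int, frac_bits: int = 16) -> int:
    """Approximate log2(n) as k + x in fixed point with frac_bits fractional bits.

    k is the position of the leading one and x is the remaining mantissa,
    i.e. n = 2^k * (1 + x) with 0 <= x < 1.
    """
    if n <= 0:
        raise ValueError("operand must be a positive integer")
    k = n.bit_length() - 1          # characteristic: index of the leading one
    mantissa = n - (1 << k)         # equals x * 2^k
    # Align the mantissa so that it occupies exactly frac_bits fractional bits.
    if k >= frac_bits:
        x = mantissa >> (k - frac_bits)
    else:
        x = mantissa << (frac_bits - k)
    return (k << frac_bits) | x


def mitchell_antilog2(l: int, frac_bits: int = 16) -> int:
    """Approximate 2^l for a fixed-point l using 2^(k + x) ~= 2^k * (1 + x)."""
    k = l >> frac_bits                       # floor of l, also for negative l
    x = l & ((1 << frac_bits) - 1)           # fractional part, always >= 0
    value = (1 << frac_bits) + x             # (1 + x) scaled by 2^frac_bits
    shift = k - frac_bits
    return value << shift if shift >= 0 else value >> -shift


def mitchell_mul(a: int, b: int, frac_bits: int = 16) -> int:
    """Approximate a * b by adding the approximate logarithms."""
    return mitchell_antilog2(
        mitchell_log2(a, frac_bits) + mitchell_log2(b, frac_bits), frac_bits
    )


def mitchell_div(a: int, b: int, frac_bits: int = 16) -> int:
    """Approximate a // b by subtracting the approximate logarithms."""
    return mitchell_antilog2(
        mitchell_log2(a, frac_bits) - mitchell_log2(b, frac_bits), frac_bits
    )


if __name__ == "__main__":
    print(mitchell_mul(16, 4))   # powers of two are exact: 64
    print(mitchell_mul(3, 3))    # worst case for the basic method: 8 instead of 9
    print(mitchell_div(9, 3))    # 3
```

The 3 × 3 case shows why an error-refinement step is attractive: the unrefined method underestimates the product by about 11.1% when both mantissas equal 0.5.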
