Paper title
Increasing FPGA Accelerators Memory Bandwidth with a Burst-Friendly Memory Layout
Authors
Abstract
Offloading compute-intensive kernels to hardware accelerators relies on the large degree of parallelism offered by these platforms. However, the effective bandwidth of the memory interface often causes a bottleneck, hindering the accelerator's effective performance. Techniques enabling data reuse, such as tiling, lower the pressure on memory traffic but still often leave the accelerators I/O-bound. A further increase in effective bandwidth is possible by using burst rather than element-wise accesses, provided the data is contiguous in memory. In this paper, we propose a memory allocation technique, and provide a proof-of-concept source-to-source compiler pass, that enables such burst transfers by modifying the data layout in external memory. We assess how this technique pushes up the memory throughput, leaving room for exploiting additional parallelism, at a minimal logic overhead.
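To see why the data layout decides whether a tile can be fetched in bursts, the sketch below counts how many contiguous address runs a single tile occupies under two layouts. It is not the allocation technique proposed in the paper, which the abstract does not detail. It substitutes a plain tile-contiguous layout as an illustration, and all function names are hypothetical. The model is also simplified: it treats each maximal run of consecutive word addresses as one burst and ignores maximum burst length, alignment, and bus width.

```python
def row_major_index(i, j, n_cols):
    """Word address of element (i, j) in a row-major array."""
    return i * n_cols + j


def tiled_index(i, j, n_cols, tile):
    """Word address of element (i, j) when the array is stored tile by tile.

    Hypothetical simplification: n_cols is assumed to be a multiple of tile.
    Tiles are ordered row-major, and elements inside a tile are row-major too,
    so every tile occupies tile * tile consecutive words.
    """
    tiles_per_row = n_cols // tile
    ti, tj = i // tile, j // tile      # coordinates of the tile
    ii, jj = i % tile, j % tile        # offsets inside the tile
    tile_base = (ti * tiles_per_row + tj) * tile * tile
    return tile_base + ii * tile + jj


def tile_addresses(ti, tj, tile, index_fn):
    """All word addresses read when loading tile (ti, tj) under index_fn."""
    return [index_fn(ti * tile + ii, tj * tile + jj)
            for ii in range(tile) for jj in range(tile)]


def count_bursts(addresses):
    """Number of maximal runs of consecutive addresses.

    Under the simplified model, each run can be served by one burst,
    while every gap forces a new transfer to start.
    """
    addrs = sorted(addresses)
    bursts = 1
    for prev, cur in zip(addrs, addrs[1:]):
        if cur != prev + 1:
            bursts += 1
    return bursts


if __name__ == "__main__":
    N, T = 8, 4  # 8x8 array split into 4x4 tiles
    row = tile_addresses(1, 1, T, lambda i, j: row_major_index(i, j, N))
    tiled = tile_addresses(1, 1, T, lambda i, j: tiled_index(i, j, N, T))
    print(count_bursts(row))    # 4: one run per tile row
    print(count_bursts(tiled))  # 1: the whole tile is contiguous
```

In the row-major layout, a T x T tile is split into T separate row segments, so it needs T transfers; once each tile is stored contiguously, the same tile is a single run of T * T words. The price is that accesses which do not follow the tile shape, such as reading a full row of the original array, become fragmented instead.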