Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations

Paper Title

Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations

Authors

Hwang, Ranggi; Kim, Taehun; Kwon, Youngeun; Rhu, Minsoo

Abstract

Personalized recommendations are the backbone machine learning (ML) algorithm that powers several important application domains (e.g., ads, e-commerce, etc.) serviced from cloud datacenters. Sparse embedding layers are a crucial building block in designing recommendations, yet little attention has been paid to properly accelerating this important ML algorithm. This paper first provides a detailed workload characterization of personalized recommendations and identifies two significant performance limiters: memory-intensive embedding layers and compute-intensive multi-layer perceptron (MLP) layers. We then present Centaur, a chiplet-based hybrid sparse-dense accelerator that addresses both the memory throughput challenges of embedding layers and the compute limitations of MLP layers. We implement and demonstrate our proposal on an Intel HARPv2, a package-integrated CPU+FPGA device, which shows a 1.7-17.2x performance speedup and 1.7-19.5x energy-efficiency improvement over conventional approaches.
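
To make the contrast between the two limiters concrete, the following is a minimal pure-Python sketch, not the paper's implementation. It assumes a sum-pooled embedding lookup and a single fully connected layer with ReLU, and uses a deliberately rough FLOPs-per-byte estimate that counts only table or weight reads. All function names, the 4-byte element size, and the shapes in the usage note are hypothetical choices made for illustration.

```python
def embedding_gather_reduce(table, indices):
    """Sum-pool the rows of `table` selected by `indices` (a sparse lookup)."""
    dim = len(table[0])
    pooled = [0.0] * dim
    for idx in indices:
        row = table[idx]              # irregular, data-dependent memory access
        for j in range(dim):
            pooled[j] += row[j]       # one add per element read
    return pooled


def dense_layer(x, weights, bias):
    """Fully connected layer with ReLU: y = max(0, W x + b)."""
    out = []
    for w_row, b in zip(weights, bias):
        acc = b
        for w, xi in zip(w_row, x):
            acc += w * xi             # one multiply and one add per weight
        out.append(max(0.0, acc))
    return out


def flops_per_byte_embedding(num_lookups, dim, bytes_per_elem=4):
    """Rough estimate: every gathered element is read once and added once."""
    flops = num_lookups * dim
    bytes_read = num_lookups * dim * bytes_per_elem
    return flops / bytes_read


def flops_per_byte_dense(batch, in_dim, out_dim, bytes_per_elem=4):
    """Rough estimate: weights are read once and reused for every sample in the batch.

    Activation traffic is ignored (a simplifying assumption).
    """
    flops = 2 * batch * in_dim * out_dim
    bytes_read = in_dim * out_dim * bytes_per_elem
    return flops / bytes_read
```

Under these assumptions, `flops_per_byte_embedding(10, 32)` gives 0.25 regardless of the lookup count or vector width, while `flops_per_byte_dense(64, 128, 64)` gives 32.0 and grows with batch size. That gap is one simple way to read the abstract's split between memory-bound sparse layers and compute-bound dense layers.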
