Paper Title

Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations

Authors

Ranggi Hwang, Taehun Kim, Youngeun Kwon, Minsoo Rhu

Abstract

Personalized recommendations are the backbone machine learning (ML) algorithm that powers several important application domains (e.g., ads, e-commerce) serviced from cloud datacenters. Sparse embedding layers are a crucial building block in designing recommendations, yet little attention has been paid to properly accelerating this important ML algorithm. This paper first provides a detailed workload characterization of personalized recommendations and identifies two significant performance limiters: memory-intensive embedding layers and compute-intensive multi-layer perceptron (MLP) layers. We then present Centaur, a chiplet-based hybrid sparse-dense accelerator that addresses both the memory throughput challenges of embedding layers and the compute limitations of MLP layers. We implement and demonstrate our proposal on an Intel HARPv2, a package-integrated CPU+FPGA device, which shows a 1.7-17.2x performance speedup and a 1.7-19.5x energy-efficiency improvement over conventional approaches.
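
To make the two performance limiters named in the abstract concrete, the following is a minimal sketch of a sparse embedding lookup feeding a dense MLP layer. It is not the paper's implementation: the function names, the sum-pooling reduction, the table size, and all numeric values are assumptions chosen for illustration only.

```python
# Illustrative sketch only: names, sizes, and values are assumptions,
# not taken from the paper.

def embedding_gather_reduce(table, indices):
    """Sparse stage: fetch the rows named by `indices` and sum them element-wise."""
    dim = len(table[0])
    pooled = [0.0] * dim
    for i in indices:  # each lookup touches an arbitrary row of the table
        row = table[i]
        for d in range(dim):
            pooled[d] += row[d]
    return pooled


def dense_layer(x, weights, bias):
    """Dense stage: one fully connected layer followed by ReLU."""
    out = []
    for w_row, b in zip(weights, bias):
        acc = b
        for xi, wi in zip(x, w_row):  # multiply-accumulate over the input vector
            acc += xi * wi
        out.append(max(0.0, acc))
    return out


# Toy embedding table: 8 rows, 2 values per row (assumed sizes).
table = [[float(r), float(r) * 2.0] for r in range(8)]
indices = [1, 3, 6]  # sparse feature IDs for one sample (assumed)

pooled = embedding_gather_reduce(table, indices)

weights = [[1.0, 0.0], [0.0, -1.0]]  # 2x2 layer (assumed)
bias = [0.5, 0.0]
hidden = dense_layer(pooled, weights, bias)

print(pooled, hidden)
```

In this sketch the first stage does very little arithmetic per byte fetched, since it mostly reads scattered rows, while the second stage is dominated by multiply-accumulate operations. That contrast is the one the abstract draws between memory-intensive embedding layers and compute-intensive MLP layers.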
