Paper Title
HaoCL: Harnessing Large-scale Heterogeneous Processors Made Easy
Authors
Abstract
The pervasive adoption of Deep Learning (DL) and Graph Processing (GP) makes it a de facto requirement to build large-scale clusters of heterogeneous accelerators including GPUs and FPGAs. The OpenCL programming framework can be used on the individual nodes of such clusters but is not intended for deployment in a distributed manner. Fortunately, the original OpenCL semantics naturally fit into the programming environment of heterogeneous clusters. In this paper, we propose a heterogeneity-aware OpenCL-like (HaoCL) programming framework to facilitate the programming of a wide range of scientific applications including DL and GP workloads on large-scale heterogeneous clusters. With HaoCL, existing applications can be directly deployed on heterogeneous clusters without any modifications to the original OpenCL source code and without awareness of the underlying hardware topologies and configurations. Our experiments show that HaoCL imposes a negligible overhead in a distributed environment, and provides near-linear speedups on standard benchmarks when computation or data size exceeds the capacity of a single node. The system design and the evaluations are presented in this demo paper.