Paper Title
High-performance xPU Stencil Computations in Julia
Authors
Abstract
We present an efficient approach for writing architecture-agnostic parallel high-performance stencil computations in Julia, which is instantiated in the package ParallelStencil.jl. Powerful metaprogramming, costless abstractions and multiple dispatch enable writing a single code that is suitable for both productive prototyping on a single CPU thread and production runs on multi-GPU or CPU workstations or supercomputers. We demonstrate performance close to the theoretical upper bound on GPUs for a 3-D heat diffusion solver, which is a massive improvement over reachable performance with CUDA.jl Array programming.
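For readers unfamiliar with the term, the kind of stencil computation the abstract refers to can be illustrated with a minimal sketch of one explicit time step of 3-D heat diffusion. This is plain, single-threaded Julia and is not the ParallelStencil.jl API. The function name `diffusion3d_step!`, the constant diffusivity `lam`, the grid size, and the time step are all assumptions chosen for illustration, and the sketch makes no attempt to reproduce the performance reported above.

```julia
# Illustrative sketch only: one explicit time step of 3-D heat diffusion
# using a 7-point finite-difference stencil. Plain Julia on a single CPU
# thread; all names and parameter values here are hypothetical.
function diffusion3d_step!(T2, T, lam, dt, dx, dy, dz)
    nx, ny, nz = size(T)
    # Update interior points only; boundary values are left unchanged.
    # ix is the innermost loop, matching Julia's column-major layout.
    for iz in 2:nz-1, iy in 2:ny-1, ix in 2:nx-1
        d2x = (T[ix+1, iy, iz] - 2 * T[ix, iy, iz] + T[ix-1, iy, iz]) / dx^2
        d2y = (T[ix, iy+1, iz] - 2 * T[ix, iy, iz] + T[ix, iy-1, iz]) / dy^2
        d2z = (T[ix, iy, iz+1] - 2 * T[ix, iy, iz] + T[ix, iy, iz-1]) / dz^2
        T2[ix, iy, iz] = T[ix, iy, iz] + dt * lam * (d2x + d2y + d2z)
    end
    return nothing
end

# Usage: a single hot cell in the middle of a small, otherwise cold grid.
nx, ny, nz = 16, 16, 16
dx = dy = dz = 1.0
lam = 1.0                   # assumed constant diffusivity
dt = dx^2 / lam / 6.1       # slightly below the explicit stability limit
T = zeros(nx, ny, nz)
T[8, 8, 8] = 1.0
T2 = copy(T)
diffusion3d_step!(T2, T, lam, dt, dx, dy, dz)
```

Each output cell depends only on its six nearest neighbors in the input array, so every interior point can be updated independently. That independence is what makes stencil kernels of this shape natural candidates for parallel execution on CPU threads or GPUs.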
