Paper title
Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications
Paper authors
Paper abstract
This paper studies the utility of data analytics and machine learning techniques for identifying, classifying, and characterizing the dynamics of large-scale parallel (MPI) programs. To this end, we run microbenchmarks and realistic proxy applications with a regular compute-communicate structure on two different supercomputing platforms and choose the per-process performance and the MPI time per time step as relevant observables. Using principal component analysis, clustering techniques, correlation functions, and a new "phase space plot," we show how desynchronization patterns (or the lack thereof) can be readily identified from a data set that is much smaller than a full MPI trace. Our methods also pave the way towards a more general classification of parallel program dynamics.
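As a rough illustration of the kind of analysis the abstract describes, the sketch below applies principal component analysis to a matrix of MPI time per process and time step. It is not the authors' implementation: the data are synthetic, and the helper names (`synthetic_mpi_times`, `pca`), the process and step counts, and the drift model are assumptions made purely for illustration.

```python
import numpy as np


def synthetic_mpi_times(n_procs=64, n_steps=200, seed=0):
    """Hypothetical data: MPI time per process (rows) and time step (columns)."""
    rng = np.random.default_rng(seed)
    # Baseline: every process spends about 1.0 time units in MPI per step.
    times = 1.0 + 0.01 * rng.standard_normal((n_procs, n_steps))
    # Assumed desynchronization model: the second half of the processes
    # accumulates a linearly growing amount of MPI waiting time.
    drift = np.linspace(0.0, 0.5, n_steps)
    times[n_procs // 2:] += drift
    return times


def pca(X, n_components=2):
    """PCA via SVD; rows of X are samples (processes), columns are features."""
    Xc = X - X.mean(axis=0)                      # center each time step
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)              # explained variance ratio
    return scores, explained[:n_components]


mpi_time = synthetic_mpi_times()
scores, explained = pca(mpi_time)

# Processes that behave alike end up close together in the score plane,
# so the two groups separate along the first principal component.
print("explained variance ratio:", explained)
print("mean PC1 score, first half :", scores[:32, 0].mean())
print("mean PC1 score, second half:", scores[32:, 0].mean())
```

In this toy setting the input is only one number per process and time step, which reflects the abstract's point that such observables are far more compact than a full MPI trace. A clustering step could then be run on the low-dimensional scores.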