Paper Title

Designing an Adaptive Application-Level Checkpoint Management System for Malleable MPI Applications

Authors

Jophin John, Michael Gerndt

Abstract

Dynamic resource management opens up numerous opportunities in High Performance Computing (HPC). It improves system-level services as well as application performance. Checkpointing can also be regarded as a system-level service and can benefit from this dynamism. By integrating with a malleable resource management system, a checkpointing system can achieve better resource availability. In addition to fault tolerance, the checkpointing system can serve the data redistribution needs of malleable applications during resource changes. Therefore, we propose iCheck, an adaptive application-level checkpoint management system that efficiently exploits system- and application-level dynamism to provide better checkpointing and data redistribution services to applications.
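To illustrate what data redistribution during a resource change involves, here is a minimal Python sketch. It assumes a 1-D array that is block-distributed across MPI ranks and checkpointed as one block per rank. It is not iCheck's interface: `block_bounds`, `redistribution_plan`, and `redistribute` are hypothetical names introduced only for illustration, and real MPI communication and checkpoint file I/O are omitted.

```python
from typing import List, Tuple


def block_bounds(n: int, p: int, rank: int) -> Tuple[int, int]:
    """Half-open [start, end) index range owned by `rank` when n elements
    are block-distributed over p ranks (the first n % p ranks get one extra)."""
    base, extra = divmod(n, p)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


def redistribution_plan(n: int, p_old: int, p_new: int) -> List[List[Tuple[int, int, int]]]:
    """For each new rank, list the (old_rank, offset_in_old_block, length)
    pieces it must read from the old ranks' checkpointed blocks."""
    plan = []
    for r_new in range(p_new):
        s_new, e_new = block_bounds(n, p_new, r_new)
        pieces = []
        for r_old in range(p_old):
            s_old, e_old = block_bounds(n, p_old, r_old)
            # Overlap between the new rank's range and this old block.
            lo, hi = max(s_new, s_old), min(e_new, e_old)
            if lo < hi:
                pieces.append((r_old, lo - s_old, hi - lo))
        plan.append(pieces)
    return plan


def redistribute(old_blocks: List[List[float]], p_new: int) -> List[List[float]]:
    """Rebuild per-rank blocks for p_new ranks from checkpointed old blocks.
    Hypothetical helper; old_blocks must follow the block_bounds layout."""
    n = sum(len(b) for b in old_blocks)
    p_old = len(old_blocks)
    for r, b in enumerate(old_blocks):
        s, e = block_bounds(n, p_old, r)
        if len(b) != e - s:
            raise ValueError(f"block {r} does not match the block layout")
    plan = redistribution_plan(n, p_old, p_new)
    return [
        [x for (r, off, ln) in pieces for x in old_blocks[r][off:off + ln]]
        for pieces in plan
    ]


if __name__ == "__main__":
    # Checkpoint taken on 3 ranks, application restarted/expanded to 4 ranks.
    old = [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(redistribute(old, 4))
```

Because the plan is computed only from index ranges, each new rank could read just the overlapping slices of the old checkpoint blocks instead of the whole dataset. Shrinking works the same way, e.g. `redistribute(old, 2)` merges the three old blocks into two.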