Paper Title
Accelerating Parallel Write via Deeply Integrating Predictive Lossy Compression with HDF5
Paper Authors
Paper Abstract
Lossy compression is one of the most efficient ways to reduce storage overhead and improve I/O performance for HPC applications. However, existing parallel I/O libraries cannot fully exploit lossy compression to accelerate parallel writes, because the interaction between compression and write performance is not well understood. To this end, we propose to deeply integrate predictive lossy compression with HDF5 to significantly improve parallel-write performance. Specifically, we propose analytical models that predict the compression time and the parallel-write time before the actual compression takes place, which enables compression and writing to overlap. We also reserve extra space for each process to handle possible data overflows caused by uncertainty in the predicted compression ratios. Moreover, we propose an optimization that reorders the compression tasks to increase the overlapping efficiency. Experiments with up to 4,096 cores on Summit show that our solution improves write performance by up to 4.5× and 2.9× over the non-compression and lossy compression solutions, respectively, with only 1.5% storage overhead (compared to the original data) on two real-world HPC applications.
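The extra-space idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the HDF5 API: the function names `reserve_slots` and `place_blocks`, the 10% margin, and the policy of appending spilled bytes after all reserved slots are assumptions made only for illustration. Each block gets a file slot sized from its predicted compressed size plus a margin, so write offsets are known before compression finishes; bytes that do not fit are redirected to an overflow region.

```python
def reserve_slots(predicted_sizes, extra_percent=10):
    """Assign each block a file slot of predicted size plus a safety margin.

    Returns a list of (offset, capacity) pairs and the end of the reserved region.
    The margin is rounded up with integer arithmetic to avoid float rounding.
    """
    slots = []
    offset = 0
    for size in predicted_sizes:
        capacity = size + (size * extra_percent + 99) // 100
        slots.append((offset, capacity))
        offset += capacity
    return slots, offset


def place_blocks(actual_sizes, slots, overflow_start):
    """Decide where each block's bytes go once the real compressed size is known.

    Returns (slot_offset, bytes_in_slot, overflow_offset_or_None, overflow_bytes)
    per block, plus the final end of file. Overflow placement is a hypothetical
    policy: spilled bytes are appended after the reserved region.
    """
    placements = []
    overflow_offset = overflow_start
    for actual, (offset, capacity) in zip(actual_sizes, slots):
        in_slot = min(actual, capacity)
        spill = actual - in_slot
        placements.append((offset, in_slot, overflow_offset if spill else None, spill))
        overflow_offset += spill
    return placements, overflow_offset


# Three blocks with predicted compressed sizes (bytes) and the sizes actually produced.
slots, reserved_end = reserve_slots([100, 200, 50])
placements, file_end = place_blocks([105, 230, 40], slots, reserved_end)
```

In this example the second block compresses worse than predicted (230 bytes against a 220-byte slot), so 10 bytes spill into the overflow region while the other two blocks fit in their slots. A larger margin reduces spills at the cost of more unused space in the file.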