Paper Title

cuSZ: An Efficient GPU-Based Error-Bounded Lossy Compression Framework for Scientific Data

Authors

Jiannan Tian, Sheng Di, Kai Zhao, Cody Rivera, Megan Hickman Fulp, Robert Underwood, Sian Jin, Xin Liang, Jon Calhoun, Dingwen Tao, Franck Cappello

Abstract

Error-bounded lossy compression is a state-of-the-art data reduction technique for HPC applications because it not only significantly reduces storage overhead but also retains high fidelity for post-analysis. Because supercomputers and HPC applications are becoming heterogeneous through accelerator-based architectures, in particular GPUs, several development teams have recently released GPU versions of their lossy compressors. However, existing state-of-the-art GPU-based lossy compressors suffer from either low compression and decompression throughput or low compression quality. In this paper, we present cuSZ, an optimized GPU version of SZ, one of the best error-bounded lossy compressors. To the best of our knowledge, cuSZ is the first error-bounded lossy compressor on GPUs for scientific data. Our contributions are fourfold. (1) We propose a dual-quantization scheme to entirely remove the data dependency in the prediction step of SZ such that this step can be performed very efficiently on GPUs. (2) We develop an efficient customized Huffman coding for the SZ compressor on GPUs. (3) We implement cuSZ using CUDA and optimize its performance by improving the utilization of GPU memory bandwidth. (4) We evaluate cuSZ on five real-world HPC application datasets from the Scientific Data Reduction Benchmarks and compare it with other state-of-the-art methods on both CPUs and GPUs. Experiments show that cuSZ improves SZ's compression throughput by up to 370.1x and 13.1x over the production version running on a single CPU core and on multiple CPU cores, respectively, while achieving the same quality of reconstructed data. It also improves the compression ratio by up to 3.48x on the tested data compared with another state-of-the-art GPU-supported lossy compressor.
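
The abstract does not spell out the dual-quantization scheme, so the following is only a minimal 1D sketch of the general idea, under stated assumptions: values are first snapped to integer multiples of twice the error bound (prequantization), and prediction then runs on those integers rather than on reconstructed values, so each element can be processed independently. The function names `compress_1d` and `decompress_1d`, the previous-neighbor predictor, and the omission of outlier handling and Huffman coding are all simplifications for illustration, not the authors' implementation.

```python
def compress_1d(data, eb):
    """Illustrative dual-quantization on a 1D list (hypothetical helper)."""
    # Prequantization: map each value to the nearest integer multiple of 2*eb.
    # Every element is handled independently, so this step is data-parallel.
    pre = [round(x / (2 * eb)) for x in data]

    # Postquantization: predict each element from its already prequantized
    # left neighbor (assumed predictor) and keep the integer difference.
    # Because `pre` is fully known up front, no element waits on another
    # element's reconstruction.
    return [pre[i] - (pre[i - 1] if i > 0 else 0) for i in range(len(pre))]


def decompress_1d(codes, eb):
    """Invert compress_1d with a running (prefix) sum of the integer codes."""
    out = []
    acc = 0
    for c in codes:
        acc += c                   # recover the prequantized integer
        out.append(acc * 2 * eb)   # scale back to the original value range
    return out


if __name__ == "__main__":
    data = [1.0, 1.2, 0.9, 3.5, -2.25]
    eb = 0.1
    codes = compress_1d(data, eb)
    recon = decompress_1d(codes, eb)
    # Largest pointwise reconstruction error, to compare against eb.
    print(codes, max(abs(a - b) for a, b in zip(data, recon)))
```

In this sketch the only lossy step is the rounding in prequantization, which moves a value by at most half of `2 * eb`, so the reconstruction stays within the error bound up to floating-point rounding. The integer differences are what an entropy coder such as Huffman coding would then compress.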
