Today's scientific simulations require a significant reduction of data volume because of the extremely large amounts of data they produce and the limited I/O bandwidth and storage space. Error-bounded lossy compression has been considered one of the most effective solutions to this problem. In practice, however, the best-fit compression method often needs to be customized or optimized, in particular because of the diverse characteristics of different datasets and the varying user requirements on compression quality and performance. In this paper, we develop a novel modular, composable compression framework (namely SZ3), which involves three significant contributions. (1) SZ3 features a modular abstraction for the prediction-based compression framework such that new compression modules can be plugged in easily. (2) SZ3 supports multi-algorithm predictors and can automatically select the best-fit predictor for each data block based on the designed error estimation criterion. (3) SZ3 allows users to easily compose different compression pipelines on demand, such that both compression quality and performance can be significantly improved for their specific datasets and requirements. In addition, we evaluate several lossy compressors composed from SZ3 using real-world datasets. Specifically, we leverage SZ3 to improve the compression quality and performance for different use cases, including the GAMESS quantum chemistry dataset and the Advanced Photon Source (APS) instrument dataset. Experiments show that our customized compression pipelines lead to up to 20% improvement in compression ratios under the same data distortion compared with the state-of-the-art approaches.
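To make the idea of a prediction-based, error-bounded pipeline with per-block predictor selection concrete, the following is a minimal one-dimensional sketch in Python. It is not SZ3's API or its actual algorithm: the function names (`previous_value`, `linear_extrapolation`, `encode_block`, `compress`, `decompress`) are hypothetical, the two predictors are chosen only for illustration, and the selection rule (smallest sum of absolute quantization codes) is an assumed stand-in for the error estimation criterion mentioned in the abstract. Entropy coding and lossless back-end stages are omitted.

```python
from typing import Callable, List, Sequence, Tuple

# A predictor guesses the next value from already-reconstructed values.
Predictor = Callable[[Sequence[float]], float]


def previous_value(history: Sequence[float]) -> float:
    """Predict that the next value equals the last reconstructed value."""
    return history[-1] if history else 0.0


def linear_extrapolation(history: Sequence[float]) -> float:
    """Predict by extending the line through the last two reconstructed values."""
    if len(history) < 2:
        return previous_value(history)
    return 2.0 * history[-1] - history[-2]


# Pluggable predictor modules (illustrative choice, not SZ3's predictor set).
PREDICTORS: List[Predictor] = [previous_value, linear_extrapolation]


def encode_block(block: Sequence[float], history: Sequence[float],
                 predictor: Predictor, error_bound: float
                 ) -> Tuple[List[int], List[float]]:
    """Quantize prediction residuals so each value is kept within error_bound."""
    recon = list(history)
    codes: List[int] = []
    for value in block:
        pred = predictor(recon)
        # Linear quantization with bin width 2 * error_bound.
        code = round((value - pred) / (2.0 * error_bound))
        codes.append(code)
        # Predict from reconstructed data so the decoder can mirror this step.
        recon.append(pred + 2.0 * error_bound * code)
    return codes, recon[len(history):]


def compress(data: Sequence[float], block_size: int, error_bound: float
             ) -> List[Tuple[int, List[int]]]:
    """For each block, keep the predictor whose codes look cheapest to store."""
    history: List[float] = []
    blocks: List[Tuple[int, List[int]]] = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        best = None
        for idx, predictor in enumerate(PREDICTORS):
            codes, recon = encode_block(block, history, predictor, error_bound)
            # Assumed cost proxy: small codes compress better downstream.
            cost = sum(abs(c) for c in codes)
            if best is None or cost < best[0]:
                best = (cost, idx, codes, recon)
        _, idx, codes, recon = best
        blocks.append((idx, codes))
        history.extend(recon)
    return blocks


def decompress(blocks: Sequence[Tuple[int, List[int]]],
               error_bound: float) -> List[float]:
    """Replay the stored predictor choice and codes to rebuild the data."""
    recon: List[float] = []
    for idx, codes in blocks:
        predictor = PREDICTORS[idx]
        for code in codes:
            recon.append(predictor(recon) + 2.0 * error_bound * code)
    return recon
```

In this sketch, adding a new predictor only requires appending a function to `PREDICTORS`, which loosely mirrors the plug-in idea described above; a real compressor would also handle unpredictable values and encode the integer codes with an entropy coder.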