Improving Prediction-Based Lossy Compression Dramatically Via Ratio-Quality Modeling

Error-bounded lossy compression is one of the most effective techniques for scientific data reduction. However, the traditional trial-and-error approach to configuring lossy compressors, which searches for the optimal trade-off between reconstructed data quality and compression ratio, is prohibitively expensive. To resolve this issue, we develop a general-purpose analytical ratio-quality model based on the prediction-based lossy compression framework. The model can effectively predict the compression ratio and the quality of the reduced data, as well as the impact of the lossy compressed data on post-hoc analysis quality. Our analytical model significantly improves prediction-based lossy compression in three use cases: (1) predictor optimization by selecting the best-fit predictor; (2) memory compression with a target ratio; and (3) in-situ compression optimization by fine-grained error-bound tuning of various data partitions. We evaluate our analytical model on 10 scientific datasets, demonstrating its high accuracy (93.47% on average) and low computational cost (up to 18.7X lower than the trial-and-error approach) for estimating the compression ratio and the impact of lossy compression on post-hoc analysis quality. We also verify the high efficiency of our ratio-quality model using different applications across the three use cases. In addition, our experiments demonstrate that our modeling-based approach reduces the time to store the 3D Reverse Time Migration data by up to 3.4X over the traditional solution using 128 CPU cores from 8 compute nodes.
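To make the contrast with trial-and-error configuration concrete, the following is a minimal sketch of what estimating a compression ratio without running a full compressor could look like. It is not the ratio-quality model described in the abstract. It assumes a simple previous-value predictor on 1D data, linear quantization with bin width twice the error bound, and the Shannon entropy of the quantization codes as a stand-in for the encoded size. All function names and the 32-bit input width are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def quantize_with_prev_value_predictor(data, error_bound):
    """Error-bounded quantization of 1D data (illustrative assumption).

    Each value is predicted by the previously reconstructed value, and the
    prediction error is quantized with bin width 2 * error_bound.
    Returns (codes, reconstructed).
    """
    codes, reconstructed = [], []
    prev = 0.0  # assumed prediction for the first element
    for x in data:
        code = round((x - prev) / (2.0 * error_bound))
        value = prev + code * 2.0 * error_bound
        codes.append(code)
        reconstructed.append(value)
        prev = value  # predict from reconstructed data, as a decompressor would
    return codes, reconstructed


def entropy_bits_per_value(codes):
    """Shannon entropy of the quantization codes, in bits per value."""
    n = len(codes)
    return -sum(c / n * math.log2(c / n) for c in Counter(codes).values())


def estimate_ratio(data, error_bound, bits_per_input_value=32):
    """Rough ratio estimate: input bits divided by entropy of the codes."""
    codes, _ = quantize_with_prev_value_predictor(data, error_bound)
    bits = entropy_bits_per_value(codes)
    return math.inf if bits == 0 else bits_per_input_value / bits


# Usage: compare candidate error bounds without encoding anything.
signal = [math.sin(0.01 * i) for i in range(1000)]
for eb in (1e-3, 1e-1):
    print(eb, estimate_ratio(signal, eb))
```

The estimate only needs one pass over the quantization codes per candidate error bound, which is the kind of shortcut an analytical model offers over repeatedly compressing and decompressing the data. A real compressor adds encoding overhead and further lossless stages, so the numbers from this sketch are indicative only.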
