Adaptive Configuration of In Situ Lossy Compression for Cosmology Simulations via Fine-Grained Rate-Quality Modeling

Extreme-scale cosmological simulations are widely run by today's researchers and scientists on leadership supercomputers. A new generation of error-bounded lossy compressors has been adopted in these workflows to reduce storage requirements and minimize the impact of throughput limitations while saving large snapshots of high-fidelity data for post hoc analysis. In this paper, we propose to adaptively assign compression configurations to the compute partitions of cosmological simulations, guided by newly designed post-analysis-aware rate-quality modeling. Our contributions are fourfold: (1) We propose a novel adaptive approach that selects feasible error bounds for different partitions, showing that configuring lossy compression for each partition individually is both possible and efficient. (2) Based on the properties of each partition, we build models that estimate the compression ratio and the overall loss in post-analysis results caused by lossy compression. (3) We develop an efficient optimization guideline that determines the best-fit combination of error bounds, maximizing the compression ratio under an acceptable loss in post-analysis quality. (4) Our approach introduces negligible overhead for feature extraction and error-bound optimization on each partition, enabling post-analysis-aware in situ lossy compression for cosmological simulations. Experiments show that our models are highly accurate and reliable. Our fine-grained adaptive configuration approach improves the compression ratio by up to 73% on the tested datasets at the same post-analysis distortion, with only 1% performance overhead.
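
The abstract does not say how the best-fit combination of error bounds is computed. The following is a minimal sketch under stated assumptions: each partition has a few candidate error bounds whose compressed size and post-analysis loss have already been estimated by rate-quality models such as those in contribution (2), and the per-partition losses add up to the overall loss. It uses a generic Lagrangian (equal-slope) search, a standard rate-distortion technique swapped in for illustration rather than the paper's own guideline. The names Candidate, choose_error_bounds, total_loss, and overall_ratio are hypothetical, and the numbers are toy values, not results from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One candidate setting for a partition (hypothetical structure)."""
    error_bound: float  # error bound handed to the compressor
    est_size: float     # model-estimated compressed size (bytes)
    est_loss: float     # model-estimated post-analysis loss for this partition


def pick(cands: Sequence[Candidate], lam: float) -> Candidate:
    # Minimize size + lam * loss; ties go to the lower-loss candidate.
    return min(cands, key=lambda c: (c.est_size + lam * c.est_loss, c.est_loss))


def total_loss(choice: Sequence[Candidate]) -> float:
    # Assumption: post-analysis loss is additive across partitions.
    return sum(c.est_loss for c in choice)


def choose_error_bounds(partitions: Sequence[Sequence[Candidate]],
                        loss_budget: float,
                        iters: int = 60) -> Optional[List[Candidate]]:
    """Pick one candidate per partition, minimizing total compressed size
    (i.e. maximizing overall compression ratio) while keeping the total
    estimated loss within loss_budget. Returns None if nothing fits."""
    def select(lam: float) -> List[Candidate]:
        return [pick(p, lam) for p in partitions]

    if total_loss(select(0.0)) <= loss_budget:
        return select(0.0)  # smallest sizes everywhere already fit the budget

    # Double lam until feasible; total loss never increases as lam grows.
    hi = 1.0
    while total_loss(select(hi)) > loss_budget:
        hi *= 2.0
        if hi > 1e30:
            return None  # budget is below the lowest achievable loss
    # Bisect for the smallest feasible lam; total size never decreases with lam,
    # so this is the smallest-size selection the Lagrangian search can reach.
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total_loss(select(mid)) <= loss_budget:
            hi = mid
        else:
            lo = mid
    return select(hi)


def overall_ratio(original_sizes: Sequence[float],
                  choice: Sequence[Candidate]) -> float:
    # Overall compression ratio = total original bytes / total compressed bytes.
    return sum(original_sizes) / sum(c.est_size for c in choice)


if __name__ == "__main__":
    # Toy numbers for two partitions of 100 bytes each (not measured data).
    parts = [
        [Candidate(1e-2, 10, 5.0), Candidate(1e-3, 20, 1.0), Candidate(1e-4, 40, 0.2)],
        [Candidate(1e-2, 12, 1.0), Candidate(1e-3, 25, 0.5), Candidate(1e-4, 50, 0.1)],
    ]
    best = choose_error_bounds(parts, loss_budget=2.5)
    print([c.error_bound for c in best])    # → [0.001, 0.01]
    print(overall_ratio([100, 100], best))  # → 6.25
```

For a fixed λ each partition is chosen independently, so one search step costs time linear in the number of candidates. The trade-off is that a Lagrangian search only reaches combinations on the lower convex hull of the size/loss curve and can miss some feasible combinations; an exact multiple-choice knapsack dynamic program avoids that at a higher cost.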