Error-bounded lossy compression is becoming increasingly important to today's extreme-scale HPC applications because of the ever-increasing volume of data they generate. Error-bounded lossy compressors have been widely used for in situ visualization, data stream intensity reduction, storage reduction, I/O performance improvement, checkpoint/restart acceleration, and memory footprint reduction. Although many works have optimized the compression ratio, quality, and performance of different error-bounded lossy compressors, no existing work attempts to systematically understand how lossy compression errors affect HPC applications through error propagation. In this paper, we propose and develop a lossy compression fault injection tool called LCFI. To the best of our knowledge, this is the first tool that helps both lossy compressor developers and users systematically and comprehensively understand the impact of lossy compression errors on any given HPC program. The contributions of this work are threefold. (1) We propose an efficient approach to inject lossy compression faults according to a statistical analysis of compression errors for different state-of-the-art error-bounded lossy compressors. (2) We build a fault injector that is highly applicable, customizable, and easy to use, and that generates comprehensive top-down results; we use a simple sample program to demonstrate the use of LCFI. (3) We evaluate LCFI on four representative HPC benchmark programs with different lossy compression errors and derive several important insights from our observations.
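To make the idea in contribution (1) concrete, the following is a minimal sketch of what injecting an error-bounded perturbation into a program's data might look like. It is not LCFI's interface or implementation: the function name `inject_bounded_error`, its parameters, and the choice of a uniform error distribution are all assumptions made for illustration, since the abstract does not state which distribution the statistical analysis yields.

```python
import random


def inject_bounded_error(values, error_bound, seed=None):
    """Return a copy of `values` with each element perturbed by at most `error_bound`.

    Hypothetical helper for illustration only. The uniform distribution is an
    assumption; a real injector would sample from whatever error distribution
    was measured for the compressor under study.
    """
    if error_bound < 0:
        raise ValueError("error_bound must be non-negative")
    rng = random.Random(seed)  # seeded generator so that runs are reproducible
    return [v + rng.uniform(-error_bound, error_bound) for v in values]


# Usage: perturb a small array under an absolute error bound of 1e-3.
data = [1.0, 2.5, -3.75, 10.0]
perturbed = inject_bounded_error(data, error_bound=1e-3, seed=42)

# The largest pointwise deviation stays within the bound (up to rounding).
max_abs_error = max(abs(p - d) for p, d in zip(perturbed, data))
```

Running an application once on `data` and once on `perturbed`, then comparing the outputs, is the kind of experiment such a tool would automate to study how bounded input errors propagate.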