Crystallography is the leading technique for studying the atomic structures of proteins, and it produces enormous volumes of data that can strain the storage and data transfer capabilities of synchrotron and free-electron laser light sources. Lossy compression has been identified as a possible means to cope with the growing data volumes; however, prior approaches have not produced sufficient quality at a sufficient rate to meet scientific needs. This paper presents Region Of Interest BINning with SZ lossy compression (ROIBIN-SZ), a novel, parallel, and accelerated compression scheme that separates the preservation of dynamically selected key regions from the lossy compression of background information. We present an extensive evaluation of the performance and quality achieved through the co-design of this compression scheme. We achieve compression ratios of up to 196x on lysozyme and 46.44x on selenobiotinyl-streptavidin while preserving the data sufficiently to reconstruct the structure, at bandwidths and scales that approach the needs of the upcoming light sources.
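The split between preserved key regions and lossily compressed background can be illustrated with a minimal NumPy sketch. This is not the ROIBIN-SZ implementation: the abstract does not give its details, so the square windows around peak positions, the block-average binning, and the uniform quantizer (a simple stand-in for an error-bounded compressor, not SZ itself) are all assumptions, as are the hypothetical names `roi_mask`, `bin_image`, `quantize`, `compress`, and `decompress`.

```python
import numpy as np

def roi_mask(shape, peaks, half_width):
    """Boolean mask that is True inside a square window around each peak."""
    mask = np.zeros(shape, dtype=bool)
    for r, c in peaks:
        r0, r1 = max(r - half_width, 0), min(r + half_width + 1, shape[0])
        c0, c1 = max(c - half_width, 0), min(c + half_width + 1, shape[1])
        mask[r0:r1, c0:c1] = True
    return mask

def bin_image(image, factor):
    """Average non-overlapping factor x factor blocks (shape must divide evenly)."""
    rows, cols = image.shape
    blocks = image.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

def quantize(values, abs_error):
    """Uniform quantization with a pointwise absolute error bound.
    Assumption: a stand-in for an error-bounded lossy compressor, not SZ."""
    return np.round(values / (2.0 * abs_error)).astype(np.int32)

def dequantize(codes, abs_error):
    """Map integer codes back to approximate values."""
    return codes.astype(np.float64) * (2.0 * abs_error)

def compress(image, peaks, half_width=2, factor=2, abs_error=0.5):
    """Keep pixels near peaks exactly; bin and quantize the whole frame as background."""
    image = image.astype(np.float64)
    mask = roi_mask(image.shape, peaks, half_width)
    return {
        "shape": image.shape,
        "peaks": list(peaks),
        "half_width": half_width,
        "factor": factor,
        "abs_error": abs_error,
        "roi_values": image[mask],  # preserved without loss
        "codes": quantize(bin_image(image, factor), abs_error),
    }

def decompress(packed):
    """Rebuild a frame: upsampled lossy background with exact ROI pixels pasted back."""
    factor = packed["factor"]
    binned = dequantize(packed["codes"], packed["abs_error"])
    restored = np.repeat(np.repeat(binned, factor, axis=0), factor, axis=1)
    mask = roi_mask(packed["shape"], packed["peaks"], packed["half_width"])
    restored[mask] = packed["roi_values"]
    return restored
```

In this sketch, calling `decompress(compress(image, peaks))` returns a frame whose pixels inside the peak windows match the input exactly, while the background is recovered only up to the binning and the chosen error bound. A real pipeline would also entropy-code the quantized background and select the peaks dynamically, neither of which is shown here.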