As supercomputers continue to grow toward exascale, the amount of data that needs to be saved or transmitted is exploding. To this end, many prior works have studied using error-bounded lossy compressors to reduce the data size and improve I/O performance. However, little work has been done on effectively offloading lossy compression onto FPGA-based SmartNICs to reduce the compression overhead. In this paper, we propose a hardware-algorithm co-design of an efficient and adaptive lossy compressor for scientific data on FPGAs (called CEAZ) to accelerate parallel I/O. Our contribution is fourfold: (1) We propose an efficient Huffman coding approach that can adaptively update Huffman codewords online based on codewords generated offline from a variety of representative scientific datasets. (2) We derive a theoretical analysis that supports precise control of the compression ratio under an error-bounded compression mode, enabling accurate offline generation of Huffman codewords. This also allows us to create a fixed-ratio compression mode for consistent throughput. (3) We develop an efficient compression pipeline by adapting cuSZ's dual-quantization algorithm to our hardware use case. (4) We evaluate CEAZ on five real-world datasets, using both a single FPGA board and 128 nodes of the Bridges-2 supercomputer. Experiments show that CEAZ outperforms the second-best FPGA-based lossy compressor by 2X in throughput and 9.6X in compression ratio. It also improves MPI_File_write and MPI_Gather throughputs by up to 25.8X and 24.8X, respectively.
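To give a sense of the dual-quantization step mentioned in contribution (3), the following is a minimal 1D sketch, assuming the two-stage scheme described for cuSZ (prequantization by the error bound, then a Lorenzo-style prediction on the quantized integers). The function name and simplifications (1D predictor, no outlier handling) are illustrative only and do not reflect the actual CEAZ or cuSZ implementation.

```cpp
#include <cstdint>
#include <cmath>
#include <vector>

// Sketch of dual-quantization (hypothetical helper, not CEAZ/cuSZ source).
// Stage 1 (prequantization): scale each value by 1/(2*eb) and round, so that
// reconstructing from the integer grid keeps the pointwise error within eb.
// Stage 2 (postquantization): predict each prequantized value from its left
// neighbor (1D Lorenzo predictor) and emit the integer residual, which is
// what a downstream Huffman coder would consume.
std::vector<int32_t> dual_quantize(const std::vector<float>& data, float eb) {
    const float ebx2_r = 1.0f / (2.0f * eb);   // reciprocal of 2*eb
    std::vector<int32_t> quant_codes(data.size());
    int32_t prev = 0;                          // prequantized left neighbor
    for (size_t i = 0; i < data.size(); ++i) {
        int32_t q = static_cast<int32_t>(std::lround(data[i] * ebx2_r));
        quant_codes[i] = q - prev;             // prediction residual
        prev = q;
    }
    return quant_codes;
}
```

In hardware, each loop iteration maps naturally to a pipeline stage since the residual depends only on the current value and the previous prequantized integer; the real design also handles out-of-range residuals (outliers) separately, which this sketch omits.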