Ptychography is a widely used microscopic imaging modality in many scientific fields and holds the record for the highest image resolution. Unfortunately, high-resolution ptychographic reconstruction requires a significant amount of memory and computation, forcing many applications to compromise on image resolution in exchange for a smaller memory footprint and a shorter reconstruction time. In this paper, we propose a novel image gradient decomposition method that significantly reduces the memory footprint of ptychographic reconstruction by tessellating image gradients and diffraction measurements into tiles. In addition, we propose a parallel image gradient decomposition method that enables asynchronous point-to-point communication and parallel pipelining with minimal overhead on a large number of GPUs. Our experiments on a lead titanate material dataset (PbTiO3) with 16632 probe locations show that our Gradient Decomposition algorithm reduces the memory footprint by 51 times. In addition, it achieves a time-to-solution within 2.2 minutes by scaling to 4158 GPUs, with a super-linear strong scaling efficiency of 364% relative to the runtime on 6 GPUs. Overall, the method is 2.7 times more memory efficient, 9 times more scalable, and 86 times faster than the state-of-the-art algorithm.
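To make the strong scaling figure concrete, the short sketch below uses the conventional definition of strong scaling efficiency (measured speedup divided by the ideal speedup from the added GPUs); the abstract does not spell out its exact formula, so this definition is an assumption. The function names are illustrative, and only the GPU counts (6 and 4158) and the 364% efficiency come from the text.

```python
def strong_scaling_efficiency(base_gpus, base_time, gpus, time):
    """Measured speedup divided by ideal (linear) speedup.

    Assumes the conventional definition; values above 1.0 are super-linear.
    Times may be in any unit as long as both use the same one.
    """
    speedup = base_time / time
    ideal_speedup = gpus / base_gpus
    return speedup / ideal_speedup


def implied_speedup(base_gpus, gpus, efficiency):
    """Speedup over the baseline implied by a given strong scaling efficiency."""
    return efficiency * gpus / base_gpus


if __name__ == "__main__":
    # GPU counts and efficiency as quoted in the abstract.
    ideal = 4158 / 6
    actual = implied_speedup(6, 4158, 3.64)
    print(f"ideal speedup: {ideal:.0f}x, implied speedup: {actual:.1f}x")
```

Under this definition, an efficiency above 100% means the run on 4158 GPUs is faster than a perfectly linear extrapolation from the 6-GPU runtime would predict.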