Ptychography is a popular microscopic imaging modality that underpins many scientific discoveries and holds the record for the highest image resolution. Unfortunately, high-resolution ptychographic reconstruction requires a significant amount of memory and computation, forcing many applications to compromise on image resolution in exchange for a smaller memory footprint and a shorter reconstruction time. In this paper, we propose a novel image gradient decomposition method that significantly reduces the memory footprint of ptychographic reconstruction by tessellating image gradients and diffraction measurements into tiles. In addition, we propose a parallel image gradient decomposition method that enables asynchronous point-to-point communication and parallel pipelining with minimal overhead on a large number of GPUs. Our experiments on a lead titanate material dataset (PbTiO3) with 16632 probe locations show that our Gradient Decomposition algorithm reduces the memory footprint by 51 times. In addition, it achieves a time-to-solution within 2.2 minutes by scaling to 4158 GPUs with a super-linear speedup at 364% efficiency. Compared with the state-of-the-art algorithm, our method is 2.7 times more memory efficient, 9 times more scalable, and 86 times faster.
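To make the tiling idea concrete, the following is a minimal sketch of how probe locations on a regular scan grid could be partitioned into tiles, one tile per GPU. It is an illustration only, not the paper's implementation: the abstract does not state the scan grid shape, so the 126 x 132 grid (which multiplies to 16632) and the 63 x 66 tile layout (which multiplies to 4158) are assumptions, and the function name `assign_probes_to_tiles` is hypothetical. The sketch also ignores the overlap between neighboring probe footprints, which is what makes inter-tile communication necessary in practice.

```python
from collections import defaultdict


def assign_probes_to_tiles(probe_rows, probe_cols, tile_rows, tile_cols):
    """Map each probe location on a regular scan grid to exactly one tile.

    Returns a dict: (tile_row, tile_col) -> list of (row, col) probe indices.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if probe_rows % tile_rows or probe_cols % tile_cols:
        raise ValueError("scan grid must divide evenly into tiles")
    rows_per_tile = probe_rows // tile_rows
    cols_per_tile = probe_cols // tile_cols

    tiles = defaultdict(list)
    for r in range(probe_rows):
        for c in range(probe_cols):
            # Integer division picks the tile that owns this probe location.
            tiles[(r // rows_per_tile, c // cols_per_tile)].append((r, c))
    return dict(tiles)


# Assumed layout: 126 x 132 = 16632 probe locations, 63 x 66 = 4158 tiles.
tiles = assign_probes_to_tiles(126, 132, 63, 66)
print(len(tiles), sum(len(v) for v in tiles.values()))
```

Under these assumed dimensions, each tile owns a 2 x 2 block of probe locations, which matches the ratio 16632 / 4158 = 4 implied by the figures in the abstract. Each tile would then hold only its own diffraction measurements and image gradient, which is where the memory reduction would come from.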