Scalable and accurate multi-GPU based image reconstruction of large-scale ptychography data

Advances in synchrotron light sources, together with the development of focusing optics and detectors, enable nanoscale ptychographic imaging of materials and biological specimens. The corresponding experiments, however, can yield terabyte-scale volumes of data that impose a heavy burden on the computing platform. Although Graphics Processing Units (GPUs) provide high performance for such large-scale ptychography datasets, a single GPU is typically insufficient for analysis and reconstruction. Several existing works have leveraged multiple GPUs to accelerate ptychographic reconstruction. However, they rely solely on the Message Passing Interface (MPI) to handle communication between GPUs. This approach is inefficient for configurations with multiple GPUs in a single node, especially when processing a single large projection, because it provides no optimizations for heterogeneous GPU interconnects that contain both low-speed links (e.g., PCIe) and high-speed links (e.g., NVLink). In this paper, we provide a multi-GPU implementation that effectively solves large-scale ptychographic reconstruction problems with optimized performance on intra-node multi-GPU systems. We focus on the conventional maximum-likelihood reconstruction problem, solved with the conjugate-gradient (CG) method, and propose a novel hybrid parallelization model to address the performance bottlenecks in the CG solver. Accordingly, we develop a tool called PtyGer (Ptychographic GPU(multiple)-based reconstruction) that implements our hybrid parallelization model. A comprehensive evaluation verifies that PtyGer fully preserves the original algorithm's accuracy while achieving outstanding intra-node GPU scalability.
