COVID-19 has shown the importance of a fast response to pandemics. Discovering a novel drug is a long and complex process, but its preliminary phases can be accelerated with computer simulations. In particular, virtual screening is an in-silico phase that filters a large set of candidate drugs down to a manageable number. This paper presents two GPU-optimized implementations of a virtual screening algorithm targeting novel GPU architectures, together with a comparative analysis. The first adopts a traditional, latency-oriented approach that spreads the computation required to evaluate a single molecule across the entire GPU. The second uses a batched approach that exploits the parallel architecture of the GPU to evaluate multiple molecules in parallel, disregarding the latency of processing any single molecule. The paper describes the advantages and disadvantages of both solutions, highlighting the implementation details that affect performance. Experimental results compare the two methods on several target molecule databases running on NVIDIA A100 GPUs. The performance of both implementations depends strongly on the data to be processed. In both cases, performance improves as the size of the target molecules (number of atoms and rotatable bonds) decreases. The two methods behave differently with respect to the size of the molecule database to be screened: the latency-oriented implementation reaches its throughput plateau sooner (with fewer molecules), whereas the batched one requires a larger set of molecules. However, once past the initial transient, the batched implementation delivers much higher throughput (up to a 5x speed-up). Finally, to assess the efficiency of both implementations, we analyzed their workload characteristics in depth using the instruction roofline methodology.
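The contrast between the two work mappings can be illustrated with a deliberately simplified toy model. This is a sketch under stated assumptions and not the paper's implementation or performance model: a "worker" stands in for a unit of GPU parallelism, each molecule carries an abstract number of work units, and the names `latency_steps`, `batched_steps`, `N_WORKERS`, and `WORK` are hypothetical. Synchronization, memory traffic, and kernel launch overheads are ignored.

```python
from math import ceil


def latency_steps(work_per_molecule, n_workers):
    """Latency-style mapping: molecules are processed one at a time and
    each molecule's work units are spread across all workers."""
    return sum(ceil(w / n_workers) for w in work_per_molecule)


def batched_steps(work_per_molecule, n_workers):
    """Batched mapping: each worker evaluates one whole molecule serially;
    a batch finishes when its slowest molecule finishes."""
    steps = 0
    for start in range(0, len(work_per_molecule), n_workers):
        batch = work_per_molecule[start:start + n_workers]
        steps += max(batch)
    return steps


def throughput(n_molecules, steps):
    """Molecules completed per abstract time step."""
    return n_molecules / steps


if __name__ == "__main__":
    N_WORKERS = 8  # hypothetical amount of parallelism
    WORK = 4       # hypothetical work units per molecule (fewer than workers)
    for n in (1, 4, 8, 64):
        molecules = [WORK] * n
        lat = throughput(n, latency_steps(molecules, N_WORKERS))
        bat = throughput(n, batched_steps(molecules, N_WORKERS))
        print(f"{n:3d} molecules: latency={lat:.2f}, batched={bat:.2f}")
```

In this toy setting the latency-style mapping is already at its plateau with a single molecule, because one small molecule cannot fill all workers, while the batched mapping needs at least as many molecules as workers before it saturates at a higher level. The numbers only illustrate the qualitative behavior described above and say nothing about the measured speed-up.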