In recent years, the rapidly increasing number of reads produced by next-generation sequencing (NGS) technologies has driven the demand for efficient implementations of sequence alignments in bioinformatics. However, current state-of-the-art approaches are not able to leverage the massively parallel processing capabilities of modern GPUs with close-to-peak performance. We present AnySeq/GPU, a sequence alignment library that augments the AnySeq1 library with a novel approach for accelerating dynamic programming (DP) alignment on GPUs by minimizing memory accesses using warp shuffles and half-precision arithmetic. Our implementation is based on the AnyDSL compiler framework, which allows for convenient zero-cost abstractions through guaranteed partial evaluation. We show that our approach achieves over 80% of the peak performance on both NVIDIA and AMD GPUs, thereby outperforming the GPU-based alignment libraries AnySeq1, GASAL2, ADEPT, and NVBIO by a factor of at least 3.6, while achieving a median speedup of 19.2x over these tools across different alignment scenarios and sequence lengths when running on the same hardware. This leads to throughputs of up to 1.7 TCUPS (tera cell updates per second) on an NVIDIA GV100, up to 3.3 TCUPS with half-precision arithmetic on a single NVIDIA A100, and up to 3.8 TCUPS on an AMD MI100.
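To make the DP alignment and the TCUPS metric concrete, the following is a minimal CPU sketch of a local (Smith-Waterman style) alignment score with a linear gap penalty. It is not the AnySeq/GPU implementation and does not use warp shuffles or half precision; the function names `local_alignment_score` and `tcups` and the scoring values (match 2, mismatch -1, gap -1) are assumptions chosen for illustration. It keeps only two rows of the DP matrix, which loosely mirrors the idea of limiting memory traffic, and reports the number of cell updates so a throughput in TCUPS can be derived from a measured runtime.

```python
def local_alignment_score(query, subject, match=2, mismatch=-1, gap=-1):
    """Return (best local alignment score, number of DP cells updated).

    Illustrative scoring scheme (assumed, not taken from the paper):
    linear gap penalty, scores clamped at zero as in Smith-Waterman.
    """
    prev = [0] * (len(subject) + 1)  # previous DP row
    best = 0
    for q in query:
        curr = [0]  # first column is always zero for local alignment
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            up = prev[j] + gap        # gap in the subject
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)
            curr.append(cell)
            if cell > best:
                best = cell
        prev = curr  # only two rows are ever kept in memory
    return best, len(query) * len(subject)


def tcups(cells, seconds):
    """Tera cell updates per second for a given cell count and runtime."""
    return cells / seconds / 1e12
```

As a usage example, `local_alignment_score("ACGT", "ACGT")` updates 16 cells; dividing the total cell count of a batch of alignments by the measured wall-clock time with `tcups` gives a figure comparable in kind to the throughputs quoted above.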