Pairwise sequence alignment is a very time-consuming step in common bioinformatics pipelines. Speeding up this step requires heuristics, efficient implementations, and/or hardware acceleration. A promising candidate for all of the above is the recently proposed GenASM algorithm. We identify and address three inefficiencies in the GenASM algorithm: it has a high amount of data movement, a large memory footprint, and does some unnecessary work. We propose Scrooge, a fast and memory-frugal genomic sequence aligner. Scrooge includes three novel algorithmic improvements which reduce the data movement, memory footprint, and the number of operations in the GenASM algorithm. We provide efficient open-source implementations of the Scrooge algorithm for CPUs and GPUs, which demonstrate the significant benefits of our algorithmic improvements. For long reads, the CPU version of Scrooge achieves a 20.1x, 1.7x, and 2.1x speedup over KSW2, Edlib, and a CPU implementation of GenASM, respectively. The GPU version of Scrooge achieves a 4.0x 80.4x, 6.8x, 12.6x and 5.9x speedup over the CPU version of Scrooge, KSW2, Edlib, Darwin-GPU, and a GPU implementation of GenASM, respectively. We estimate an ASIC implementation of Scrooge to use 3.6x less chip area and 2.1x less power than a GenASM ASIC while maintaining the same throughput. Further, we systematically analyze the throughput and accuracy behavior of GenASM and Scrooge under various configurations. As the best configuration of Scrooge depends on the computing platform, we make several observations that can help guide future implementations of Scrooge. Availability: https://github.com/CMU-SAFARI/Scrooge
翻译:基因组序列比对是常见生物信息学流程中耗时很长的步骤。加速该步骤需要启发式方法、高效实现和/或硬件加速。最近提出的GenASM算法是一种具有潜力的候选方案。我们确定并解决了GenASM算法中的三个低效问题:数据移动量大、内存消耗大且会执行一些不必要的工作。我们提出了Scrooge,一款快速且内存友好的基因组序列比对器。Scrooge包括三项新颖的算法改进,可以减少GenASM算法中的数据移动、内存占用和操作数。我们提供了针对CPU和GPU的高效开源实现,展示了算法改进带来的显著好处。对于长读取数据,Scrooge CPU版本相对于 KSW2、Edlib和GenASM CPU实现分别实现了20.1倍,1.7倍和2.1倍的加速。Scrooge GPU版本相对于Scrooge CPU版本、KSW2、Edlib、Darwin-GPU和GenASM GPU实现分别实现了4.0倍、80.4倍、6.8倍、12.6倍和5.9倍的加速。我们估计Scrooge ASIC实现的芯片面积和功耗都将比GenASM ASIC低3.6倍和2.1倍,而具有相同的吞吐量。此外,我们系统地分析了GenASM和Scrooge在不同配置下的吞吐量和精度行为。由于Scrooge的最佳配置取决于计算平台,因此我们提出了一些有助于指导未来Scrooge实现的观察结果。可访问性:https://github.com/CMU-SAFARI/Scrooge