We now need more than ever to make genome analysis more intelligent. We need to read, analyze, and interpret our genomes not only quickly, but also accurately and efficiently enough to scale the analysis to population level. Major computational bottlenecks and inefficiencies currently exist throughout the entire genome analysis pipeline, because state-of-the-art genome sequencing technologies are still not able to read a genome in its entirety. We describe the ongoing journey in significantly improving the performance, accuracy, and efficiency of genome analysis using intelligent algorithms and hardware architectures. We explain state-of-the-art algorithmic methods and hardware-based acceleration approaches for each step of the genome analysis pipeline and provide experimental evaluations. Algorithmic approaches exploit the structure of the genome as well as the structure of the underlying hardware. Hardware-based acceleration approaches exploit specialized microarchitectures or various execution paradigms (e.g., processing inside or near memory) along with algorithmic changes, leading to new hardware/software co-designed systems. We conclude by foreshadowing the future challenges, benefits, and research directions triggered by the development of both very low-cost yet highly error-prone new sequencing technologies and specialized hardware chips for genomics. We hope that these efforts and the challenges we discuss provide a foundation for future work in making genome analysis more intelligent. The analysis script and data used in our experimental evaluation are available at: https://github.com/CMU-SAFARI/Molecules2Variations