A Framework for Designing Efficient Deep Learning-Based Genomic Basecallers

Nanopore sequencing generates noisy electrical signals that must be converted into a standard string of DNA nucleotide bases (i.e., A, C, G, T) by a computational step called basecalling. The accuracy and speed of basecalling have critical implications for every subsequent step in genome analysis. Current basecallers are built mainly on deep learning techniques to provide high sequencing accuracy, without considering the compute demands of such tools. We observe that state-of-the-art basecallers are slow, inefficient, and memory-hungry because researchers have adapted deep learning models from other domains without specializing them for basecalling. Our goal is to make basecalling highly efficient and fast by building the first framework for specializing and optimizing machine learning-based basecallers. We introduce RUBICON, a framework for developing hardware-optimized basecallers. RUBICON consists of two novel machine learning techniques that are specifically designed for basecalling. First, we introduce the first quantization-aware basecalling neural architecture search (QABAS) framework, which specializes the basecalling neural network architecture for a given hardware acceleration platform while jointly exploring and finding the best bit-width precision for each neural network layer. Second, we develop SkipClip, the first technique to remove the skip connections present in modern basecallers, which greatly reduces resource and storage requirements without any loss in basecalling accuracy. We demonstrate the benefits of QABAS and SkipClip by developing RUBICALL, the first hardware-optimized basecaller that performs fast and accurate basecalling. We show that QABAS and SkipClip can help researchers develop hardware-optimized basecallers that are superior to expert-designed models.
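To make the idea of a per-layer bit-width concrete, the following is a minimal sketch of symmetric uniform quantization and of how mixed precision changes weight storage. It is not the QABAS search procedure, which the abstract does not detail. The function names (`fake_quantize`, `max_error`, `model_size_bits`) and the example layer sizes are hypothetical and chosen only for illustration.

```python
from typing import List


def fake_quantize(weights: List[float], bits: int) -> List[float]:
    """Round weights to a signed `bits`-bit uniform grid, then map back to floats.

    Illustrative assumption: symmetric, per-tensor scaling by the largest
    absolute weight. Real frameworks offer many other schemes.
    """
    if bits < 2:
        raise ValueError("bits must be >= 2 for a signed grid")
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    levels = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits, 7 for 4 bits
    scale = max_abs / levels      # width of one quantization step
    return [round(w / scale) * scale for w in weights]


def max_error(weights: List[float], bits: int) -> float:
    """Largest absolute difference between a weight and its quantized value."""
    quantized = fake_quantize(weights, bits)
    return max(abs(w - q) for w, q in zip(weights, quantized))


def model_size_bits(layer_sizes: List[int], layer_bits: List[int]) -> int:
    """Total weight storage in bits when layer i holds layer_sizes[i]
    weights stored at layer_bits[i] bits each."""
    return sum(n * b for n, b in zip(layer_sizes, layer_bits))


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.1]
    for bits in (8, 4):
        print(bits, "bits, max error:", max_error(weights, bits))

    # Hypothetical two-layer model: 100 and 200 weights.
    print("uniform 32-bit:", model_size_bits([100, 200], [32, 32]), "bits")
    print("mixed 8/4-bit: ", model_size_bits([100, 200], [8, 4]), "bits")
```

Lower bit-widths shrink storage but coarsen the grid, so the rounding error per weight grows. A search that picks the bit-width for each layer, as the abstract describes for QABAS, is navigating this trade-off jointly with the architecture choice.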
