DNA sequence classification is a fundamental task in computational biology, with broad implications for applications such as disease prevention and drug design. Fast, high-quality sequence classifiers are therefore of great importance. This paper introduces ClaPIM, a scalable DNA sequence classification architecture based on the emerging concept of hybrid in-crossbar and near-crossbar memristive processing-in-memory (PIM). We enable efficient, high-quality classification by uniting the filter and search stages within a single algorithm. Specifically, we propose a custom filtering technique that drastically narrows the search space, and a search approach that enables approximate string matching through a distance function. ClaPIM is the first PIM architecture for scalable approximate string matching that benefits from both the high density of memristive crossbar arrays and the massive computational parallelism of PIM. Compared with Kraken2, a state-of-the-art software classifier, ClaPIM provides significantly higher classification quality (up to a 20x improvement in F1 score) and a 1.8x improvement in throughput. Compared with EDAM, a recently proposed SRAM-based accelerator that is restricted to small datasets, ClaPIM achieves a 30.4x improvement in normalized throughput per area and a 7% increase in classification precision.
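The abstract describes a two-stage scheme, a filter that narrows the search space followed by approximate string matching through a distance function, but does not specify either stage. As a purely illustrative software sketch of that general filter-then-search pattern, the code below assumes a pigeonhole seed filter over non-overlapping k-mers and uses Hamming distance as the distance function. The function names (`build_seed_index`, `classify`), the toy reference genomes, and the parameters `k` and `max_dist` are hypothetical; this is not ClaPIM's actual algorithm, which maps its stages onto memristive crossbar arrays.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def build_seed_index(references: Dict[str, str], k: int) -> Dict[str, List[Tuple[str, int]]]:
    """Map every k-mer of each (hypothetical) reference genome to its (label, offset) hits."""
    index: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for label, genome in references.items():
        for i in range(len(genome) - k + 1):
            index[genome[i:i + k]].append((label, i))
    return index

def classify(read: str, references: Dict[str, str],
             index: Dict[str, List[Tuple[str, int]]],
             k: int, max_dist: int) -> Optional[str]:
    """Filter stage: gather candidate windows from exact seed hits.
    Search stage: score each candidate window by Hamming distance and return
    the label of the closest one within max_dist, or None if nothing matches."""
    # Pigeonhole principle: with at most max_dist mismatches spread over
    # at least max_dist + 1 non-overlapping seeds, one seed must match exactly.
    if len(read) // k <= max_dist:
        raise ValueError("read too short for a lossless seed filter")
    candidates = set()
    for p in range(0, len(read) - k + 1, k):
        for label, offset in index.get(read[p:p + k], ()):
            start = offset - p  # align the seed hit back to the read start
            if 0 <= start and start + len(read) <= len(references[label]):
                candidates.add((label, start))
    best_label, best_dist = None, max_dist + 1
    for label, start in sorted(candidates):
        d = hamming(read, references[label][start:start + len(read)])
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```

For example, with `refs = {"virusA": "ACGTACGTTGCAACGTAGCT", "virusB": "TTGACCGATGCATGCCATGA"}` and `idx = build_seed_index(refs, 4)`, the call `classify("TACAACGT", refs, idx, 4, 1)` returns `"virusA"` (one mismatch against offset 8), while a read with no seed hits returns `None`. Only the few windows that survive the filter are ever scored, which is the search-space reduction the abstract refers to.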