SPIDER-WEB generates coding algorithms with superior error tolerance and real-time information retrieval capacity

DNA has been considered a promising medium for storing digital information. As an essential step in the DNA-based data storage workflow, coding algorithms are responsible for implementing functions such as bit-to-base transcoding and error correction. In previous studies, these functions were normally realized by combining multiple algorithms. Here, we report a graph-based architecture, named SPIDER-WEB, that provides an all-in-one coding solution by generating customized algorithms automatically. SPIDER-WEB is able to correct a maximum of 4% edit errors in the DNA sequences, including substitutions and insertions/deletions (indels), with only 5.5% redundant symbols. Since no DNA sequence pretreatment is required for the correcting and decoding processes, SPIDER-WEB offers real-time information retrieval at a speed 305.08 times that of single-molecule sequencing techniques. Our retrieval process is 2 orders of magnitude faster than the conventional one on megabyte-level data and can scale to exabyte-level data. Therefore, SPIDER-WEB holds the potential to improve the practicability of large-scale data storage applications.
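
To make the term "bit-to-base transcoding" concrete, the following is a minimal sketch of the simplest possible scheme: a fixed mapping of two bits to one nucleotide. This is an assumption chosen for illustration only. It is not the graph-based algorithm that SPIDER-WEB generates, it carries no error correction, and the function names `bytes_to_dna` and `dna_to_bytes` are hypothetical.

```python
# Illustrative only: a fixed 2-bit-per-base mapping (00->A, 01->C, 10->G, 11->T).
# This is NOT the graph-based coding algorithm generated by SPIDER-WEB and it
# provides no tolerance to substitutions or indels.
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(sequence: str) -> bytes:
    """Invert bytes_to_dna; the sequence length must be a multiple of 4."""
    if len(sequence) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for start in range(0, len(sequence), 4):
        value = 0
        for base in sequence[start:start + 4]:
            # str.index raises ValueError for any symbol outside A/C/G/T.
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)
```

With such a fixed mapping, a single inserted or deleted base shifts every later symbol and corrupts the rest of the decoded payload, which is why coding schemes for DNA storage need an error-correction function alongside transcoding.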
