DNA motif discovery, the extraction of sequence patterns from a collection of functionally linked, unlabeled DNA sequences, is a key task in computational biology. Several deep-learning-based techniques have recently been introduced to address it. However, these algorithms cannot be used in real-world situations because they require labeled data. Here, we present RL-MD, a novel reinforcement-learning-based approach to DNA motif discovery. RL-MD takes unlabeled data as input, employs a relative-information-based method to evaluate each proposed motif, and uses these continuous evaluation results as the reward. Experiments show that RL-MD can identify high-quality motifs in real-world data.
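To make the idea of a continuous, label-free reward concrete, the following is a minimal sketch of one possible relative-information score. The abstract does not give the exact formula used by RL-MD, so everything here is an assumption for illustration: the score is taken to be the summed relative entropy (in bits) of each motif column against a uniform background, and the names `relative_information` and `reward`, the pseudocount, and the one-site-per-sequence setup are all hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def relative_information(sites, background=None, pseudocount=1.0):
    """Summed per-column relative entropy (bits) of aligned sites vs. a background.

    Assumption: this stands in for the "relative information-based" evaluation
    named in the abstract; the actual RL-MD scoring may differ.
    """
    if not sites:
        raise ValueError("at least one site is required")
    if background is None:
        # Assumed uniform background over the four nucleotides.
        background = {base: 0.25 for base in ALPHABET}
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")

    n_sites = len(sites)
    total = 0.0
    for col in range(width):
        counts = Counter(site[col] for site in sites)
        for base in ALPHABET:
            # Pseudocounts keep every probability strictly positive.
            p = (counts[base] + pseudocount) / (n_sites + pseudocount * len(ALPHABET))
            total += p * math.log2(p / background[base])
    return total


def reward(sequences, starts, width):
    """Hypothetical reward: score the motif formed by one window per sequence."""
    sites = [seq[start:start + width] for seq, start in zip(sequences, starts)]
    return relative_information(sites)


if __name__ == "__main__":
    seqs = ["TTACGTAA", "GACGTCCC", "CCCACGTG"]
    print(reward(seqs, [2, 1, 3], 4))  # windows that all cover the shared ACGT
    print(reward(seqs, [0, 0, 0], 4))  # windows that do not line up
```

Under these assumptions, a proposal whose windows share a conserved pattern scores higher than a misaligned one, which is the kind of graded signal a reinforcement learning agent could optimize without labels.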