Learning from 3D biological macromolecules with artificial intelligence is an emerging research area. Computational protein design, the inverse of protein structure prediction, aims to generate protein sequences that fold into a specified structure. Analogously, RNA design is an important topic in synthetic biology that aims to generate RNA sequences for given structures. However, existing RNA design methods focus mainly on secondary structure and ignore the informative tertiary structure, which is commonly used in protein design. To explore the complex coupling between RNA sequence and 3D structure, we introduce an RNA tertiary structure modeling method that efficiently captures useful information from the 3D structure of RNA. For a fair comparison, we collect abundant RNA data and split the data according to tertiary structures. On this standard dataset, we conduct a benchmark by combining structure-based protein design approaches with our RNA tertiary structure modeling method. We believe our work will stimulate the future development of tertiary structure-based RNA design and bridge the gap between RNA 3D structures and sequences.