Computable Modeling and Applied Fundamental Algorithms for High-Throughput Sequencing

Project title: Computable Modeling and Applied Fundamental Algorithms for High-Throughput Sequencing

Project number: No. 91530105

Project type: Major Research Plan

Year approved: 2016

Discipline: Mathematical and Physical Sciences, and Chemistry

Principal investigator: 李雷 (Li Lei)

Institution: Academy of Mathematics and Systems Science, Chinese Academy of Sciences

Funding: 250,000 RMB

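Chinese abstract (translated): High-throughput DNA sequencing is a core technology for modern molecular biology research and for personalized medicine. Building on the earlier work of a pilot project funded under the Major Research Plan "High performance scientific computation: fundamental algorithms and computable modeling," we plan to establish original computable models and corresponding applied fundamental algorithms for three basic computational problems in high-throughput sequencing: base calling, read mapping, and genome assembly. First, we will continue developing a base-calling system for the Illumina platform. The approach applies the blind-inversion principle and decomposes the complex problem to enable parallel computation, with the twin goals of reducing errors and increasing speed. Second, building on our original SEME algorithm, we will use probabilistic calculations to design read-mapping schemes that meet the speed, sensitivity, and specificity requirements of specific biological problems. Third, genome assembly is a mathematical inverse problem in computational biology; its main challenges come from the repetitive sequences that are widespread in genomes and from the uncertainty introduced by the various errors arising during sequencing. Our focus is on developing genome-assembly schemes that complement the current mainstream approaches, which are based purely on de Bruijn graphs. Our research will help DNA sequencing serve health, medicine, agriculture, and related fields in China.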
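The abstract contrasts the project's assembly work with mainstream assemblers built purely on de Bruijn graphs, and names repeats as a main source of ambiguity. As background only, here is a minimal textbook sketch of that baseline, not the project's own method. The function names, the toy genome, and the read layout are hypothetical, the reads are assumed error-free, and edge multiplicities are ignored. It shows a 4-base repeat collapsing into a branching node when k-1 is no longer than the repeat, and disappearing when k-1 exceeds it.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, and every k-mer seen
    in a read adds an edge from its (k-1)-prefix to its (k-1)-suffix.
    Edge multiplicities are dropped for brevity."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Nodes with more than one distinct successor. With error-free reads these
    are (k-1)-mers occurring more than once in the genome with different right
    flanks, i.e. repeats at least k-1 bases long. Only out-branching is checked;
    a real assembler would also track in-degree."""
    return {node for node, successors in graph.items() if len(successors) > 1}


# Hypothetical toy genome: the 4-base repeat ACGT occurs twice with
# different flanks (TT...GG and GG...CC).
genome = "TTACGTGGACGTCC"
# Error-free 8-base reads sampled every 2 bases, so neighbouring reads overlap.
reads = [genome[i:i + 8] for i in range(0, len(genome) - 7, 2)]

print(branching_nodes(build_de_bruijn(reads, 4)))  # {'CGT'}  (k-1 = 3 <= repeat length)
print(branching_nodes(build_de_bruijn(reads, 6)))  # set()    (k-1 = 5 > repeat length)
```

Raising k resolves this toy repeat, but with real reads longer k-mers are more often broken by sequencing errors and need higher coverage. This interplay between repeats and errors is the uncertainty the abstract refers to.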

Chinese keywords (translated): DNA sequencing; base calling; read mapping; genome assembly; disease mechanism

English abstract: High-throughput sequencing is a key technology for molecular/genomic biology and personalized medicine. Building on our pilot project supported by the Major Research Plan “High performance scientific computation: fundamental algorithms and computable modeling,” we plan to conduct research on three basic computational problems in high-throughput sequencing: base calling, read mapping, and genome assembly. The project aims to develop original computable models and associated fundamental algorithms. The specific aims are as follows. First, we will continue our effort to develop a base-calling system for the Illumina technology. The major techniques include the blind-inversion principle we developed and parallel computation via the decomposition of a complicated problem. We aim not only to reduce base-calling errors but also to speed up computation. Second, building on the original SEME method we developed, we will design read-mapping algorithms that meet the speed, sensitivity, and specificity requirements of specific computational biology problems. Third, we view genome assembly as an inverse problem in computational biology. The main challenge lies in the uncertainty caused by widespread repetitive elements and by the many kinds of errors that arise during sequencing: library preparation, PCR amplification, the instrument, imaging, and so on. Our focus is on developing genome-assembly schemes that complement the current mainstream approaches based purely on de Bruijn graphs.
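The read-mapping aim trades off speed, sensitivity, and specificity. The SEME method itself is not described here, so the following is not SEME. It is a generic back-of-the-envelope calculation for a hypothetical seed-based mapper that splits each read into non-overlapping exact-match seeds. The assumptions are loud ones: substitution errors are independent with a uniform per-base rate, indels and reverse complements are ignored, and the reference is treated as a random sequence with uniform base composition. The function names and the parameter values (100-base reads, 1% error rate, a 3-gigabase genome) are illustrative only.

```python
def seed_hit_probability(read_len: int, seed_len: int, error_rate: float) -> float:
    """P(at least one non-overlapping seed of the read is error-free).

    Under the hypothetical model (independent substitution errors at a
    uniform rate) this approximates the sensitivity of exact-seed lookup.
    """
    if not 0 < seed_len <= read_len:
        raise ValueError("seed_len must be in 1..read_len")
    n_seeds = read_len // seed_len             # seeds that fit without overlap
    p_clean = (1.0 - error_rate) ** seed_len   # probability one seed has no error
    return 1.0 - (1.0 - p_clean) ** n_seeds


def expected_random_hits(genome_len: int, seed_len: int) -> float:
    """Expected chance exact matches of one seed in a random genome with
    uniform base composition (one strand only). More chance hits mean more
    candidate loci to verify, i.e. lower speed and specificity."""
    positions = max(genome_len - seed_len + 1, 0)
    return positions / 4 ** seed_len


if __name__ == "__main__":
    read_len, error_rate, genome_len = 100, 0.01, 3_000_000_000
    for k in (12, 16, 20, 25, 32):
        print(f"k={k:2d}  sensitivity={seed_hit_probability(read_len, k, error_rate):.4f}  "
              f"random hits per seed={expected_random_hits(genome_len, k):.3g}")
```

Shorter seeds raise the chance that some seed is error-free, but they also multiply the spurious candidate loci that must be checked. A scheme designed by probabilistic calculation picks the seed layout where both numbers meet the needs of the biological question at hand.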

English keywords: DNA sequencing; base calling; read mapping; genome assembly; disease mechanism
