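Project title: High-Accuracy Read Alignment Algorithm for Second-Generation Sequencing
Project number: No.31501067
Project type: Young Scientists Fund project
Year approved: 2016
Discipline: Biological sciences
Principal investigator: Wang Yi (王一)
Institution: Fudan University
Funding: 190,000 CNY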
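Chinese abstract: Second-generation sequencing is one of the foundational technologies of future life science. Standard second-generation sequencing analysis pipelines all rely on sequence alignment as a basic step, and the quality of that step is critical to the results of the downstream analysis. Existing alignment algorithms show non-negligible false-negative and false-positive alignment rates as well as systematic bias, which undermines the reliability of subsequent analyses. This project will develop its own sequence alignment algorithm, aiming for low false-negative and false-positive rates and low systematic bias while also taking alignment speed into account. The work is planned in three parts: first, systematically review previous work and distill the framework that existing aligners share along with their distinctive techniques; second, write an in-house alignment algorithm that makes full use of the advantages of long read lengths; and finally, evaluate the algorithm systematically on simulated and real data to obtain a sound assessment of it and practical experience with its use. The project already has a basic framework in place, and preliminary experiments indicate that the algorithm reduces alignment errors while maintaining a high alignment speed. Follow-up research will concentrate on improving the algorithm's speed and maturing it, with the goal of industrial-grade practical applicability.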
Chinese keywords: human; computational model; simulation; parameter optimization; software development
English abstract: Second-generation sequencing technology is one of the fundamental technologies of future life science. Conventional second-generation sequencing analysis pipelines cannot do without the basic step of read alignment, and the quality of this step plays a key role in the quality of the final analysis results. Existing alignment algorithms have non-negligible false-negative and false-positive rates as well as systematic bias, which compromises the reliability of downstream analysis. This study will develop its own sequence alignment algorithm to achieve lower false-negative and false-positive rates and lower systematic bias, while also taking alignment speed into account. The study proposes to work in three steps: first, systematically review previous work and distill the framework common to existing aligners together with their distinctive techniques; second, write our own alignment algorithm that makes full use of the advantages of long read lengths; and finally, benchmark the algorithm on simulated and real data to obtain a proper evaluation of it and practical application experience. A basic framework is already in place, and preliminary experiments show that the algorithm can reduce the alignment error rate while maintaining a high speed. Follow-up studies will focus on improving the algorithm's speed and maturing it, in order to reach industrial-level applicability.
英文关键词: human;computational model;simulation;parameter optimization;software development
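The evaluation plan above hinges on measuring false-negative and false-positive alignment rates on simulated reads, whose true origin is known. What follows is a minimal sketch of that scoring step in Python, not the project's own benchmark. It assumes that a read counts as correctly aligned when it lands on the same chromosome within a hypothetical `tolerance` (5 bp by default) of its simulated origin, that an unmapped read is a false negative, and that a read mapped anywhere else is a false positive. Both rates are taken over all simulated reads. The names `Locus`, `AlignmentRates`, and `alignment_rates` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Locus:
    chrom: str
    pos: int  # leftmost reference coordinate of the alignment


@dataclass(frozen=True)
class AlignmentRates:
    correct: float
    false_positive: float  # mapped, but to the wrong locus
    false_negative: float  # left unmapped


def alignment_rates(truth: Dict[str, Locus],
                    calls: Dict[str, Optional[Locus]],
                    tolerance: int = 5) -> AlignmentRates:
    """Score aligner output against the known origin of each simulated read.

    Assumption (hypothetical convention): a call is correct if it is on the
    true chromosome and within `tolerance` bp of the true position; all
    rates are fractions of the total number of simulated reads.
    """
    if not truth:
        raise ValueError("no simulated reads to score")
    correct = fp = fn = 0
    for read_id, true_locus in truth.items():
        call = calls.get(read_id)  # a read absent from the output counts as unmapped
        if call is None:
            fn += 1
        elif (call.chrom == true_locus.chrom
              and abs(call.pos - true_locus.pos) <= tolerance):
            correct += 1
        else:
            fp += 1
    n = len(truth)
    return AlignmentRates(correct / n, fp / n, fn / n)


# Toy example: four simulated reads, three reported by a hypothetical aligner.
truth = {
    "r1": Locus("chr1", 1000),
    "r2": Locus("chr2", 500),
    "r3": Locus("chr1", 42),
    "r4": Locus("chrX", 7),
}
calls = {
    "r1": Locus("chr1", 1002),  # off by 2 bp: within tolerance
    "r2": Locus("chr3", 500),   # wrong chromosome: false positive
    "r3": None,                 # explicitly unmapped: false negative
}                               # r4 missing from output: false negative
rates = alignment_rates(truth, calls)
print(rates)
```

Reporting both rates over the same denominator makes the trade-off between them visible: an aligner can lower its false-positive rate simply by leaving ambiguous reads unmapped, which raises its false-negative rate instead.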