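Project title: A Study of the Statistical Power of Alignment-Free Methods Based on Next-Generation Sequencing Data
Project number: No. 11205061
Project type: Young Scientists Fund
Year approved: 2013
Discipline: Physics II
Project author: Liu Xuemei (刘雪梅)
Institution: South China University of Technology (华南理工大学)
Funding: 220,000 CNY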
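Abstract (translated from Chinese): Identifying transcription factor binding sites and predicting horizontal gene transfer are central problems in current biological research. Many alignment-free computational methods and experimental methods have been developed to address these two problems. However, apart from some simulation studies, work that examines these methods through the statistical power of their statistics is especially scarce, and errors in the statistical analysis affect the reliability of the evolutionary trees that are constructed. Building on the alignment-free D2 statistic, this project will carry out the following research. (1) Establish a hidden Markov model whose background sequence is a higher-order Markov process; use the Bernoulli distribution to build a transformation model for studying the distribution of statistical power in the two cases, and give theoretical results and illustrations for both through simulation. (2) Develop alignment-free methods suited to comparing NGS data and study their statistical power, with the expectation of an approximate statistical extremum whose power rapidly approaches 1 as the sequence length tends to infinity. (3) Construct evolutionary trees by studying the relationship between the statistic and evolutionary distance. The research has application value in homologous sequence analysis, horizontal gene transfer, and phylogenetic tree construction, and can provide a reliable theoretical basis for the classification of biological species.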
Keywords (translated from Chinese): alignment-free methods; hidden Markov model; statistical power; motif
English abstract: Identifying transcription factor binding sites (TFBS) and detecting horizontally transferred genes (HTG) between different organisms are central problems in biological studies. Many computational and experimental methods have been developed to detect TFBS and HTG. However, apart from some simulation studies, studies of the statistical power of these methods are relatively rare, and errors in the statistical analysis affect the reliability of the evolutionary trees constructed. We will carry out this study based on the D2 statistic. (1) We will model background sequences as a high-order Markov process using a hidden Markov model; using the Bernoulli distribution, we will build an alternative model to study the power of the statistic under the two situations, giving theoretical results and illustrating them by simulation. (2) We will develop a new alignment-free sequence comparison method for NGS data and study its power. We expect an approximate extremum of the statistic whose power quickly approaches 1 as the sequence length tends to infinity. (3) By studying the relationship between the statistic and evolutionary distance, we will construct evolutionary trees. The study has value for homologous sequence analysis and evolutionary tree construction, and it can help to provide a reliable t
English keywords: alignment-free comparison; hidden Markov model; power; motif
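
Note on the D2 statistic: the abstract names D2 but does not define it. In the alignment-free literature, D2 is commonly defined as the number of k-word matches between two sequences, that is, the sum over all words w of length k of the product of the counts of w in each sequence. The sketch below illustrates that definition only; it is not the project's implementation, and the function names `kmer_counts` and `d2` and the choice of word length `k` are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Assumed definition: D2 = sum over words w of count_A(w) * count_B(w)."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # A Counter returns 0 for words absent from seq_b, so they add nothing.
    return sum(n * counts_b[w] for w, n in counts_a.items())
```

For example, `d2("ACGT", "ACGT", 2)` counts the three shared 2-words AC, CG, and GT once each. The power studies described in the abstract would additionally need a null (background) model and an alternative model, which this sketch does not attempt.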