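Project title: Human Whole-Genome Differential Methylation Study Based on Deep Learning and Super-CpG Segmentation
Project number: No.61503061
Project type: Young Scientists Fund project
Year of approval: 2016
Discipline: Automation technology; computer technology
Project author: Fan Shicai (凡时财)
Affiliation: University of Electronic Science and Technology of China (电子科技大学)
Funding amount: CNY 190,000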
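Abstract (translated from Chinese): DNA methylation plays an important regulatory role in biological processes such as genomic imprinting, stem cell differentiation, and disease. Quantitative analysis of differential methylation patterns across multiple samples helps clarify the mechanisms underlying tissue differences and provides a basis for developing potential drug targets. At present, single-base-resolution DNA methylation data for the whole human genome are scarce, and both the computational methods applicable to multi-tissue methylation prediction and the corresponding downstream differential analysis methods still need improvement. In response, this project proposes to: 1) combine traditional DNA sequence features with 450K methylation array data as an important additional feature, and introduce deep learning prediction algorithms to predict whole-genome DNA methylation levels at single-base resolution for different samples; 2) build a sequence segmentation algorithm based on a super-CpG model, using a fast watershed algorithm that exploits the similarity of DNA methylation levels and the proximity of sequence positions; 3) identify differentially methylated region sets with an improved Shannon entropy algorithm, and combine clustering algorithms to carry out in-depth functional analysis of typical differentially methylated region sets; 4) mine the super-CpG methylated region sets specific to rheumatoid arthritis, and discover new biomarkers through experimental validation and screening.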
Keywords (translated from Chinese): Differential DNA methylation;prediction;deep learning;450K methylation array data;super-CpG segmentation
Abstract (English): DNA methylation plays essential regulatory roles in multiple cellular processes, including genomic imprinting, stem cell differentiation, and disease. Quantifying differential methylation patterns across multiple samples will help explain the role of DNA methylation in regulating tissue-specific gene expression and may provide valuable references for potential drug targets. Currently, whole-genome DNA methylation data across different human tissues at single-CpG resolution are largely lacking, and the available computational prediction methods, as well as the algorithms for differential methylation analysis, still leave much to be desired. We therefore propose to: 1) predict the whole-genome DNA methylation levels of multiple human samples at single-CpG resolution with a deep learning method, by combining Illumina 450K DNA Methylation BeadChip data with traditionally used DNA sequence features; 2) establish a super-CpG-based DNA sequence segmentation model that relies on methylation pattern similarity and positional proximity, using a fast watershed algorithm; 3) identify differentially methylated region sets with an improved Shannon entropy approach, and perform functional analysis of typical differentially methylated region sets with clustering methods; 4) study the specific differential super-CpG sets in rheumatoid arthritis, and discover novel biomarkers through experimental validation.
Keywords (English): Differential DNA Methylation;Prediction;Deep Learning;450K Beadchip Data;Super-CpG Segmentation