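Project title: Statistical Methods for Analyzing Gene-Gene and Gene-Environment Interactions in Genome-Wide Association Studies
Project number: No.81473070
Project type: General Program (面上项目)
Year of approval: 2015
Discipline: Medicine and Health
Project author: Chen Feng (陈峰)
Author affiliation: Nanjing Medical University (南京医科大学)
Funding: 800,000 yuan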
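Chinese abstract (translated): Genome-wide association studies (GWAS) have been highly productive. However, the genetic loci detected by main-effect analyses of single loci or sets of loci explain only a small fraction of genetic variation. Complex diseases often arise from the interplay of multiple external factors (environmental exposures) and internal factors (genetic variants), so gene-gene and gene-environment interactions are important contributors to complex disease that cannot be ignored. In GWAS, ignoring interactions contributes to missing heritability. Yet with data on hundreds of thousands of loci, conventional interaction analysis methods (such as logistic regression) and data mining methods designed for low- to medium-dimensional data (such as random forests) are limited by computational complexity and running speed, and cannot detect interactions at the genome-wide level. Existing methods for high-dimensional interaction analysis in omics still suffer from imperfect statistical algorithms or insufficient computational speed. This project therefore proposes to improve existing first-order interaction analysis methods and to develop new high-order interaction methods and dimension-reduction strategies that control false positives and increase statistical power. It also proposes to develop software using CPU and GPU parallel computing, accelerating computation in both software and hardware, so that interaction analysis becomes a routine part of GWAS.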
Chinese keywords (translated): health statistics;data mining;statistical methods;genome-wide association study;interaction
English abstract: Despite the great success of genome-wide association studies (GWAS) in identifying genes, the single nucleotide polymorphisms (SNPs) identified through single-SNP approaches or SNP-set analysis account for only a small proportion of genetic variation. Complex disease is caused by multiple external factors (environmental exposure) and internal factors (genetic mutation). Gene-environment and gene-gene interactions may account for the missing heritability. Traditional methods for detecting interactions in simple datasets (logistic regression, etc.) and data mining approaches for large-scale genetic datasets (random forest, etc.) are no longer appropriate for GWAS datasets. Recently, many methods have been proposed for detecting interactions in GWAS. However, they have evident flaws in their statistical algorithms or carry a heavy computational burden. Based on these considerations, we aim to improve existing methods for detecting first-order interactions and to propose new methods and strategies for detecting high-order interactions in GWAS. Furthermore, we will use parallel computing based on CPU/MPI or GPU/CUDA techniques to speed up calculation. Additionally, the newly proposed methods and software will be applied to real GWAS datasets to identify gene-environment and gene-gene interactions on a genome-wide scale.
English keywords: health statistics;data mining;statistical methods;GWAS;interaction effect