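Project title: Statistical Methods for Gene-Level Genome-Wide Association Studies
Project number: No. 81202283
Project type: Young Scientists Fund
Approval year: 2013
Discipline: Preventive medicine, endemiology, occupational medicine, radiation medicine
Principal investigator: Yi Honggang (易洪刚)
Affiliation: Nanjing Medical University
Funding: CNY 230,000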
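Abstract (translated from Chinese): Genome-wide association studies (GWAS) have become the principal tool for investigating the pathogenic mechanisms of complex diseases, and in recent years they have produced notable results. However, faced with massive volumes of data, existing statistical methods and analysis strategies suffer from many statistical problems, lag well behind practical needs, and cannot fully mine the rich information contained in GWAS data. This study explores statistical methods and analysis strategies for gene-level GWAS in three steps: first, dimension reduction using prior biological information; second, gene-level screening of important SNPs using sparse partial least squares regression, penalized regression models, machine learning, and related methods; and finally, gene-level statistical analysis of the joint effect of multiple SNPs using multilocus models such as logistic kernel regression and various principal component regression models.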
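Keywords (translated from Chinese): genome-wide association study; multilevel model; prior information; statistical strategy; dimension reduction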
Abstract (English): Genome-wide association studies (GWAS) have become the most important method for studying the pathogenic mechanisms of complex diseases, and in the past few years they have achieved great success. However, faced with massive amounts of data, existing statistical methods and analysis strategies have many statistical problems and fall well behind actual demand. They are not efficient enough to mine the rich information contained in GWAS data. The purpose of this study is to explore statistical methods and strategies for GWAS based on gene-level analysis. The study comprises three parts. First, we evaluate methods of dimension reduction that use prior biological information. Second, we screen important SNPs at the gene level with several statistical methods, such as sparse partial least squares regression, penalized logistic regression, and machine learning. Finally, gene-based logistic kernel regression models and various principal component regression models are used to evaluate the joint effect of multiple SNPs.
Keywords (English): genome-wide association study; hierarchical model; prior information; statistical strategy; dimension reduction