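Project title: Quantile regression modeling of complex longitudinal data and its application to biomedical big data
Project number: No.11501167
Project type: Young Scientists Fund
Approval year: 2016
Discipline: Mathematical and physical sciences and chemistry
Project author: 田玉柱 (Tian Yuzhu)
Author affiliation: Henan University of Science and Technology (河南科技大学)
Funding: 180,000 yuan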
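Chinese abstract: In longitudinal data modeling in biomedicine, epidemiology and related fields, the complexity of real problems often produces complex data situations such as measurement error, censoring, non-normality, skewness and high dimensionality. The usual approach models the mean of the response variable under the assumption that the within-subject errors and the between-subject random effects are normally distributed. However, this approach is sensitive to outliers and extreme values, its estimates are often substantially biased, and it reveals nothing about other features of the conditional distribution of the response. This project mainly uses modern quantile regression methods to model such complex longitudinal data. We will capture the features of the data with parametric and semiparametric mixed effects models, and carry out statistical inference by flexibly combining the loss-function approach, the joint hierarchical likelihood approach and Bayesian methods. Based on the methods developed, we will analyze a medical AIDS data set to obtain finer-grained biomedical information than traditional mean regression provides. In addition, to meet the needs of high-dimensional medical big data analysis, we will combine several currently popular penalized variable selection methods to identify the explanatory variables that significantly affect medical response indicators and to obtain more parsimonious models, providing a reference for medical practitioners.
Chinese keywords: longitudinal data; mixed effects models; quantile regression; biomedical big data; model selection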
English abstract: In longitudinal data modeling in biomedicine and epidemiology, the complexity of real problems often gives rise to complex data, such as data with measurement error, censoring, non-normality, skewness or high dimensionality. The common approach models the conditional mean of the response variable under the assumption that the random effects and error terms are normally distributed. However, this approach is sensitive to outliers and extreme values, and the resulting estimates are often substantially biased. Moreover, beyond the mean of the response variable, it cannot capture other features of its conditional distribution. This project mainly plans to model such complex longitudinal data with modern quantile regression methods. Specifically, we will use parametric or semiparametric mixed effects models to capture the characteristics of the data, and flexibly apply the quantile check function method, the joint hierarchical likelihood method and Bayesian modeling ideas to conduct statistical inference. Based on the proposed methods, we will analyze medical AIDS data sets to present more complete biomedical information than traditional mean regression provides. In addition, to meet the needs of high-dimensional medical big data analysis, we will make use of popular penalized variable selection methods to identify significant explanatory covariates and more parsimonious models for the medical responses of interest. This work may serve as a reference for medical practitioners.
English keywords: Longitudinal data;Mixed effects models;Quantile regression;Biomedical big data;Model selection
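The robustness argument in the abstract (mean regression versus the quantile check function method) can be illustrated with a toy sketch. It assumes an intercept-only model on i.i.d. observations, with no covariates, random effects, censoring or penalty, so it is not the project's mixed effects method. The estimator minimizes the sum of check losses rho_tau(u) = u(tau - I(u < 0)); the function names `check_loss`, `empirical_risk` and `fit_quantile` are hypothetical.

```python
from typing import Sequence


def check_loss(u: float, tau: float) -> float:
    """Quantile check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def empirical_risk(theta: float, y: Sequence[float], tau: float) -> float:
    """Sum of check losses of the residuals y_i - theta."""
    return sum(check_loss(yi - theta, tau) for yi in y)


def fit_quantile(y: Sequence[float], tau: float) -> float:
    """Minimize the check-loss risk over a constant theta.

    The risk is convex and piecewise linear in theta with kinks only at
    the observations, so some observation is a minimizer; searching the
    observations directly is enough for this toy intercept-only case.
    """
    return min(y, key=lambda theta: empirical_risk(theta, y, tau))


if __name__ == "__main__":
    contaminated = [1.0, 2.0, 3.0, 4.0, 500.0]  # one extreme value
    print(sum(contaminated) / len(contaminated))  # mean is dragged to 102.0
    print(fit_quantile(contaminated, 0.5))  # check-loss median stays at 3.0
```

With tau = 0.5 the minimizer is the sample median, which ignores how far the extreme value lies from the bulk of the data, while the mean moves with it. Other values of tau (for example 0.1 or 0.9) target lower or upper tails of the response distribution, which is the "other conditional distribution information" that mean regression does not provide.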