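Project title: Statistical analysis of high-dimensional longitudinal data based on generalized partially linear single-index models
Project number: No.11501099
Project type: Young Scientists Fund project
Approval year: 2016
Discipline: Mathematical and Physical Sciences and Chemistry
Principal investigator: 许佩蓉
Affiliation: Shanghai Normal University (上海师范大学)
Funding: 180,000 CNY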
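Chinese abstract: High-dimensional longitudinal data arise frequently in fields such as sociology and medicine, and their defining feature is correlation among observations. How to carry out statistical analysis while accounting for this correlation has therefore been one of the active research topics in statistics over the past 20 years and is of considerable importance. This project focuses on generalized partially linear single-index models for high-dimensional and ultrahigh-dimensional longitudinal data. First, the project will study the model for high-dimensional longitudinal data from three angles: identifiability, estimation efficiency, and variable selection. Drawing on the idea of generalized estimating equations, it will propose an estimation method and study its efficiency, then propose a method that performs parameter estimation and variable selection simultaneously, prove the consistency of the variable selection, and examine its finite-sample properties through numerical simulation. Second, the project will address dimension reduction for the model with ultrahigh-dimensional longitudinal data: it will construct a screening criterion for the single-index coefficient variables, establish the large-sample properties of the screening method, and evaluate its finite-sample performance through numerical analysis. Finally, combining the methods proposed for the high-dimensional and ultrahigh-dimensional settings, the project will propose a two-stage feature screening and selection method and apply it to real data in empirical studies.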
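Chinese keywords: longitudinal data;single-index model;high-dimensional data analysis;variable selection;generalized estimating equations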
English abstract: High-dimensional longitudinal data arise frequently in fields such as the social sciences and medical studies. In essence, such a data set may be regarded as a collection of many time series in which serial correlation is inherent. Statistical analysis that accounts for within-subject correlation is therefore important and has been one of the most active topics in statistics over the past two decades. This project focuses on generalized partially linear single-index models for high-dimensional and ultrahigh-dimensional longitudinal data. First, we aim to study model identification, estimation efficiency, and variable selection for generalized partially linear single-index models with high-dimensional longitudinal data. In the spirit of generalized estimating equations, we will propose an estimation procedure and establish the estimation efficiency of the parametric part of the model. We will then propose a procedure that performs parameter estimation and variable selection simultaneously, establish its variable selection consistency, and carry out simulation studies to evaluate its finite-sample performance. Second, we will study the dimension reduction problem for generalized partially linear single-index models with ultrahigh-dimensional longitudinal data. We will propose a feature screening method, prove its sure screening property, and assess its finite-sample performance via numerical studies. Finally, we will propose a two-stage screen-and-clean method that combines the above feature screening and variable selection methods, and apply it to real data analyses.
English keywords: longitudinal data;single-index model;high-dimensional data analysis;variable selection;generalized estimating equations