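Project title: Research on Several Problems in Variable Screening and Selection for Ultrahigh-Dimensional Survival Data
Project number: No.11501573
Project type: Young Scientists Fund
Year approved: 2016
Discipline: Mathematical and Physical Sciences and Chemistry
Principal investigator: Chen Xiaolin (陈晓林)
Institution: Qufu Normal University
Funding amount: 180,000 CNY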
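Abstract (translated from Chinese): A popular approach to analyzing ultrahigh-dimensional survival data is a two-stage analysis. First, a computationally efficient screening method reduces the dimensionality to a manageable size; then a more refined penalized method performs simultaneous variable selection and parameter estimation. Because the coexistence of censoring and ultrahigh dimensionality makes statistical inference difficult, research on variable screening for ultrahigh-dimensional survival data is still relatively scarce. This project will study a variable screening strategy based on an L0 sparsity-constrained estimator, together with an algorithm to implement it. Unlike methods based on marginal regression or marginal correlation, the proposed method naturally accounts for the joint effects of the covariates. For the variable selection stage, the project will study penalized methods based on the seamless-L0 and rLASSO penalty functions and will extend them to variable selection for survival data whose covariates have interaction effects. In addition to establishing the theoretical properties of these methods, the project will use numerical simulations to measure their improvement over existing methods and will apply the results to real data analysis.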
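The abstract does not specify how the L0 sparsity-constrained estimator is computed. One common way to approximately solve max ℓ(β) subject to ‖β‖₀ ≤ k is iterative hard thresholding (IHT): take a gradient-ascent step on the log partial likelihood, then keep only the k largest coefficients in absolute value. The Python sketch below assumes a Cox proportional hazards model with the Breslow partial likelihood. The function names (`cox_loglik_grad`, `hard_threshold`, `l0_screen`), the backtracking step rule, and the simulated data are illustrative assumptions, not the project's algorithm.

```python
import numpy as np


def cox_loglik_grad(beta, X, time, event):
    """Breslow log partial likelihood of a Cox model (divided by n) and its gradient."""
    n = X.shape[0]
    eta = X @ beta
    eta = eta - eta.max()  # constant shift for stability; it cancels in eta_i - log(sum)
    w = np.exp(eta)
    # at_risk[i, j] = 1 when subject j is still at risk at time[i] (time[j] >= time[i])
    at_risk = (time[None, :] >= time[:, None]).astype(float)
    s0 = at_risk @ w                   # sum of exp(eta_j) over each risk set
    s1 = at_risk @ (w[:, None] * X)    # exp(eta_j)-weighted covariate sums per risk set
    d = np.asarray(event).astype(bool)  # only observed events contribute terms
    loglik = np.sum(eta[d] - np.log(s0[d])) / n
    grad = np.sum(X[d] - s1[d] / s0[d][:, None], axis=0) / n
    return loglik, grad


def hard_threshold(v, k):
    """Keep the k entries of v with the largest magnitude and zero out the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]
    out[keep] = v[keep]
    return out


def l0_screen(X, time, event, k, n_iter=200, step=1.0, tol=1e-8):
    """Approximately maximize the log partial likelihood subject to ||beta||_0 <= k.

    Hypothetical IHT sketch; returns retained covariate indices and the sparse estimate.
    """
    beta = np.zeros(X.shape[1])
    ll, g = cox_loglik_grad(beta, X, time, event)
    for _ in range(n_iter):
        s = step
        while s > 1e-10:  # backtracking: halve the step until the likelihood does not drop
            cand = hard_threshold(beta + s * g, k)
            ll_new, g_new = cox_loglik_grad(cand, X, time, event)
            if ll_new >= ll:
                break
            s /= 2.0
        else:
            break  # no non-decreasing step found; keep the current estimate
        gain = ll_new - ll
        beta, ll, g = cand, ll_new, g_new
        if gain < tol:
            break
    return np.flatnonzero(beta), beta


# Usage on simulated data (hypothetical setup): 3 active covariates out of p = 500.
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[[0, 1, 2]] = [1.0, -1.0, 1.0]
t_event = rng.exponential(scale=np.exp(-X @ beta_true))  # hazard exp(x'beta)
t_cens = rng.exponential(scale=2.0, size=n)
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(int)
selected, beta_hat = l0_screen(X, time, event, k=10)
print("retained covariates:", selected)
```

Because every iterate is refit jointly on the current support, a covariate is retained according to its contribution given the others, which is the "joint effects" property the abstract contrasts with marginal screening. Building the full n × n risk-set matrix keeps the sketch short but costs O(n²p) per evaluation; sorting by time and using cumulative sums would scale better.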
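Keywords (translated from Chinese): sparsity-constrained optimization; ultrahigh-dimensional survival data; penalization; variable screening; variable selection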
English abstract: A popular way to analyze ultrahigh-dimensional survival data is the two-stage approach. First, a computationally efficient screening method reduces the dimensionality to a moderate size; then simultaneous variable selection and parameter estimation are carried out with more refined penalized methods. Because censoring and ultrahigh dimensionality occur together, variable screening for survival data is challenging, and research on it remains limited. This project will study a screening method based on the L0 sparsity-constrained estimator, along with a corresponding implementation algorithm. Unlike the existing marginal regression or correlation screening methods for ultrahigh-dimensional survival data, the proposed procedure naturally takes the joint effects of covariates into account. At the variable selection stage, this project will study penalized methods using the seamless-L0 and rLASSO penalty functions, and will generalize the resulting methods to variable selection for survival data with interactions. In addition to establishing theoretical properties, this project will use finite-sample simulations to verify that the proposed methods improve on existing approaches, and will apply them to real data analysis.
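For the selection stage, the abstract names the seamless-L0 (SELO) and rLASSO penalties without giving formulas. The sketch below assumes the standard SELO form p(β) = (λ / log 2) · log(|β| / (|β| + τ) + 1), which approaches the L0 penalty λ·I(β ≠ 0) as τ → 0, and assumes that rLASSO refers to the reciprocal-L1 penalty λ / |β| for β ≠ 0 (and 0 at β = 0). The function names `selo_penalty` and `rlasso_penalty` and the default τ = 0.01 are hypothetical.

```python
import numpy as np


def selo_penalty(beta, lam, tau=0.01):
    """Seamless-L0 (SELO) penalty, applied elementwise (assumed standard form)."""
    a = np.abs(np.asarray(beta, dtype=float))
    # a / (a + tau) rises from 0 toward 1, so the penalty rises smoothly from 0 toward lam
    return lam / np.log(2.0) * np.log(a / (a + tau) + 1.0)


def rlasso_penalty(beta, lam):
    """Reciprocal-L1 penalty: lam / |beta| for beta != 0 and 0 at beta == 0 (assumed form)."""
    a = np.abs(np.asarray(beta, dtype=float))
    safe = np.where(a > 0, a, 1.0)  # avoid dividing by zero at beta == 0
    return np.where(a > 0, lam / safe, 0.0)


# Compare the two penalties on a grid of coefficient magnitudes.
grid = np.array([0.0, 0.001, 0.01, 0.1, 1.0, 10.0])
print("SELO  :", selo_penalty(grid, lam=1.0))
print("rLASSO:", rlasso_penalty(grid, lam=1.0))
```

Both penalties are nonconvex. SELO is continuous at zero yet charges nearly the full λ for any clearly nonzero coefficient, mimicking best-subset (L0) selection; the reciprocal penalty is largest for small nonzero coefficients, which discourages keeping weak signals. In either case the penalty would be summed over coefficients and added to the negative log partial likelihood, then minimized over the covariates kept after screening.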
English keywords: sparsity-constrained optimization; ultrahigh-dimensional survival data; penalization; variable screening; variable selection