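Project title: Group Variable Selection for Generalized Linear Models and Its Application to Credit Scoring
Project number: No.71471152
Project type: General Program
Year of approval: 2015
Discipline: Management Science
Project author: 方匡南 (Fang Kuangnan)
Affiliation: 厦门大学 (Xiamen University)
Funding: 620,000 yuan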
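Chinese abstract (translated): In recent years, fields such as management science and bioinformatics have produced large volumes of high-dimensional data, which pose greater challenges for model selection. In some practical problems the covariates have a natural grouping structure arising from an inherent relationship among them. In that case, individual variable selection methods ignore the information implicit in the grouping structure, which may degrade selection performance and even lead to wrongly selected variables. In view of this, the project systematically studies group variable selection methods for generalized linear models, including concave q-norm group variable selection, bi-level variable selection, and sparse Laplacian group variable selection. First, it studies methods that select only at the group level: it proposes concave q-norm group variable selection, solves the associated algorithmic problems, and proves consistency. Next, it studies bi-level variable selection, which selects both groups and variables within groups, solves the associated algorithmic problems, and proves the oracle property at both the individual-variable and group levels. Then, taking the network structure among variables into account, it proposes Laplacian group variable selection, uses an extended GCD algorithm to solve the computational problem, and proves the oracle property under a sparse Riesz condition. Finally, it studies the application of these methods to credit scoring.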
Chinese keywords (translated): Statistics; Variable Selection; Generalized Linear Models; Credit Scoring
English abstract: In recent years, large volumes of high-dimensional data have arisen in research fields such as management science and bioinformatics. Such data pose a major challenge for model selection. In some cases, the inherent interconnection among covariates can be described by a grouping structure. Individual variable selection methods, which ignore the grouping structure, may then reduce the efficiency of variable selection and even lead to mis-selection. The main goal of this proposal is to systematically develop group variable selection methods for generalized linear regression, including concave q-norm group selection methods, bi-level selection methods, and sparse Laplacian group selection methods. First, we propose new concave q-norm group selection methods, which can identify important groups of covariates; we will give the algorithm and prove consistency. Second, we propose bi-level selection methods, which can identify not only important groups but also important covariates within selected groups; we will give the computational solution and prove the oracle property at both the group and within-group levels. Third, we will propose Laplacian group selection methods that take the network structure among covariates into account; we will extend the GCD algorithm for their computation and show that they have the oracle property under a sparse Riesz condition. Finally, we will apply these methods to credit scoring.
English keywords: Statistics; Variable Selection; Generalized Linear Regression; Credit Scoring