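Project title: Research on Structured Variable Selection Methods for Predictive Models
Project number: No.71301162
Project type: Young Scientists Fund
Year of approval: 2014
Discipline: Management Sciences
Principal investigator: Li Yang (李扬)
Affiliation: Renmin University of China (中国人民大学)
Funding: 190,000 yuan (RMB)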
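Chinese abstract: Variable selection has been an active research topic in predictive modeling in recent years. Earlier studies show that incorporating information on the grouping and association structure among covariates when building a model can improve variable selection and raise predictive accuracy. In the existing literature on structured variable selection, however, three issues have rarely been discussed: the hierarchical structure between the main effects and interaction effects of covariates, data heterogeneity, and sample imbalance. Their role in improving variable selection and predictive accuracy remains to be explored. This project takes structured variable selection methods for predictive models in economics and management as its research object and is organized around these three issues. For each issue, the project studies, from the two perspectives of "structured variable selection with prior information" and "structured variable selection without prior information", the construction of the variable selection model, parameter estimation and its properties, and the design, optimization, and effectiveness evaluation of algorithms. It also discusses applications to early-warning analysis of corporate financial risk and to the screening of potential risk factors for credit risk. The project aims to provide a set of structured variable selection methods that can be widely applied in economics, sociology, management, and related fields, helping empirical researchers improve the accuracy of quantitative prediction and, in turn, helping macro-level administrative departments and micro-level economic units make more scientific decisions.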
Chinese keywords: Structured variable selection; Strong hierarchy constraint; Heterogeneity; Imbalance
English abstract: High-throughput variable selection studies have been conducted extensively in search of accurate predictive models with interpretable covariates. It has been demonstrated that variable selection methods that model the association among covariates tend to perform better in both variable selection and prediction. In this study, we present a theoretical discussion of structured variable selection methodology, including the formulation of structured penalization, parameter estimation and its properties, computational algorithms, and inference issues. Three main topics in structured variable selection are addressed: the hierarchy restriction on main effects and interactions, integrative analysis of heterogeneous data, and cost-effective methods for imbalanced data. For each topic, the variable selection method will be discussed separately for informed and uninformed covariate association. Two applied case studies using the proposed methodologies will be discussed: "Risk Predicting for Listed Companies with Financial Indicators" and "Risk Factor Selection for the Credit Predicting". Aiming to improve the scientific decision-making process, the achievement of this study is a set of structured v
English keywords: Structured variable selection; Strong hierarchical constraint; Heterogeneity; Imbalance