Cluster Regularization via a Hierarchical Feature Regression

Prediction tasks with high-dimensional, nonorthogonal predictor sets pose a challenge for least-squares-based fitting procedures. A large and productive literature discusses various regularized approaches to improving the out-of-sample robustness of parameter estimates. This paper proposes a novel cluster-based regularization, the hierarchical feature regression (HFR), which mobilizes insights from the domains of machine learning and graph theory to estimate parameters along a supervised hierarchical representation of the predictor set, shrinking parameters towards group targets. The method is innovative in its ability to estimate both the optimal composition of predictor groups and the group targets endogenously. The HFR can be viewed as a supervised factor regression, with the strength of shrinkage governed by a penalty on the extent of idiosyncratic variation captured in the fitting process. The method demonstrates good predictive accuracy and versatility, outperforming a panel of benchmark regularized estimators across a diverse set of simulated regression tasks, including dense, sparse and grouped data generating processes. An application to the prediction of economic growth illustrates the HFR's effectiveness in an empirical setting, with favorable comparisons to several frequentist and Bayesian alternatives.
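The idea of shrinking parameters towards group targets can be illustrated with a toy example. The sketch below is not the HFR estimator: it assumes a fixed two-level hierarchy, forms groups with a plain correlation threshold rather than a supervised procedure, and uses a single hand-picked blending weight in place of the paper's penalty on idiosyncratic variation. The function names `correlation_groups` and `group_shrunk_ols` are hypothetical, and the model omits an intercept for brevity.

```python
import numpy as np


def correlation_groups(X, threshold=0.5):
    """Label predictors by connected component of a thresholded |correlation| graph."""
    p = X.shape[1]
    adjacency = np.abs(np.corrcoef(X, rowvar=False)) >= threshold
    labels = -np.ones(p, dtype=int)  # -1 marks "not yet visited"
    current = 0
    for start in range(p):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            for neighbour in np.flatnonzero(adjacency[node]):
                if labels[neighbour] < 0:
                    labels[neighbour] = current
                    stack.append(neighbour)
        current += 1
    return labels


def group_shrunk_ols(X, y, labels, weight):
    """Blend OLS coefficients with coefficients implied by a group-average regression.

    weight = 0 returns plain OLS; weight = 1 returns the group-level fit, in which
    every predictor in a group shares the same coefficient (the group target).
    """
    p = X.shape[1]
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Averaging matrix: column g holds 1/n_g for the members of group g.
    averaging = np.zeros((p, labels.max() + 1))
    averaging[np.arange(p), labels] = 1.0
    averaging /= averaging.sum(axis=0, keepdims=True)

    # Regress y on the group-average factors, then map back to predictor level.
    gamma, *_ = np.linalg.lstsq(X @ averaging, y, rcond=None)
    beta_group = averaging @ gamma

    return (1.0 - weight) * beta_ols + weight * beta_group


# Simulated grouped design: two latent factors, three noisy copies of each.
rng = np.random.default_rng(0)
factors = rng.standard_normal((200, 2))
X = np.repeat(factors, 3, axis=1) + 0.3 * rng.standard_normal((200, 6))
y = X @ np.array([1.0, 1.0, 1.0, -0.5, -0.5, -0.5]) + rng.standard_normal(200)

labels = correlation_groups(X)
beta_half = group_shrunk_ols(X, y, labels, weight=0.5)
```

Moving `weight` from 0 to 1 traces a path from the unregularized least-squares fit to a fit in which all idiosyncratic variation within each group is suppressed, which is the intuition behind describing this kind of estimator as a factor regression with adjustable shrinkage.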
