Modern high-dimensional methods often adopt the "bet on sparsity" principle, yet in supervised multivariate learning statisticians may face "dense" problems with a large number of nonzero coefficients. This paper proposes a novel clustered reduced-rank learning (CRL) framework that imposes two joint matrix regularizations to automatically group features in constructing predictive factors. CRL is more interpretable than low-rank modeling and relaxes the stringent sparsity assumption in variable selection. New information-theoretic limits are presented to reveal the intrinsic cost of seeking clusters, as well as the blessing of dimensionality in multivariate learning. Moreover, an efficient optimization algorithm is developed that performs subspace learning and clustering with guaranteed convergence. The resulting fixed-point estimators, though not necessarily globally optimal, enjoy the desired statistical accuracy beyond the standard likelihood setup under some regularity conditions. Furthermore, a new kind of information criterion, as well as its scale-free form, is proposed for cluster and rank selection, with rigorous theoretical support that does not assume an infinite sample size. Extensive simulations and real-data experiments demonstrate the statistical accuracy and interpretability of the proposed method.
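To make the alternating scheme concrete, here is a minimal illustrative sketch in Python (using NumPy and scikit-learn). It is an assumed formulation, not the paper's exact CRL algorithm: a projected-gradient loop that descends on the squared multivariate regression loss and alternately enforces a rank constraint (subspace learning) and a row-clustering constraint (feature grouping) on the coefficient matrix. The function name fit_crl_sketch and the parameters rank and n_clusters are hypothetical.

```python
# Illustrative sketch only (assumed formulation, not the authors' exact CRL):
# fit Y ~ X B by gradient descent on ||Y - X B||_F^2 / 2, alternately
# projecting B onto (i) rank-`rank` matrices (subspace learning) and
# (ii) matrices whose p rows take at most `n_clusters` distinct values
# (feature grouping via k-means on the rows of B).
import numpy as np
from sklearn.cluster import KMeans

def fit_crl_sketch(X, Y, rank, n_clusters, n_iter=100, seed=0):
    p, m = X.shape[1], Y.shape[1]
    step = 1.0 / np.linalg.norm(X, 2) ** 2        # 1/L, L = Lipschitz constant
    B = np.zeros((p, m))
    labels = np.zeros(p, dtype=int)
    for _ in range(n_iter):
        B = B - step * X.T @ (X @ B - Y)          # gradient step on the loss
        U, s, Vt = np.linalg.svd(B, full_matrices=False)
        B = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # rank-r projection
        km = KMeans(n_clusters=n_clusters, n_init=5, random_state=seed).fit(B)
        labels = km.labels_
        B = km.cluster_centers_[labels]           # rows collapsed to centroids
    return B, labels
```

Cluster and rank selection would then amount to comparing such fits across (rank, n_clusters) pairs via the proposed information criterion.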