Multi-task learning is increasingly used to investigate the association structure between multiple responses and a single set of predictor variables in many applications. In the era of big data, the coexistence of incomplete outcomes, a large number of responses, and high-dimensional predictors poses unprecedented challenges in estimation, prediction, and computation. In this paper, we propose a scalable and computationally efficient procedure, called PEER, for large-scale multi-response regression with incomplete outcomes, where both the responses and the predictors can be high-dimensional. Motivated by sparse factor regression, we convert the multi-response regression into a set of univariate-response regressions, which can be efficiently implemented in parallel. Under some mild regularity conditions, we show that PEER enjoys desirable sampling properties, including consistency in estimation, prediction, and variable selection. Extensive simulation studies show that our proposal compares favorably with several existing methods in estimation accuracy, variable selection, and computational efficiency.
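To make the idea of splitting a multi-response regression with incomplete outcomes into independent univariate-response fits more concrete, the following is a minimal sketch and not the PEER procedure itself. It assumes a plain lasso fitted by coordinate descent for each response, using only the rows where that response is observed; it does not reproduce the sparse factor regression construction that motivates PEER. The names `soft_threshold`, `lasso_cd`, and `fit_columnwise`, the penalty level `lam`, and the convention that a missing outcome is encoded as `None` are all hypothetical choices made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    X is a list of rows (lists of floats), y a list of floats.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the partial residual
            # (the residual with coordinate j's contribution added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n
            rho += col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def fit_columnwise(X, Y, lam):
    """Fit one lasso per response column, skipping rows where it is missing.

    Y[i][k] is None when response k is unobserved for sample i (assumed
    encoding). Returns a list of coefficient vectors, one per response.
    """
    q = len(Y[0])

    def fit_one(k):
        rows = [i for i in range(len(X)) if Y[i][k] is not None]
        return lasso_cd([X[i] for i in rows], [Y[i][k] for i in rows], lam)

    # The q fits are independent, so they can be dispatched concurrently.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(fit_one, range(q)))


if __name__ == "__main__":
    X = [[1.0, 1.0, 1.0], [1.0, 1.0, -1.0], [1.0, -1.0, 1.0],
         [1.0, -1.0, -1.0], [-1.0, 1.0, 1.0], [-1.0, 1.0, -1.0],
         [-1.0, -1.0, 1.0], [-1.0, -1.0, -1.0]]
    # Response 0 depends on predictor 0, response 1 on predictor 2;
    # a few outcomes are marked as missing.
    Y = [[None if i in (0, 7) else 2.0 * x[0],
          None if i in (2, 3) else -3.0 * x[2]] for i, x in enumerate(X)]
    for k, beta in enumerate(fit_columnwise(X, Y, lam=0.01)):
        print("response", k, "coefficients", beta)
```

Because each response is fitted on its own observed rows, no imputation of missing outcomes is needed in this sketch. A thread pool is used only to show that the fits are independent; for pure-Python numerical work, a process pool or a compiled numerical library would be the more realistic way to obtain an actual speedup.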