We propose a new prediction method for multivariate linear regression problems where the number of features is less than the sample size but the number of outcomes is extremely large. Many popular procedures, such as penalized regression procedures, require parameter tuning that is computationally untenable in such large-scale problems. We take a different approach, motivated by ideas from simultaneous estimation problems, that performs linear shrinkage on ordinary least squares parameter estimates. Our approach is extremely computationally efficient and tuning-free. We show that it can asymptotically outperform ordinary least squares without any structural assumptions on the true regression coefficients and illustrate its good performance in simulations and an analysis of single-cell RNA-seq data.
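The idea of linear shrinkage applied to ordinary least squares estimates can be sketched as follows. This is a minimal illustration only: the variable names, the pooled noise estimate, and the particular shrinkage factor `c` are assumptions for exposition, not the paper's actual estimator.

```python
import numpy as np

# Illustrative setting: p features < n samples, many outcomes q.
rng = np.random.default_rng(0)
n, p, q = 100, 5, 1000                      # samples, features, outcomes
X = rng.standard_normal((n, p))
B = 0.5 * rng.standard_normal((p, q))       # true coefficients (unknown in practice)
Y = X @ B + rng.standard_normal((n, q))

# Ordinary least squares for all q outcomes at once.
B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Linear shrinkage toward zero: scale the OLS estimates by a common
# factor trading off estimated noise against estimated signal size.
# (This specific factor is an assumed, illustrative choice.)
resid = Y - X @ B_ols
sigma2 = (resid ** 2).sum() / (q * (n - p))  # pooled residual variance
signal = (B_ols ** 2).sum() / q              # average squared coefficient norm
c = max(0.0, 1.0 - sigma2 * p / (n * signal))
B_shrunk = c * B_ols                         # shrunken estimates, shape (p, q)
```

Because the shrinkage is a single scalar multiplication of the already-computed OLS fit, the extra cost is negligible and no tuning parameter needs to be cross-validated, which is the computational appeal described in the abstract.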