We introduce a new method for two-sample testing of high-dimensional linear regression coefficients without assuming that those coefficients are individually estimable. The procedure works by first projecting the matrices of covariates and response vectors along directions that are complementary in sign in a subset of the coordinates, a process we call 'complementary sketching'. The resulting projected covariates and responses are then aggregated to form two test statistics, which are shown to have essentially optimal asymptotic power under a Gaussian design when the difference between the two regression coefficients is sparse and dense, respectively. Simulations confirm that our methods perform well in a broad class of settings, and an application to a large single-cell RNA sequencing dataset demonstrates their utility in practice.
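The projection step can be illustrated with a minimal, dependency-free sketch. It assumes the models Y1 = X1 β1 + ε1 and Y2 = X2 β2 + ε2 with Gaussian noise and a pooled sample size n1 + n2 > p, so p may exceed n1 and n2 individually. One natural reading of "complementary in sign" is the following. Take a matrix A whose m = n1 + n2 - p orthonormal columns are orthogonal to the column space of the stacked design [X1; X2], and split it into row blocks A1 and A2. Then A1ᵀX1 = -A2ᵀX2 =: W, and Z = A1ᵀY1 + A2ᵀY2 = W(β1 - β2) + Aᵀε. This is a single regression whose coefficient is exactly the difference being tested. The helper names (`orth_complement`, `complementary_sketch`, `toy_statistics`) are hypothetical. The two aggregates in `toy_statistics` are simple stand-ins for "dense" and "sparse" statistics, not the calibrated statistics of the paper.

```python
import math
import random


def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Matrix product A @ B for list-of-rows matrices."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def matvec(A, v):
    """Matrix-vector product A @ v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def orth_complement(X, tol=1e-8):
    """Orthonormal basis of the orthogonal complement of col(X), X being n x p.

    Modified Gram-Schmidt: orthonormalise the columns of X first, then sweep
    the standard basis vectors e_1, ..., e_n and keep whatever survives.
    Returns (list of complement vectors of length n, rank of X)."""
    n = len(X)
    basis = []

    def add(v):
        for _ in range(2):  # second pass improves numerical orthogonality
            for q in basis:
                c = sum(a * b for a, b in zip(v, q))
                v = [a - c * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > tol:
            basis.append([a / norm for a in v])
            return True
        return False

    rank = sum(add(col) for col in transpose(X))
    complement = []
    for i in range(n):
        if len(basis) == n:
            break
        e = [1.0 if k == i else 0.0 for k in range(n)]
        if add(e):
            complement.append(basis[-1])
    return complement, rank


def complementary_sketch(X1, Y1, X2, Y2):
    """Hypothetical helper: reduce the two-sample problem to a one-sample one.

    Returns (W, Z) with W = A1^T X1 (= -A2^T X2 up to rounding) and
    Z = A1^T Y1 + A2^T Y2 = W (beta1 - beta2) + A^T eps."""
    n1 = len(X1)
    A_T, _ = orth_complement(X1 + X2)  # rows of A_T are the columns of A
    if not A_T:
        raise ValueError("need n1 + n2 > rank of the stacked design")
    A1_T = [a[:n1] for a in A_T]  # m x n1 block
    A2_T = [a[n1:] for a in A_T]  # m x n2 block
    W = matmul(A1_T, X1)
    Z = [u + w for u, w in zip(matvec(A1_T, Y1), matvec(A2_T, Y2))]
    return W, Z


def toy_statistics(W, Z, sigma=1.0):
    """Illustrative aggregates of the scores W_j^T Z / (sigma * ||W_j||).

    Under H0 (beta1 = beta2) with N(0, sigma^2) noise and known sigma, each
    score is N(0, 1). These are NOT the paper's calibrated statistics."""
    scores = []
    for col in transpose(W):
        norm = math.sqrt(sum(c * c for c in col))
        s = sum(c * z for c, z in zip(col, Z))
        scores.append(s / (sigma * norm) if norm > 0 else 0.0)
    dense = sum(s * s for s in scores)    # pools evidence over all coordinates
    sparse = max(abs(s) for s in scores)  # reacts to a single large coordinate
    return dense, sparse


if __name__ == "__main__":
    random.seed(0)
    n1, n2, p = 6, 6, 8  # p > n1 and p > n2, but n1 + n2 > p
    X1 = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n1)]
    X2 = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n2)]
    beta1 = [random.gauss(0, 1) for _ in range(p)]
    beta2 = list(beta1)
    beta2[0] += 3.0  # difference supported on a single coordinate
    Y1 = [y + random.gauss(0, 1) for y in matvec(X1, beta1)]
    Y2 = [y + random.gauss(0, 1) for y in matvec(X2, beta2)]
    W, Z = complementary_sketch(X1, Y1, X2, Y2)
    print(len(Z), toy_statistics(W, Z))
```

Running the file prints the sketch dimension m = n1 + n2 - p and the two toy statistics. Pure Python keeps the sketch self-contained. In practice the orthogonal complement would more likely come from a full QR decomposition of the stacked design (for example `numpy.linalg.qr(X, mode="complete")`), keeping the last n - p columns of Q.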