Motivated by the widely used geometric median-of-means estimator in machine learning, this paper studies statistical inference for ultrahigh-dimensional location parameters based on the sample spatial median under a general multivariate model, including the construction of simultaneous confidence intervals, global tests, and multiple testing with false discovery rate control. To achieve these goals, we derive a novel Bahadur representation of the sample spatial median with a maximum-norm bound on the remainder term, and establish a Gaussian approximation for the sample spatial median over the class of hyperrectangles. In addition, a multiplier bootstrap algorithm is proposed to approximate the distribution of the sample spatial median. The approximations remain valid when the dimension diverges at an exponential rate in the sample size, which facilitates the application of the spatial median in the ultrahigh-dimensional regime. The proposed approaches are further illustrated by simulations and an analysis of a genomic dataset from a microarray study.
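For readers unfamiliar with the estimator, the sample spatial median is the point minimizing the sum of Euclidean distances to the observations, $\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{n} \lVert X_i - \theta \rVert_2$. The sketch below computes it with Weiszfeld iterations, a standard fixed-point scheme that is used here only for illustration and is not claimed to be the procedure adopted in the paper. The function name `spatial_median` and the parameters `tol` and `max_iter` are hypothetical choices, and the sketch does not cover the multiplier bootstrap or the inference procedures.

```python
import numpy as np


def spatial_median(X, tol=1e-8, max_iter=500):
    """Approximate argmin_theta sum_i ||X_i - theta||_2 by Weiszfeld iterations.

    X is an (n, p) array of observations. The name and defaults are
    illustrative assumptions, not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    theta = X.mean(axis=0)  # start from the coordinate-wise mean
    for _ in range(max_iter):
        # Euclidean distance from each observation to the current iterate
        dist = np.linalg.norm(X - theta, axis=1)
        # Guard against division by zero if the iterate hits a data point
        dist = np.maximum(dist, 1e-12)
        weights = 1.0 / dist
        # Weighted average of the observations with weights 1 / distance
        new_theta = (weights[:, None] * X).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_theta - theta) <= tol:
            return new_theta
        theta = new_theta
    return theta


def sum_of_distances(X, theta):
    """Objective minimized by the spatial median."""
    return np.linalg.norm(np.asarray(X, dtype=float) - theta, axis=1).sum()
```

On data with a gross outlier, the returned point should have a sum of distances no larger than that of the sample mean, which reflects the robustness that motivates using the spatial median as a location estimator.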