The construction of generalizable and transferable models is a fundamental goal of statistical learning. Learning with multi-source data helps improve model generalizability and is integral to many important statistical problems, including group distributionally robust optimization, minimax group fairness, and maximin projection. This paper considers multiple high-dimensional regression models for multi-source data. We introduce the covariate shift maximin effect as a group distributionally robust model. This robust model helps transfer information from the multi-source data to the unlabelled target population. Statistical inference for the covariate shift maximin effect is challenging since its point estimator may have a non-standard limiting distribution. We devise a novel DenseNet sampling method to construct valid confidence intervals for the high-dimensional maximin effect. We show that our proposed confidence interval achieves the desired coverage level and attains a parametric length. Our proposed DenseNet sampling method and the related theoretical analysis are of independent interest in addressing other non-regular or non-standard inference problems. We demonstrate the proposed method on a large-scale simulation and on genetic data on yeast colony growth under multiple environments.