We consider the task of meta-analysis in high-dimensional settings in which the data sources are similar but non-identical. To borrow strength across such heterogeneous datasets, we introduce a global parameter that emphasizes interpretability and statistical efficiency in the presence of heterogeneity. We also propose a one-shot estimator of the global parameter that preserves the anonymity of the data sources and converges at a rate that depends on the size of the combined dataset. For high-dimensional linear models, we demonstrate the superiority of our identification restrictions both in adapting to a previously seen data distribution and in predicting for a new, unseen data distribution. Finally, we demonstrate the benefits of our approach on a large-scale drug treatment dataset involving several different cancer cell lines.