In modern data analysis, information is frequently collected from multiple sources, often leading to challenges such as data heterogeneity and imbalanced sample sizes across datasets. Robust and efficient data integration methods are crucial for improving the generalization and transportability of statistical findings. In this work, we address scenarios where, in addition to having full access to individualized data from a primary source, supplementary covariate information from external sources is also available. While traditional data integration methods typically require individualized covariates from external sources, such requirements can be impractical due to limitations related to accessibility, privacy, storage, and cost. Instead, we propose novel data integration techniques that rely solely on external summary statistics, such as sample means and covariances, to construct robust estimators for the mean outcome under both homogeneous and heterogeneous data settings. Additionally, we extend this framework to causal inference, enabling the estimation of average treatment effects for both generalizability and transportability.
翻译:暂无翻译