Suppose we have individual-level data from an internal study and various summary statistics from relevant external studies. External summary statistics can improve statistical inference for the internal population. If they are used improperly, however, they may instead cause efficiency loss or bias. We study the fusion of individual data and summary statistics in a semiparametric framework to investigate how external summary statistics can be used efficiently. Under a weak transportability assumption, we establish the semiparametric efficiency bound for estimating a general functional of the internal data distribution. This bound is no larger than the bound attainable with internal data alone, and it quantifies the potential efficiency gain from integrating individual data with summary statistics. We propose a data-fused efficient estimator that achieves this efficiency bound. We further propose an adaptive fusion estimator that eliminates the bias of the data-fused estimator when the transportability assumption fails, and we establish its asymptotic oracle property. Simulations and an application to a Helicobacter pylori infection dataset demonstrate the promising numerical performance of the proposed methods.
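As a toy illustration of why an external summary statistic can reduce variance, consider estimating the internal mean θ = E[Y] when an external study reports the mean of a covariate X. Assume that this external mean transports exactly to the internal population. The classical control-variate (regression) estimator then shifts the internal sample mean of Y by the estimated slope times the gap between the internal and external means of X. This is not the authors' estimator. It is a minimal sketch under the stated assumptions, and the names `fused_mean`, `internal_mean`, and `mu_x_ext` are hypothetical.

```python
import random
from statistics import fmean, pvariance


def internal_mean(ys):
    """Internal-only estimator of theta = E[Y]: the plain sample mean."""
    return fmean(ys)


def fused_mean(xs, ys, mu_x_ext):
    """Control-variate estimator of E[Y] that uses an external summary
    statistic mu_x_ext for E[X].

    Illustrative assumption: the external mean of X equals the internal
    mean of X (exact transportability) and is known without error.
    """
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    beta = sxy / sxx  # least-squares slope of Y on X
    # Correct the internal mean of Y by the discrepancy between the
    # internal estimate of E[X] and the external summary statistic.
    return my - beta * (mx - mu_x_ext)


def simulate(n=50, reps=2000, seed=0):
    """Monte Carlo variances of the two estimators under a simple model
    Y = 1 + 2X + e with X ~ N(0, 1) and e ~ N(0, 1); the true E[X] is 0."""
    rng = random.Random(seed)
    est_int, est_fus = [], []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
        est_int.append(internal_mean(ys))
        est_fus.append(fused_mean(xs, ys, mu_x_ext=0.0))
    return pvariance(est_int), pvariance(est_fus)


if __name__ == "__main__":
    var_int, var_fus = simulate()
    print(f"internal-only variance: {var_int:.5f}")
    print(f"fused variance:         {var_fus:.5f}")
```

In this model the fused estimator's variance is roughly the residual variance of Y given X divided by n, instead of Var(Y)/n. That difference is the kind of gain the efficiency bound formalizes. If `mu_x_ext` does not match the internal mean of X (that is, transportability fails), this simple estimator becomes biased. That failure mode is what the adaptive fusion estimator described above is designed to guard against.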