We aim to make inferences about a smooth, finite-dimensional parameter by fusing data from multiple sources. Previous works have studied the estimation of a variety of parameters in similar data fusion settings, including the average treatment effect and the average reward under a policy; most of them merge one historical data source containing covariates, actions, and rewards with one data source containing the same covariates. In this work, we consider the general case where one or more data sources align with each part of the distribution of the target population, for example, the conditional distribution of the reward given actions and covariates. We describe potential gains in efficiency that can arise from fusing these data sources in a single analysis, which we characterize by a reduction in the semiparametric efficiency bound. We also provide a general means to construct estimators that achieve these bounds. In numerical experiments, we illustrate marked improvements in efficiency from using our proposed estimators rather than their natural alternatives. Finally, we illustrate the magnitude of efficiency gains that can be realized in vaccine immunogenicity studies by fusing data from two HIV vaccine trials.