Suppose individual-level data are available from an internal study, along with various types of summary statistics from relevant external studies. External summary statistics have been used as constraints on the internal data distribution, which promises to improve statistical inference in the internal data. However, the additional use of external summary data may lead to paradoxical results: efficiency loss may occur if the uncertainty of the summary statistics is not negligible, and large estimation bias can emerge even if the bias of the external summary statistics is small. We investigate these paradoxical results in a semiparametric framework. We establish the semiparametric efficiency bound for estimating a general functional of the internal data distribution, and show that it is no larger than the bound obtained using only internal data. We propose a data-fused efficient estimator that achieves this bound, thereby resolving the efficiency paradox. This data-fused estimator is further regularized with an adaptive lasso penalty, so that the resulting estimator achieves the same asymptotic distribution as the oracle estimator that uses only unbiased summary statistics, which resolves the bias paradox. Simulations and an application to a Helicobacter pylori infection dataset illustrate the proposed methods.
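The efficiency paradox can be seen in a toy control-variate setting, sketched below. This is only an illustration under strong assumptions, not the semiparametric estimator proposed in the paper. The assumptions: the internal data are pairs (X, Y) that are bivariate normal with unit variances and correlation `rho`; the target is E[Y]; and the external study supplies an unbiased estimate of E[X] from `m` observations. The function names `toy_variances` and `simulate` are hypothetical. The "naive" estimator treats the external mean as exact, while the "fused" one shrinks the adjustment to account for external uncertainty.

```python
import random
import statistics


def toy_variances(n, m, rho):
    """Closed-form variances of three estimators of E[Y] in the toy model.

    Assumed setup (not from the paper): unit variances, corr(X, Y) = rho,
    internal sample size n, external sample size m, external mean unbiased.
    """
    internal_only = 1.0 / n                           # Var(Y_bar)
    # Naive: Y_bar - rho * (X_bar - mu_ext), ignoring external noise.
    naive = (1.0 - rho**2) / n + rho**2 / m
    # Fused: same form, coefficient rho * m / (n + m) minimizes the variance.
    fused = 1.0 / n - rho**2 * m / (n * (n + m))
    return internal_only, naive, fused


def simulate(n, m, rho, reps=20000, seed=0):
    """Monte Carlo check that draws the sample means directly."""
    rng = random.Random(seed)
    beta_fused = rho * m / (n + m)
    internal, naive, fused = [], [], []
    for _ in range(reps):
        x_bar = rng.gauss(0.0, 1.0) / n**0.5
        # Build Y_bar so that Var(Y_bar) = 1/n and Cov(X_bar, Y_bar) = rho/n.
        y_bar = rho * x_bar + (1.0 - rho**2) ** 0.5 * rng.gauss(0.0, 1.0) / n**0.5
        mu_ext = rng.gauss(0.0, 1.0) / m**0.5         # external summary statistic
        internal.append(y_bar)
        naive.append(y_bar - rho * (x_bar - mu_ext))
        fused.append(y_bar - beta_fused * (x_bar - mu_ext))
    return tuple(statistics.pvariance(v) for v in (internal, naive, fused))
```

With `n = 200`, `m = 50`, and `rho = 0.8`, the closed-form variances are 0.005 (internal only), 0.0146 (naive), and 0.00436 (fused). In this toy case the naive use of a noisy external summary is worse than ignoring it, while the uncertainty-aware combination is never worse than the internal-only estimator.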