Information from multiple data sources is increasingly available. However, some data sources may produce biased estimates because of biased sampling, population heterogeneity, or model misspecification, all of which are common in practice. This calls for statistical methods that can combine information in the presence of biased sources. In this paper, a robust data fusion-extraction method is proposed. The method produces a consistent estimator of the parameter of interest even if many of the data sources are biased. The proposed estimator is easy to compute and uses only summary statistics, and hence can be applied in many different fields, e.g. meta-analysis, Mendelian randomization and distributed systems. Moreover, under some mild conditions, the proposed estimator is asymptotically equivalent to the oracle estimator that uses data only from the unbiased sources. Asymptotic normality of the proposed estimator is also established. In contrast to existing meta-analysis methods, the theoretical properties are guaranteed even if both the number of data sources and the dimension of the parameter diverge as the sample size increases, so the method is supported across a wide range of settings. The robustness and the oracle property are also evaluated via simulation studies. The proposed method is applied to a meta-analysis data set to evaluate surgical treatment for moderate periodontal disease, and to a Mendelian randomization data set to study the risk factors of head and neck cancer.
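To make the motivation concrete, the following minimal sketch simulates summary statistics from hypothetical data sources, some of which are biased, and compares three ways of combining them. It is not the fusion-extraction method proposed in the paper. The function names, the simulation settings (50 sources, 20 biased, bias 2.0, standard error 0.05) and the use of the median as a crude robust stand-in are all assumptions made purely for illustration.

```python
import random
import statistics


def inverse_variance_pool(estimates, variances):
    """Classical fixed-effect pooling: weight each estimate by 1 / variance."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)


def simulate_sources(theta, n_sources, n_biased, bias, se, rng):
    """Simulate per-source summary statistics (hypothetical setup).

    The first `n_biased` sources estimate theta + bias; the rest estimate theta.
    Returns (estimates, variances, is_biased).
    """
    estimates, is_biased = [], []
    for k in range(n_sources):
        biased = k < n_biased
        center = theta + (bias if biased else 0.0)
        estimates.append(rng.gauss(center, se))
        is_biased.append(biased)
    variances = [se ** 2] * n_sources
    return estimates, variances, is_biased


rng = random.Random(0)  # fixed seed so the run is reproducible
theta = 1.0             # true parameter value (assumed for the simulation)
estimates, variances, is_biased = simulate_sources(
    theta, n_sources=50, n_biased=20, bias=2.0, se=0.05, rng=rng
)

# Naive pooling uses every source, biased or not.
naive = inverse_variance_pool(estimates, variances)

# Oracle pooling uses only the unbiased sources, which is infeasible in
# practice because the set of unbiased sources is unknown.
oracle = inverse_variance_pool(
    [e for e, b in zip(estimates, is_biased) if not b],
    [v for v, b in zip(variances, is_biased) if not b],
)

# Crude robust stand-in (NOT the paper's estimator): the median of the estimates.
robust_standin = statistics.median(estimates)

print(f"naive pooled estimate : {naive:.3f}")
print(f"oracle estimate       : {oracle:.3f}")
print(f"median of estimates   : {robust_standin:.3f}")
```

In this setup the naive pooled estimate is pulled far from the true value by the biased sources, while the oracle estimate stays close to it. The median stays much closer than the naive estimate but is still shifted somewhat toward the biased group, which illustrates why a method that attains the oracle's behaviour without knowing which sources are biased is of interest.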