To improve the precision of inferences and reduce costs, there is considerable interest in combining data from several sources, such as sample surveys and administrative data. Appropriate methodology is required to ensure satisfactory inferences, since the target populations and the methods for acquiring data may differ substantially across sources. To provide improved inferences, we use methodology that has a more general structure than the methods in current practice. We start with the case where the analyst has only summary statistics from each of the sources. In our primary method, uncertain pooling, it is assumed that the analyst can regard one source, survey $r$, as the single best choice for inference. This method starts with the data from survey $r$ and adds data from those other sources that are shown to form clusters that include survey $r$. We also consider Dirichlet process mixtures, one of the most popular nonparametric Bayesian methods. We use analytical expressions and the results of numerical studies to show properties of the methodology.