For a given scientific question, several techniques or experiments often measure the same numerical quantity. Historically, various methods have been developed to exploit the information within each type of data independently. However, statistical data fusion methods that effectively integrate multi-source data under a unified framework are lacking. In this paper, we propose a novel data fusion method, called B-scaling, for integrating multi-source data. Consider $K$ measurements that are generated from different sources but measure the same latent variable through linear or nonlinear mappings. We seek a representation of the latent variable, named the B-mean, that captures the common information contained in the $K$ measurements while taking into account the nonlinear mappings between them and the latent variable. We also establish the asymptotic property of the B-mean and apply the proposed method to integrate multiple histone modifications and DNA methylation levels to characterize the epigenomic landscape. Both numerical and empirical studies show that B-scaling is a powerful data fusion method with broad applications.