Score functions for learning the structure of Bayesian networks in the literature assume that the data are a homogeneous set of observations. In practice, however, data often comprise several related but non-homogeneous data sets collected in different ways. In this paper we propose a new Bayesian Dirichlet score, which we call Bayesian Hierarchical Dirichlet (BHD). The proposed score is based on a hierarchical model that pools information across data sets to learn a single encompassing network structure, while taking into account the differences in their probabilistic structures. We derive a closed-form expression for BHD using a variational approximation of the marginal likelihood, and we study its performance using simulated data. We find that, when the data comprise multiple related data sets, BHD outperforms the Bayesian Dirichlet equivalent uniform (BDeu) score in terms of reconstruction accuracy as measured by the structural Hamming distance, and that it is as accurate as BDeu when the data are homogeneous. Moreover, the estimated networks are sparser, and therefore more interpretable, than those obtained with BDeu, thanks to a lower number of false positive arcs.
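For readers unfamiliar with the baseline, the following is a minimal sketch of the standard BDeu local score for a single node, that is, the log marginal likelihood of the node's counts under a uniform Dirichlet prior. It illustrates the score BHD is compared against, not BHD itself. The function name `bdeu_local_score`, the layout of `counts`, and the default imaginary sample size `alpha=1.0` are assumptions made for illustration.

```python
from math import lgamma, log


def bdeu_local_score(counts, alpha=1.0):
    """Log BDeu marginal likelihood of one node given its parents.

    counts[j][k] is the number of observations in which the parents take
    their j-th configuration and the node takes its k-th state.
    alpha is the imaginary sample size (hypothetical default of 1.0).
    """
    q = len(counts)        # number of parent configurations
    r = len(counts[0])     # number of states of the node
    a_j = alpha / q        # prior mass assigned to each parent configuration
    a_jk = alpha / (q * r) # prior mass assigned to each (configuration, state) cell
    score = 0.0
    for row in counts:
        n_j = sum(row)
        score += lgamma(a_j) - lgamma(a_j + n_j)
        for n_jk in row:
            score += lgamma(a_jk + n_jk) - lgamma(a_jk)
    return score


# A binary node without parents, observed once in each state.
print(bdeu_local_score([[1, 1]]))
```

The score of a whole network is the sum of these local scores over all nodes. Applying it to pooled data treats every observation as coming from the same distribution, which is the homogeneity assumption that BHD relaxes.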