Score functions for learning the structure of Bayesian networks in the literature assume that data are a homogeneous set of observations, whereas data often comprise different related, but not homogeneous, data sets collected in different ways. In this paper we propose a new Bayesian Dirichlet score, which we call Bayesian Hierarchical Dirichlet (BHD). The proposed score is based on a hierarchical model that pools information across data sets to learn a single encompassing network structure, while taking into account the differences in their probabilistic structures. We derive a closed-form expression for BHD using a variational approximation of the marginal likelihood, study the associated computational cost, and evaluate its performance on simulated data. We find that, when data comprise multiple related data sets, BHD outperforms the Bayesian Dirichlet equivalent uniform (BDeu) score in terms of reconstruction accuracy as measured by the Structural Hamming distance, and that it is as accurate as BDeu when data are homogeneous. This improvement is particularly clear when either the number of variables in the network or the number of observations is large. Moreover, the estimated networks are sparser, and therefore more interpretable, than those obtained with BDeu, owing to a smaller number of false-positive arcs.
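As an illustration of the evaluation metric used above, the following is a minimal sketch of the Structural Hamming distance between two DAGs represented as adjacency matrices. It counts the edge additions, deletions and reversals needed to turn the estimated graph into the true one; the function name and the adjacency-matrix encoding are assumptions for this example, not the paper's implementation.

```python
import numpy as np

def structural_hamming_distance(true_adj, est_adj):
    """Simplified SHD for DAGs given as 0/1 adjacency matrices
    (entry [i, j] == 1 means an arc i -> j). Counts skeleton
    differences plus arcs present in both graphs but reversed."""
    true_adj = np.asarray(true_adj, dtype=bool)
    est_adj = np.asarray(est_adj, dtype=bool)
    # Undirected skeletons: an edge exists if either direction is present.
    true_skel = true_adj | true_adj.T
    est_skel = est_adj | est_adj.T
    # Edges present in exactly one skeleton; triu counts each edge once.
    skel_diff = np.triu(true_skel ^ est_skel).sum()
    # Edges in both skeletons whose orientation disagrees (reversals).
    both = true_skel & est_skel
    reversals = np.triu(both & (true_adj != est_adj)).sum()
    return int(skel_diff + reversals)
```

A lower SHD means a more accurate reconstruction; a false-positive arc, as mentioned above, contributes one unit to the distance.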