Data are commonly assumed to be a homogeneous set of observations when learning the structure of a Bayesian network. In practice, however, they often comprise several related but non-homogeneous data sets, collected in different ways or from different populations. In our previous work (Azzimonti, Corani and Scutari, 2021), we proposed a closed-form Bayesian Hierarchical Dirichlet score for discrete data that pools information across related data sets to learn a single encompassing network structure while accounting for the differences in their probabilistic structures. In this paper, we provide an analogous solution for learning a Bayesian network from continuous data, using mixed-effects models to pool information across the related data sets. We study its structural, parametric, predictive and classification accuracy, and we show that it outperforms both conditional Gaussian Bayesian networks (which do not perform any pooling) and classical Gaussian Bayesian networks (which disregard the heterogeneous nature of the data). The improvement is most marked for small sample sizes and for unbalanced data sets.
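The partial-pooling idea behind the mixed-effects approach can be illustrated with a minimal sketch. The code below is not the paper's method (which learns a full network structure); it only shows, for a single variable, how a random-intercept model shrinks each data set's estimate toward the overall mean, interpolating between no pooling (conditional Gaussian networks, one estimate per data set) and complete pooling (classical Gaussian networks, one estimate for all data). The simulated data, variance values and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate K related data sets: a shared mean plus a group-specific offset
# (illustrative values, not taken from the paper).
K, n = 5, 10
mu_true, tau, sigma = 2.0, 0.5, 1.0
offsets = rng.normal(0.0, tau, K)
data = [rng.normal(mu_true + b, sigma, n) for b in offsets]

# No pooling: each data set is estimated on its own.
no_pool = np.array([d.mean() for d in data])

# Complete pooling: the heterogeneity across data sets is ignored.
pool = np.concatenate(data).mean()

# Partial pooling (random-intercept shrinkage): each group mean is
# pulled toward the grand mean, with a weight set by the ratio of the
# between-group variance to the total variance of a group mean.
within_var = sigma**2 / n    # sampling variance of a group mean
between_var = tau**2         # variance of the group-specific offsets
w = between_var / (between_var + within_var)
partial = w * no_pool + (1 - w) * pool
```

Because the shrinkage weight `w` lies strictly between 0 and 1, the partially pooled estimates always sit between the per-group means and the grand mean; the smaller a group's sample size, the stronger the pull toward the pooled estimate, which is why the gains reported in the abstract are largest for small and unbalanced data sets.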