Bayesian networks are widely used to represent the joint distribution of multiple random variables in a flexible yet interpretable manner. One major challenge in learning the structure of a network is how to model networks that include a mixture of continuous and discrete random variables, known as hybrid Bayesian networks. This paper reviews the literature on approaches to handling hybrid Bayesian networks. Typically, one of two approaches is taken: either the data are assumed to follow a joint multivariate Gaussian distribution, irrespective of the true distribution, or the continuous random variables are discretized, resulting in a discrete Bayesian network. We show that the strategy of modelling all random variables as Gaussian outperforms the strategy of discretizing the continuous random variables, both theoretically and in simulation studies across various settings. Both strategies are also applied to a childhood obesity data set. The two strategies give rise to significant differences in the optimal graph structures, and the simulation results suggest that inference under the assumption that all random variables are Gaussian is the more reliable of the two.