We introduce semiparametric Bayesian networks that combine parametric and nonparametric conditional probability distributions. Their aim is to incorporate the advantages of both components: the bounded complexity of parametric models and the flexibility of nonparametric ones. We demonstrate that semiparametric Bayesian networks generalize two well-known types of Bayesian networks: Gaussian Bayesian networks and kernel density estimation Bayesian networks. For this purpose, we consider the two different conditional probability distributions required in a semiparametric Bayesian network. In addition, we present modifications of two well-known algorithms (greedy hill-climbing and PC) to learn the structure of a semiparametric Bayesian network from data. To realize this, we employ a score function based on cross-validation. Furthermore, using a validation dataset, we apply an early-stopping criterion to avoid overfitting. To evaluate the applicability of the proposed algorithms, we conduct exhaustive experiments on synthetic data sampled by mixing linear and nonlinear functions, multivariate normal data sampled from Gaussian Bayesian networks, real data from the UCI repository, and bearing degradation data. From these experiments, we conclude that the proposed algorithms accurately learn the combination of parametric and nonparametric components, while achieving a performance comparable with that of state-of-the-art methods.
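To illustrate the core idea, the following minimal sketch (not the authors' implementation, and using hypothetical helper names) shows how a node's conditional probability distribution could be chosen between a parametric linear-Gaussian CPD and a nonparametric kernel-based CPD using a cross-validated log-likelihood score, in the spirit of the score function described above. The bandwidth, fold count, and toy data are illustrative assumptions.

```python
# Minimal sketch of choosing between a parametric (linear-Gaussian) CPD and a
# nonparametric (KDE-based) CPD for one node given its parent, scored by
# cross-validated log-likelihood. Illustrative only; not the paper's code.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KernelDensity
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)

def cv_loglik_linear_gaussian(x_parent, y_child, n_splits=5):
    """Cross-validated log-likelihood of a linear-Gaussian CPD p(y | x)."""
    total = 0.0
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(x_parent):
        reg = LinearRegression().fit(x_parent[train], y_child[train])
        resid = y_child[train] - reg.predict(x_parent[train])
        sigma2 = resid.var() + 1e-12  # residual variance of the Gaussian noise
        err = y_child[test] - reg.predict(x_parent[test])
        total += np.sum(-0.5 * np.log(2 * np.pi * sigma2) - err**2 / (2 * sigma2))
    return total

def cv_loglik_kde(x_parent, y_child, bandwidth=0.3, n_splits=5):
    """Cross-validated log-likelihood of a KDE CPD p(y | x) = p(x, y) / p(x)."""
    total = 0.0
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(x_parent):
        joint = KernelDensity(bandwidth=bandwidth).fit(
            np.column_stack([x_parent[train], y_child[train]]))
        marginal = KernelDensity(bandwidth=bandwidth).fit(x_parent[train])
        log_joint = joint.score_samples(
            np.column_stack([x_parent[test], y_child[test]]))
        log_marginal = marginal.score_samples(x_parent[test])
        total += np.sum(log_joint - log_marginal)
    return total

# Toy data with a nonlinear parent-child relation, where the KDE CPD should score higher.
x = rng.normal(size=(500, 1))
y = np.sin(3 * x[:, 0]) + 0.1 * rng.normal(size=500)

print("linear-Gaussian CPD:", cv_loglik_linear_gaussian(x, y))
print("KDE CPD:            ", cv_loglik_kde(x, y))
```

In a full structure-learning procedure, a score of this kind would be evaluated per node (given its candidate parents) inside greedy hill-climbing, with a separate validation set used for early stopping, as the abstract outlines.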