This paper describes a new library for learning Bayesian networks from data that contains both discrete and continuous variables (mixed data). In addition to the classical learning methods that operate on discretized data, the library provides its own algorithm for structure and parameter learning directly from mixed data without discretization, since data discretization leads to information loss. This algorithm is based on a mixed mutual information (MI) score function for structure learning, and on linear regression and Gaussian distribution approximation for parameter learning. The library also offers two algorithms for enumerating graph structures: the greedy Hill-Climbing algorithm and an evolutionary algorithm. The key capabilities of the proposed library are thus: (1) structure and parameter learning of a Bayesian network on discretized data; (2) structure and parameter learning of a Bayesian network on mixed data using the mixed MI score function and Gaussian approximation; (3) running the learning algorithms with either of the two structure-enumeration algorithms, Hill-Climbing or the evolutionary algorithm. Since the need for mixed data representation comes from practical necessity, the advantages of our implementation are evaluated in the context of solving approximation and gap recovery problems on synthetic data and real datasets.
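To make the structure-enumeration step concrete, the following is a minimal illustrative sketch, not the library's actual API, of greedy Hill-Climbing over edge additions, scored here by empirical pairwise mutual information on discrete data (the library's mixed MI score and its full neighborhood moves, such as edge deletion and reversal, are omitted for brevity):

```python
# Illustrative sketch (hypothetical, not the described library's API):
# greedy Hill-Climbing structure search that repeatedly adds the
# acyclic edge with the largest mutual-information gain.
import itertools
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def creates_cycle(edges, new_edge):
    """Return True if adding new_edge to the current DAG creates a directed cycle."""
    u, v = new_edge
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(b for a, b in edges if a == node)
    return False

def hill_climb(data, min_gain=1e-3):
    """data: dict mapping variable name -> list of discrete observations."""
    names = list(data)
    edges = set()
    while True:
        best_gain, best_edge = min_gain, None
        for u, v in itertools.permutations(names, 2):
            if (u, v) in edges or creates_cycle(edges, (u, v)):
                continue
            gain = mutual_information(data[u], data[v])
            if gain > best_gain:
                best_gain, best_edge = gain, (u, v)
        if best_edge is None:  # no edge improves the score: local optimum reached
            return edges
        edges.add(best_edge)
```

For example, on data where one variable is a copy of another and a third is independent, the search recovers a single edge between the two dependent variables and stops. The evolutionary alternative mentioned above would instead maintain a population of candidate graphs and recombine them, but uses the same score function.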