Bayesian Networks have been widely used over the last decades in many fields to describe statistical dependencies among random variables. In general, learning the structure of such models is a problem of considerable theoretical interest that poses many challenges. On the one hand, it is a well-known NP-complete problem, made harder in practice by the huge search space of possible solutions. On the other hand, the phenomenon of I-equivalence, i.e., different graphical structures underpinning the same set of statistical dependencies, may lead to multimodal fitness landscapes that further hinder maximum likelihood approaches to the task. Here, we exploit the NSGA-II multi-objective optimization procedure to explicitly account for both the likelihood of a solution and the number of selected arcs, by setting these as the two objective functions of the method. Our aim is to investigate the behavior of NSGA-II and analyse the quality of its solutions. We thus thoroughly examined the optimization results obtained on a wide set of simulated data, considering both the goodness of the inferred solutions in terms of the objective function values achieved and the similarity of the retrieved structures to the ground truth, i.e., the networks used to generate the target data. Our results show that NSGA-II can converge to solutions characterized by better likelihood and fewer arcs than classic approaches, although, paradoxically, in many cases these solutions show a lower similarity to the target network.
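To make the bi-objective formulation concrete, the sketch below shows the two selection primitives that NSGA-II is built on: fast non-dominated sorting into Pareto fronts, and crowding distance within a front. It is a minimal illustration, not the authors' implementation: it assumes each candidate network has already been scored as a pair (negative log-likelihood, number of arcs), both to be minimised, and the scores in the usage example are hypothetical. Structure scoring, genetic operators, and acyclicity repair are deliberately omitted.

```python
from typing import Dict, List, Tuple

# Assumed encoding: (negative log-likelihood, number of arcs), both minimised.
Objectives = Tuple[float, int]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(scores: List[Objectives]) -> List[List[int]]:
    """Partition candidate indices into Pareto fronts; front 0 is non-dominated."""
    n = len(scores)
    dominated_by: List[List[int]] = [[] for _ in range(n)]  # solutions that p dominates
    domination_count = [0] * n  # how many solutions dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(scores[p], scores[q]):
                dominated_by[p].append(q)
            elif dominates(scores[q], scores[p]):
                domination_count[p] += 1
        if domination_count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front: List[int] = []
        for p in fronts[i]:
            for q in dominated_by[p]:
                domination_count[q] -= 1
                if domination_count[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


def crowding_distance(front: List[int], scores: List[Objectives]) -> Dict[int, float]:
    """Normalised perimeter around each solution; boundary points get infinity."""
    if not front:
        return {}
    distance = {i: 0.0 for i in front}
    for m in range(2):  # one pass per objective
        ordered = sorted(front, key=lambda i: scores[i][m])
        lo, hi = scores[ordered[0]][m], scores[ordered[-1]][m]
        distance[ordered[0]] = distance[ordered[-1]] = float("inf")
        if hi == lo:
            continue  # all equal on this objective: no spread to measure
        for k in range(1, len(ordered) - 1):
            gap = scores[ordered[k + 1]][m] - scores[ordered[k - 1]][m]
            distance[ordered[k]] += gap / (hi - lo)
    return distance


# Hypothetical scores for five candidate networks.
candidates: List[Objectives] = [
    (1200.5, 10), (1180.2, 14), (1250.0, 8), (1190.0, 15), (1300.0, 12),
]

if __name__ == "__main__":
    fronts = fast_non_dominated_sort(candidates)
    print(fronts)
    print(crowding_distance(fronts[0], candidates))
```

With these scores the first front is `[0, 1, 2]`: each of those networks trades likelihood against sparsity, so none dominates another, while candidates 3 and 4 are dominated (e.g., candidate 1 has both a lower negative log-likelihood and fewer arcs than candidate 3). Because selection rewards only these two objective values, a front like this can contain structures that fit the data well yet differ from the generating network, which is consistent with the paradox noted above.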