Bayesian networks are a powerful framework for studying the dependency structure of variables in a complex system. The problem of learning Bayesian networks is closely tied to the type of data at hand. Ordinal data, such as stages of cancer, rating scale survey questions, and letter grades for exams, are ubiquitous in applied research. However, existing methods mainly target continuous and nominal data. In this work, we propose an iterative score-and-search method, called the Ordinal Structural EM (OSEM) algorithm, for learning Bayesian networks from ordinal data. Unlike traditional approaches designed for nominal data, we explicitly respect the ordering among the categories. More precisely, we assume that the ordinal variables originate from marginally discretizing a set of Gaussian variables, whose structural dependence in the latent space follows a directed acyclic graph. We then adopt the Structural EM algorithm and derive closed-form scoring functions for efficient graph search. Through simulation studies, we illustrate the superior performance of the OSEM algorithm compared to the alternatives and analyze various factors that may influence the learning accuracy. Finally, we demonstrate the practicality of our method with a real-world application on psychological survey data from 408 patients with co-morbid symptoms of obsessive-compulsive disorder and depression.
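To make the modelling assumption concrete, the following is a minimal sketch of the data-generating process described above: latent Gaussian variables follow a linear structural model on a directed acyclic graph, and each one is then discretized marginally by its own thresholds. It is not the OSEM algorithm itself. The function name `simulate_ordinal_from_dag`, the edge weights, and the thresholds are all hypothetical choices made for illustration, and the linear form with unit-variance noise is an assumption.

```python
import numpy as np


def simulate_ordinal_from_dag(weights, thresholds, n_samples, seed=0):
    """Sample latent Gaussian data on a DAG, then discretize each variable.

    weights[i, j] != 0 encodes an edge i -> j. The matrix is assumed to be
    strictly upper triangular, i.e. variables are in topological order.
    thresholds[j] is a sorted sequence of cut points for variable j.
    """
    rng = np.random.default_rng(seed)
    n_vars = weights.shape[0]
    latent = np.zeros((n_samples, n_vars))
    for j in range(n_vars):
        # Each node is a linear function of its parents plus Gaussian noise.
        noise = rng.standard_normal(n_samples)
        latent[:, j] = latent[:, :j] @ weights[:j, j] + noise
    # Marginal discretization: the level is the number of cut points
    # lying below the latent value, so the ordering is preserved.
    ordinal = np.column_stack(
        [np.searchsorted(thresholds[j], latent[:, j]) for j in range(n_vars)]
    )
    return latent, ordinal


# Hypothetical three-node chain X0 -> X1 -> X2 with three levels per variable.
weights = np.array([[0.0, 0.8, 0.0],
                    [0.0, 0.0, 0.8],
                    [0.0, 0.0, 0.0]])
thresholds = [[-0.5, 0.5]] * 3
latent, ordinal = simulate_ordinal_from_dag(weights, thresholds, n_samples=500)
```

Only `ordinal` would be observed in practice; the structure-learning task is to recover the graph encoded in `weights` from those ordered categories alone.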