Finding a globally optimal Bayesian network by exhaustive search is a problem of super-exponential complexity, which severely restricts the number of variables it can handle. We implement a dynamic-programming-based algorithm with built-in dimensionality reduction and parent set identification. This drastically reduces the search space and allows the algorithm to be applied to high-dimensional data. We use what we call generational orderings based search for optimal networks, a novel way to efficiently search the space of possible networks given the possible parent sets. The algorithm supports both continuous and categorical data, and both categorical and survival outcomes. We demonstrate the efficacy of our algorithm on synthetic and real data. In simulations, our algorithm performs better than three state-of-the-art algorithms that are currently in wide use. We then apply it to an ovarian cancer gene expression dataset with 513 genes and a survival outcome. Our algorithm finds an optimal network describing a disease pathway of 6 genes leading to the outcome node in a few minutes on a basic computer. Generational orderings based search is an efficient and highly scalable approach to finding optimal Bayesian networks that can be applied to thousands of variables. Using specifiable parameters (correlation and FDR cutoffs, and in-degree), one can increase or decrease the number of nodes and the density of the networks. The availability of two scoring options (BIC and BGe), together with support for survival outcomes and mixed data types, makes our algorithm well suited to finding disease pathways in many types of high-dimensional biomedical data.
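To make the general idea concrete, the sketch below shows how restricting candidate parent sets shrinks an exact dynamic-programming search. It is not the authors' generational orderings method. Instead it uses the standard subset dynamic program for exact structure learning, scored with a Gaussian BIC. Candidate parents are chosen with a plain |correlation| threshold and an in-degree cap. The FDR step is omitted, and BGe, categorical data, and survival outcomes are not covered. The names `candidate_scores`, `learn_network`, `min_corr`, and `max_indegree` are hypothetical and chosen for illustration only.

```python
import itertools
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def gaussian_bic(data, child, parents):
    """BIC of a linear-Gaussian regression of `child` on `parents` (with intercept)."""
    n = len(data)
    X = [[1.0] + [row[p] for p in parents] for row in data]
    y = [row[child] for row in data]
    k = len(parents) + 1
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    b = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    beta = solve(A, b)  # least-squares coefficients via the normal equations
    rss = sum((y[r] - sum(beta[i] * X[r][i] for i in range(k))) ** 2 for r in range(n))
    sigma2 = max(rss / n, 1e-12)  # MLE of the residual variance
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return loglik - 0.5 * (k + 1) * math.log(n)  # k coefficients + 1 variance


def pearson(data, i, j):
    xi, xj = [row[i] for row in data], [row[j] for row in data]
    mi, mj = sum(xi) / len(xi), sum(xj) / len(xj)
    cov = sum((a - mi) * (c - mj) for a, c in zip(xi, xj))
    vi, vj = sum((a - mi) ** 2 for a in xi), sum((c - mj) ** 2 for c in xj)
    return cov / math.sqrt(vi * vj)


def candidate_scores(data, min_corr, max_indegree):
    """Hypothetical parent-set identification: |corr| filter plus in-degree cap.
    Returns {node: [(parent_bitmask, local_score), ...]}, always including the empty set."""
    p = len(data[0])
    local = {}
    for v in range(p):
        cands = [u for u in range(p) if u != v and abs(pearson(data, u, v)) >= min_corr]
        local[v] = []
        for k in range(min(max_indegree, len(cands)) + 1):
            for ps in itertools.combinations(cands, k):
                local[v].append((sum(1 << u for u in ps), gaussian_bic(data, v, ps)))
    return local


def learn_network(data, min_corr=0.1, max_indegree=2):
    """Exact search over DAGs whose parent sets come from candidate_scores."""
    p = len(data[0])
    local = candidate_scores(data, min_corr, max_indegree)
    best, sink = {0: 0.0}, {}
    # Every U minus one element is numerically smaller than U, so subsets are solved first.
    for U in range(1, 1 << p):
        options = []
        for v in range(p):
            if (U >> v) & 1:
                rest = U ^ (1 << v)
                # best allowed parent set for v drawn only from the nodes in `rest`
                m, s = max(((m, s) for m, s in local[v] if (m & ~rest) == 0), key=lambda t: t[1])
                options.append((best[rest] + s, v, m))
        best[U], v, m = max(options)
        sink[U] = (v, m)
    parents, U = {}, (1 << p) - 1
    while U:  # peel off optimal sinks to recover each node's parents
        v, m = sink[U]
        parents[v] = sorted(u for u in range(p) if (m >> u) & 1)
        U ^= 1 << v
    return best[(1 << p) - 1], parents


if __name__ == "__main__":
    rng = random.Random(0)
    data = []
    for _ in range(300):  # synthetic chain x0 -> x1 -> x2, plus an independent x3
        x0 = rng.gauss(0, 1)
        x1 = 0.9 * x0 + rng.gauss(0, 0.5)
        x2 = -0.8 * x1 + rng.gauss(0, 0.5)
        data.append([x0, x1, x2, rng.gauss(0, 1)])
    score, parents = learn_network(data)
    print(score, parents)
```

The subset table has 2^p entries, so this exact search is only practical for small p. Lowering `max_indegree` or raising `min_corr` shrinks each node's candidate list, which is the same kind of tuning lever the abstract describes for controlling network size and density.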