A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data analysis. First, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Second, a Bayesian network can be used to learn causal relationships, and hence to gain understanding of a problem domain and to predict the consequences of intervention. Third, because the model has both causal and probabilistic semantics, it is an ideal representation for combining prior knowledge (which often comes in causal form) with data. Fourth, Bayesian statistical methods in conjunction with Bayesian networks offer an efficient and principled approach to avoiding overfitting the data. In this paper, we discuss methods for constructing Bayesian networks from prior knowledge and summarize Bayesian statistical methods for using data to improve these models. With regard to the latter task, we describe methods for learning both the parameters and the structure of a Bayesian network, including techniques for learning with incomplete data. In addition, we relate Bayesian-network methods for learning to techniques for supervised and unsupervised learning. We illustrate the graphical-modeling approach using a real-world case study.
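Two of the claims above can be made concrete with a minimal sketch. The sketch is not taken from the paper. It assumes a hypothetical three-variable network in which Rain and Sprinkler are both parents of WetGrass, with made-up conditional probabilities. It computes the joint distribution through the network's factorization $P(R)\,P(S)\,P(W \mid R, S)$. It then answers a query in which Sprinkler is unobserved by summing that variable out, which covers inference with a missing entry rather than the paper's harder problem of learning from incomplete data. Finally, it contrasts a maximum-likelihood estimate with the posterior mean under a Beta prior. All names (`P_RAIN`, `P_SPRINKLER`, `P_WET_GIVEN`, `joint`, `posterior_rain`, `beta_posterior_mean`) are hypothetical.

```python
from itertools import product

# Hypothetical CPTs (illustrative numbers, not from the paper) for the
# network Rain -> WetGrass <- Sprinkler. All variables are binary (0/1).
P_RAIN = {1: 0.2, 0: 0.8}
P_SPRINKLER = {1: 0.3, 0: 0.7}
# P(WetGrass = 1 | Rain, Sprinkler), keyed by (rain, sprinkler).
P_WET_GIVEN = {(1, 1): 0.99, (1, 0): 0.9, (0, 1): 0.8, (0, 0): 0.05}


def joint(rain: int, sprinkler: int, wet: int) -> float:
    """Joint probability from the factorization P(R) P(S) P(W | R, S)."""
    p_wet = P_WET_GIVEN[(rain, sprinkler)]
    return P_RAIN[rain] * P_SPRINKLER[sprinkler] * (p_wet if wet else 1.0 - p_wet)


def posterior_rain(evidence: dict) -> float:
    """P(Rain = 1 | evidence). Any variable missing from `evidence`
    is marginalized (summed out) by enumerating all assignments."""
    num = den = 0.0
    for r, s, w in product((0, 1), repeat=3):
        assignment = {"rain": r, "sprinkler": s, "wet": w}
        # Skip assignments inconsistent with the observed values.
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        p = joint(r, s, w)
        den += p
        if r == 1:
            num += p
    return num / den


def beta_posterior_mean(successes: int, trials: int,
                        a: float = 1.0, b: float = 1.0) -> float:
    """Posterior mean of a Bernoulli parameter under a Beta(a, b) prior.
    Unlike the maximum-likelihood estimate successes / trials, it never
    collapses to 0 or 1 just because a small sample missed an outcome."""
    return (successes + a) / (trials + a + b)
```

With WetGrass observed and Sprinkler missing, `posterior_rain({"wet": 1})` is about 0.457. Additionally observing that the sprinkler was on lowers it to about 0.236, because the sprinkler already explains the wet grass. After zero successes in three trials, the maximum-likelihood estimate is 0, whereas `beta_posterior_mean(0, 3)` returns 0.2 under a uniform Beta(1, 1) prior. This is a small instance of the overfitting that Bayesian parameter estimation guards against.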