Gaussian Bayesian networks (a.k.a. linear Gaussian structural equation models) are widely used to model causal interactions among continuous variables. In this work, we study the problem of learning a fixed-structure Gaussian Bayesian network up to a bounded error in total variation distance. We analyze the commonly used node-wise least squares regression (LeastSquares) and prove that it has a near-optimal sample complexity. We also study a couple of new algorithms for the problem:

- BatchAvgLeastSquares takes the average of several batches of least squares solutions at each node, so that one can interpolate between the batch size and the number of batches. We show that BatchAvgLeastSquares also has near-optimal sample complexity.
- CauchyEst takes the median of solutions to several batches of linear systems at each node. We show that the algorithm specialized to polytrees, CauchyEstTree, has near-optimal sample complexity.

Experimentally, we show that for uncontaminated, realizable data, the LeastSquares algorithm performs best, but in the presence of contamination or DAG misspecification, CauchyEst/CauchyEstTree and BatchAvgLeastSquares respectively perform better.
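To make the three estimators concrete, here is a minimal sketch (not the authors' reference implementation) of the per-node updates, assuming the DAG structure is given as a parent list for each node and the data matrix `X` has one column per variable. Helper names, batch sizes, and the residual-based variance estimates are illustrative simplifications.

```python
import numpy as np

def least_squares_node(X, node, parents):
    """LeastSquares: ordinary least squares regression of a node on its parents."""
    if not parents:
        return np.zeros(0), X[:, node].var()
    P = X[:, parents]                      # n x |parents| design matrix
    y = X[:, node]
    coef, *_ = np.linalg.lstsq(P, y, rcond=None)
    return coef, (y - P @ coef).var()

def batch_avg_least_squares_node(X, node, parents, num_batches):
    """BatchAvgLeastSquares: average the per-batch least squares solutions."""
    if not parents:
        return np.zeros(0), X[:, node].var()
    batches = np.array_split(np.arange(X.shape[0]), num_batches)
    coefs = [least_squares_node(X[idx], node, parents)[0] for idx in batches]
    coef = np.mean(coefs, axis=0)
    return coef, (X[:, node] - X[:, parents] @ coef).var()

def cauchy_est_node(X, node, parents, rng):
    """CauchyEst idea: coordinate-wise median of solutions to batches of
    |parents|-sized linear systems (for a polytree, each system is tiny)."""
    p = len(parents)
    if p == 0:
        return np.zeros(0), X[:, node].var()
    perm = rng.permutation(X.shape[0])
    sols = []
    for start in range(0, X.shape[0] - p + 1, p):
        idx = perm[start:start + p]
        A = X[np.ix_(idx, parents)]        # p x p system from one batch
        b = X[idx, node]
        try:
            sols.append(np.linalg.solve(A, b))
        except np.linalg.LinAlgError:
            continue                       # skip singular batches
    coef = np.median(sols, axis=0)
    return coef, (X[:, node] - X[:, parents] @ coef).var()
```

Running each helper over every node (in any order, since the structure is fixed) yields coefficient and noise-variance estimates for the whole network; the robustness difference described above comes from averaging versus taking medians across batches.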