A Bayesian Network is a directed acyclic graph (DAG) on a set of $n$ random variables (identified with the vertices); a Bayesian Network Distribution (BND) is a probability distribution on these random variables that is Markovian with respect to the graph. A finite mixture of such models is the projection onto these variables of a BND on a larger graph, which has an additional "hidden" (or "latent") random variable $U$, ranging in $\{1,\ldots,k\}$, and a directed edge from $U$ to every other vertex. Models of this type are fundamental to research in Causal Inference, where $U$ models a confounding effect. One extremely special case has long been of interest in the theory literature: the empty graph. Such a distribution is simply a mixture of $k$ product distributions. A longstanding problem is, given the joint distribution of a mixture of $k$ product distributions, to identify each of the product distributions and their mixture weights. Our results are: (1) We improve the sample complexity (and runtime) for identifying mixtures of $k$ product distributions from $\exp(O(k^2))$ to $\exp(O(k \log k))$. This is nearly optimal in view of a known $\exp(\Omega(k))$ lower bound. (2) We give the first algorithm for the case of non-empty graphs. The complexity for a graph of maximum degree $\Delta$ is $\exp(O(k(\Delta^2 + \log k)))$. (The above complexities are approximate and suppress dependence on secondary parameters.)
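To make the empty-graph case concrete, the following minimal sketch computes the observable joint distribution of a mixture of $k$ product distributions by marginalizing out the hidden variable $U$. It assumes, purely for illustration, that every observed variable is binary; the function name `mixture_joint` and its parameters are hypothetical. It shows only the forward model whose parameters are to be identified, not the identification algorithm described above.

```python
import itertools

import numpy as np


def mixture_joint(weights, probs):
    """Joint distribution of a mixture of k product distributions (hypothetical helper).

    Assumes n binary observed variables (an illustrative simplification).
    weights: shape (k,), mixture weights Pr[U = j], summing to 1.
    probs:   shape (k, n), probs[j, i] = Pr[X_i = 1 | U = j].
    Returns a dict mapping each x in {0,1}^n to Pr[X = x].
    """
    weights = np.asarray(weights, dtype=float)
    probs = np.asarray(probs, dtype=float)
    _, n = probs.shape
    joint = {}
    for x in itertools.product((0, 1), repeat=n):
        xa = np.array(x)
        # Given U = j the observed variables are independent (empty graph),
        # so Pr[X = x | U = j] is a product over the n vertices.
        cond = np.prod(np.where(xa == 1, probs, 1.0 - probs), axis=1)  # shape (k,)
        # Summing over the hidden U (weighted by Pr[U = j]) gives the observable mixture.
        joint[x] = float(weights @ cond)
    return joint


if __name__ == "__main__":
    # Two equally weighted components over two binary variables.
    dist = mixture_joint([0.5, 0.5], [[0.9, 0.9], [0.1, 0.1]])
    for x, p in dist.items():
        print(x, round(p, 4))
```

In this example each component is a product distribution, yet the mixture is not: $\Pr[X_1 = 1, X_2 = 1] = 0.5 \cdot 0.81 + 0.5 \cdot 0.01 = 0.41$, whereas $\Pr[X_1 = 1]\Pr[X_2 = 1] = 0.25$. The hidden $U$ induces exactly this kind of confounding correlation, and the identification problem asks to recover `weights` and `probs` from the joint distribution alone.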