We study the optimal sample complexity of learning a Gaussian directed acyclic graph (DAG) from observational data. Our main result establishes the minimax optimal sample complexity for learning the structure of a linear Gaussian DAG model with equal variances to be $n\asymp q\log(d/q)$, where $q$ is the maximum number of parents and $d$ is the number of nodes. We further make comparisons with the classical problem of learning (undirected) Gaussian graphical models, showing that under the equal variance assumption, these two problems share the same optimal sample complexity. In other words, at least for Gaussian models with equal error variances, learning a directed graphical model is not more difficult than learning an undirected graphical model. Our results also extend to more general identification assumptions as well as subgaussian errors.
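As a rough numerical illustration of the $n\asymp q\log(d/q)$ scaling (a hypothetical sketch with constants suppressed; the function name and example values are not from the paper):

```python
import math

def minimax_sample_complexity(q: int, d: int) -> float:
    """Order of the minimax sample complexity n ≍ q·log(d/q) for
    learning an equal-variance linear Gaussian DAG with d nodes and
    maximum in-degree q (absolute constants suppressed)."""
    return q * math.log(d / q)

# Example: a sparse graph with d = 1000 nodes and at most q = 10
# parents per node needs on the order of 10·log(100) ≈ 46 samples.
print(minimax_sample_complexity(10, 1000))
```

Note how the dependence on the number of nodes $d$ is only logarithmic, so sparsity ($q \ll d$) is what keeps the sample requirement small.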