Truncated Matrix Power Iteration for Differentiable DAG Learning

Recovering underlying directed acyclic graph (DAG) structures from observational data is highly challenging because the DAG-constrained optimization problem is combinatorial. Recently, DAG learning has been cast as a continuous optimization problem by expressing the DAG constraint as a smooth equality constraint, generally based on polynomials over adjacency matrices. Existing methods place very small coefficients on the high-order polynomial terms for stabilization, arguing that large coefficients on those terms are harmful because they cause numerical blow-up. In contrast, we find that large coefficients on higher-order terms are beneficial for DAG learning when the spectral radii of the adjacency matrices are small, and that they approximate the DAG constraint much better than small coefficients do. Based on this, we propose a novel DAG learning method that uses an efficient truncated matrix power iteration to approximate geometric series-based DAG constraints. Empirically, our method outperforms the previous state-of-the-art methods in various settings, often by a factor of 3 or more in terms of structural Hamming distance.
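To make the idea of a geometric series-based constraint concrete, here is a minimal sketch. It is not the paper's implementation. It assumes the common construction B = W ∘ W (elementwise square of the weighted adjacency matrix W) and the penalty h(W) = tr(B + B^2 + ... + B^k), which is zero for k = d exactly when the graph is acyclic. The function name, `max_order`, and `tol` are hypothetical, and the plain loop below may differ from the efficient iteration the abstract refers to.

```python
import numpy as np


def truncated_geometric_penalty(W, max_order=None, tol=1e-12):
    """Return (penalty, terms_used) for tr(sum_{i=1}^{k} B^i), with B = W * W.

    Illustrative assumption: the series is cut off early once the current
    power of B is numerically negligible, which happens quickly when the
    spectral radius of B is small.
    """
    W = np.asarray(W, dtype=float)
    B = W * W                      # nonnegative surrogate adjacency matrix
    d = B.shape[0]
    # A cycle in a d-node graph has length at most d, so d terms suffice.
    k = d if max_order is None else max(1, min(max_order, d))

    power = np.eye(d)              # holds B^i, starting from B^0
    penalty = 0.0
    terms_used = 0
    for _ in range(k):
        power = power @ B          # B^(i-1) -> B^i
        penalty += np.trace(power) # weighted count of closed walks of length i
        terms_used += 1
        if np.abs(power).max() < tol:
            break                  # all later terms are negligible
    return penalty, terms_used


# A chain 0 -> 1 -> 2 is acyclic, so the penalty vanishes.
chain = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0]])

# A two-node cycle 0 <-> 1 is penalized.
cycle = np.array([[0.0, 0.5],
                  [0.5, 0.0]])

chain_penalty, _ = truncated_geometric_penalty(chain)
cycle_penalty, _ = truncated_geometric_penalty(cycle)
```

All terms carry coefficient 1 here, unlike constraints that shrink the higher-order coefficients. When the entries of B are small, the powers decay fast and the loop stops well before d terms.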
