Simulated DAG models may exhibit properties that, perhaps inadvertently, render their structure identifiable and unexpectedly affect structure learning algorithms. Here, we show that marginal variance tends to increase along the causal order for generically sampled additive noise models. We introduce varsortability as a measure of the agreement between the order of increasing marginal variance and the causal order. For commonly sampled graphs and model parameters, we show that the remarkable performance of some continuous structure learning algorithms can be explained by high varsortability and matched by a simple baseline method. Yet, this performance may not transfer to real-world data where varsortability may be moderate or dependent on the choice of measurement scales. On standardized data, the same algorithms fail to identify the ground-truth DAG or its Markov equivalence class. While standardization removes the pattern in marginal variance, we show that data generating processes that incur high varsortability also leave a distinct covariance pattern that may be exploited even after standardization. Our findings challenge the significance of generic benchmarks with independently drawn parameters. The code is available at https://github.com/Scriddie/Varsortability.
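The variance pattern described above can be illustrated with a minimal simulation. The sketch below is not the paper's exact definition of varsortability; it simulates a linear additive noise model over a chain DAG and computes a simplified score, the fraction of causally ordered variable pairs whose marginal variance increases. The choice of a chain graph and of edge weights drawn uniformly from [1, 2] are illustrative assumptions, not the benchmark settings used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear additive noise model over a chain DAG X1 -> X2 -> ... -> Xd.
# Edge weights are drawn uniformly from [1, 2] (an illustrative choice;
# common benchmarks sample weights from other generic ranges).
d, n = 5, 100_000
weights = rng.uniform(1.0, 2.0, size=d - 1)
X = np.zeros((n, d))
X[:, 0] = rng.normal(size=n)
for j in range(1, d):
    X[:, j] = weights[j - 1] * X[:, j - 1] + rng.normal(size=n)

variances = X.var(axis=0)

# Simplified varsortability stand-in: fraction of causally ordered pairs
# (i precedes j in the causal order) whose marginal variance increases.
pairs = [(i, j) for i in range(d) for j in range(i + 1, d)]
score = float(np.mean([variances[i] < variances[j] for i, j in pairs]))
print(variances)  # marginal variances grow along the causal order
print(score)
```

For such generically sampled models the score is at (or very near) 1, which is why sorting variables by marginal variance already recovers the causal order; after standardizing each column this signal disappears.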