Graphical structures estimated by causal learning algorithms from time series data can provide highly misleading causal information if the causal timescale of the generating process fails to match the measurement timescale of the data. Although this problem has recently been recognized, practitioners have limited resources to respond to it, and so must continue using models that they know are likely to be misleading. Existing methods either (a) require that the difference between the causal and measurement timescales be known; or (b) can handle only a very small number of random variables when the timescale difference is unknown; or (c) apply only to pairs of variables, though with fewer assumptions about prior knowledge; or (d) return impractically many solutions. This paper addresses all four challenges. We combine constraint programming with both theoretical insights into the problem structure and prior information about admissible causal interactions. The resulting system provides a practical approach that scales to significantly larger sets (>100) of random variables, does not require precise knowledge of the timescale difference, supports edge misidentification and parametric connection strengths, and can provide the optimum choice among many possible solutions. The cumulative impact of these improvements is a gain of multiple orders of magnitude in speed and informativeness.
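To make the timescale mismatch concrete, the following minimal sketch shows how a causal graph can change when only every u-th time step is measured. It is an illustration, not the constraint-programming method of this paper. It assumes the definition of undersampling commonly used in the prior literature: a directed edge i -> j appears at rate u when the true graph has a directed path of exactly u edges from i to j, and a bidirected edge i <-> j appears when some node reaches both i and j by paths of the same length l < u. The function name `undersample` and the three-node example graph are hypothetical.

```python
def undersample(edges, n, u):
    """Sketch of the graph observed when only every u-th step is measured.

    edges: set of (i, j) pairs, meaning i -> j one causal time step later.
    n:     number of nodes, labelled 0 .. n-1.
    u:     undersampling rate (u = 1 means no undersampling).
    Returns (directed, bidirected) edge sets under the assumed definition.
    """
    children = {i: {j for (a, j) in edges if a == i} for i in range(n)}
    # reach[k]: nodes reachable from k by a directed path of exactly l edges
    reach = {k: {k} for k in range(n)}  # l = 0
    bidirected = set()
    for _ in range(u - 1):  # l = 1 .. u-1
        reach = {k: {c for m in reach[k] for c in children[m]} for k in range(n)}
        # k is an unmeasured common cause of every pair it reaches in l steps
        for k in range(n):
            for i in reach[k]:
                for j in reach[k]:
                    if i < j:
                        bidirected.add((i, j))
    # one more step gives the paths of exactly u edges
    directed = {(k, c) for k in range(n) for m in reach[k] for c in children[m]}
    return directed, bidirected


# Hypothetical true graph at the causal timescale: 0 -> 1, 0 -> 2, 1 -> 2
true_edges = {(0, 1), (0, 2), (1, 2)}
directed, bidirected = undersample(true_edges, n=3, u=2)
print(sorted(directed), sorted(bidirected))  # [(0, 2)] [(1, 2)]
```

In this example, measuring every second step removes the true edges 0 -> 1 and 1 -> 2 and introduces a spurious confounded link between nodes 1 and 2, which is the kind of misleading structure described above.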