Causal discovery across collections of time-series data can help diagnose the causes of symptoms and, ideally, prevent faults before they occur. However, reliable causal discovery is challenging, especially when the data acquisition rate varies (i.e., non-uniform sampling) or when data points are missing (e.g., sparse sampling). To address these issues, we propose a new system comprising two parts: the first fills in missing data using Gaussian Process Regression, and the second uses an Echo State Network, a type of reservoir computer used for modeling chaotic systems, for causal discovery. We evaluate the proposed system against three off-the-shelf causal discovery algorithms, namely structural expectation-maximization, sub-sampled linear auto-regression absolute coefficients, and multivariate Granger causality with vector auto-regression, using the Tennessee Eastman chemical dataset. We report the Matthews Correlation Coefficient (MCC) and Receiver Operating Characteristic (ROC) curves for each method and show that the proposed system outperforms the existing algorithms, demonstrating the viability of our approach for discovering causal relationships in a complex system with missing entries.
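To make the first stage concrete, the following is a minimal sketch of filling missing samples with the posterior mean of a Gaussian process. It is an illustration under stated assumptions, not the system described above: the RBF kernel, the fixed `length_scale` and `noise` values, the constant-mean prior, and the helper names `rbf`, `solve`, and `gp_impute` are all hypothetical choices made for this example, and no hyperparameters are fitted.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential (RBF) kernel between two scalar time stamps."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def gp_impute(times, values, length_scale=1.0, noise=1e-6):
    """Replace None entries with the GP posterior mean at that time stamp.

    Assumed for illustration: RBF kernel, fixed hyperparameters, and a
    constant prior mean equal to the average of the observed values.
    """
    t_obs = [t for t, v in zip(times, values) if v is not None]
    y_obs = [v for v in values if v is not None]
    mean = sum(y_obs) / len(y_obs)
    # Kernel matrix of observed points, with a small noise term on the diagonal.
    K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(t_obs)]
         for i, a in enumerate(t_obs)]
    alpha = solve(K, [y - mean for y in y_obs])
    filled = []
    for t, v in zip(times, values):
        if v is None:
            k_star = [rbf(t, b, length_scale) for b in t_obs]
            filled.append(mean + sum(k * a for k, a in zip(k_star, alpha)))
        else:
            filled.append(v)  # observed samples are passed through unchanged
    return filled


# Usage: a sine wave sampled every 0.5 time units with two samples dropped.
times = [0.5 * i for i in range(13)]
truth = [math.sin(t) for t in times]
values = [None if i in (3, 8) else y for i, y in enumerate(truth)]
filled = gp_impute(times, values)
```

Because the kernel takes raw time stamps rather than sample indices, the same sketch applies to non-uniformly sampled series. In practice a library implementation that also fits the kernel hyperparameters would likely be preferable.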