Causal discovery between collections of time-series data can help diagnose the causes of symptoms and, ideally, prevent faults before they occur. However, reliable causal discovery is challenging, especially when the data acquisition rate varies (i.e., non-uniform sampling) or when data points are missing (e.g., sparse sampling). To address these issues, we propose a new system comprising two parts: the first fills in missing data using Gaussian Process Regression, and the second leverages an Echo State Network, a type of reservoir computer used for chaotic system modelling, for causal discovery. We evaluate the proposed system against three off-the-shelf causal discovery algorithms, namely structural expectation-maximization, sub-sampled linear auto-regression absolute coefficients, and multivariate Granger causality with vector auto-regression, using the Tennessee Eastman chemical dataset. We report the Matthews Correlation Coefficient (MCC) and Receiver Operating Characteristic (ROC) curves for each method and show that the proposed system outperforms the existing algorithms, demonstrating the viability of our approach for discovering causal relationships in a complex system with missing entries.
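To make the first stage concrete, the following is a minimal sketch of filling missing samples with the posterior mean of a Gaussian process. It is an illustration under stated assumptions, not the implementation evaluated here: the squared-exponential kernel, the fixed `length_scale` and `noise` values, and the helper names `rbf_kernel` and `gp_impute` are all hypothetical choices made for this example, and the sine signal stands in for a real sensor channel.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of time stamps."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_impute(t, y, length_scale=1.0, noise=1e-4):
    """Return a copy of y with NaN entries replaced by the GP posterior mean.

    Assumption: kernel hyperparameters are fixed by hand; a real pipeline
    would normally fit them to the data.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(y)
    if observed.all():
        return y.copy()

    t_obs, y_obs = t[observed], y[observed]
    mean = y_obs.mean()  # centre the data so a zero-mean prior is reasonable

    # Covariance among observed points (plus noise jitter for stability)
    K = rbf_kernel(t_obs, t_obs, length_scale) + noise * np.eye(t_obs.size)
    # Covariance between missing and observed time stamps
    K_star = rbf_kernel(t[~observed], t_obs, length_scale)

    alpha = np.linalg.solve(K, y_obs - mean)
    filled = y.copy()
    filled[~observed] = mean + K_star @ alpha
    return filled


# Toy usage: a sine wave with a few samples knocked out
t = np.linspace(0.0, 2.0 * np.pi, 40)
clean = np.sin(t)
y = clean.copy()
y[[5, 12, 13, 27]] = np.nan
filled = gp_impute(t, y, length_scale=1.0)
```

Because the kernel takes arbitrary time stamps, the same sketch applies to non-uniformly sampled series: the completed series can be evaluated on whatever regular grid the second stage expects.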