The Granger framework is widely used for discovering causal relationships from time-varying signals. Implementations of Granger causality (GC) are mostly developed for densely sampled time-series data. A substantially different setting, particularly common in population health applications, is the longitudinal study design, where multiple individuals are followed and each is observed only a limited number of times. Longitudinal studies commonly track many variables, which are likely governed by nonlinear dynamics that might have individual-specific idiosyncrasies and exhibit both direct and indirect causes. Furthermore, real-world longitudinal data often suffer from widespread missingness. Existing GC methods are not well suited to handle these issues. In this paper, we intend to fill this methodological gap. We propose to marry the GC framework with a machine-learning-based prediction model. We call our approach GLACIAL, which stands for "Granger and LeArning-based CausalIty Analysis for Longitudinal studies." GLACIAL treats individuals as independent samples and uses average prediction accuracy on hold-out individuals to test for the effects of causal relationships. GLACIAL employs a multi-task neural network trained with input feature dropout to efficiently learn nonlinear dynamic relationships between a large number of variables, handle missing values, and probe causal links. Extensive experiments on synthetic and real data demonstrate the utility of GLACIAL and how it can outperform competitive baselines.
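The core idea of testing a causal link by comparing prediction accuracy on hold-out individuals can be illustrated with a toy sketch. This is not the GLACIAL implementation: it assumes two fully observed variables, replaces the multi-task neural network and input feature dropout with two separately fitted linear least-squares models, and uses made-up simulation coefficients. The names `simulate`, `heldout_mse`, and `granger_gain` are hypothetical.

```python
import numpy as np


def simulate(n_individuals=200, n_visits=4, seed=0):
    """Toy longitudinal data (assumed dynamics): x drives y, y does not drive x."""
    rng = np.random.default_rng(seed)
    x = np.zeros((n_individuals, n_visits))
    y = np.zeros((n_individuals, n_visits))
    x[:, 0] = rng.normal(size=n_individuals)
    y[:, 0] = rng.normal(size=n_individuals)
    for t in range(1, n_visits):
        x[:, t] = 0.8 * x[:, t - 1] + 0.3 * rng.normal(size=n_individuals)
        y[:, t] = (0.5 * y[:, t - 1] + 0.9 * x[:, t - 1]
                   + 0.3 * rng.normal(size=n_individuals))
    return x, y


def heldout_mse(features, target, train_idx, test_idx):
    """Fit a linear model on training individuals, score it on hold-out ones.

    features: (n_individuals, n_pairs, n_features); target: (n_individuals, n_pairs)
    """
    def design(idx):
        f = features[idx].reshape(-1, features.shape[-1])
        return np.column_stack([f, np.ones(len(f))])  # add an intercept column

    coef, *_ = np.linalg.lstsq(design(train_idx), target[train_idx].ravel(),
                               rcond=None)
    pred = design(test_idx) @ coef
    return float(np.mean((pred - target[test_idx].ravel()) ** 2))


def granger_gain(cause, effect, n_train):
    """Increase in hold-out MSE when the candidate cause's past is withheld."""
    idx = np.arange(cause.shape[0])
    train_idx, test_idx = idx[:n_train], idx[n_train:]  # split by individual
    target = effect[:, 1:]                               # next-visit value
    full = np.stack([effect[:, :-1], cause[:, :-1]], axis=-1)
    restricted = effect[:, :-1, None]                    # effect's own past only
    return (heldout_mse(restricted, target, train_idx, test_idx)
            - heldout_mse(full, target, train_idx, test_idx))


x, y = simulate()
gain_x_to_y = granger_gain(x, y, n_train=150)
gain_y_to_x = granger_gain(y, x, n_train=150)
print(f"x -> y gain: {gain_x_to_y:.3f}, y -> x gain: {gain_y_to_x:.3f}")
```

In this sketch, a clearly positive gain for x -> y and a gain near zero for y -> x are consistent with the simulated direction of influence. A real analysis would additionally need a significance test across hold-out individuals, handling of missing values, and a way to separate direct from indirect causes, all of which the sketch omits.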