In many scientific disciplines, coarse-grained causal models are used to explain and predict the dynamics of more fine-grained systems. Naturally, such models require appropriate macrovariables. Automated procedures to detect suitable variables would be useful to leverage increasingly available high-dimensional observational datasets. This work introduces a novel algorithmic approach that is inspired by a new characterisation of causal macrovariables as information bottlenecks between microstates. Its general form can be adapted to the needs of different scientific goals. After a further transformation step, the causal relationships between learned variables can be investigated through additive noise models. Experiments on both simulated data and a real climate dataset are reported. On a synthetic dataset, the algorithm robustly detects the ground-truth variables and correctly infers the causal relationships between them. On a real climate dataset, the algorithm robustly detects two variables that correspond to the two known variations of the El Niño phenomenon.
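The abstract does not say how the additive noise model analysis is carried out. The following is a minimal sketch of the general idea only, not the authors' implementation. It assumes two scalar variables, a polynomial regressor, and a biased HSIC estimate with a Gaussian kernel as the dependence score. The function names (`gaussian_gram`, `hsic`, `residual_dependence`, `anm_direction`) and the synthetic data are hypothetical and chosen for illustration. The direction whose regression residuals look more independent of the input is taken as the more plausible causal direction.

```python
import numpy as np


def gaussian_gram(v):
    """Gaussian kernel Gram matrix with a median-heuristic bandwidth."""
    d2 = (v[:, None] - v[None, :]) ** 2
    bandwidth = np.median(d2[d2 > 0])
    return np.exp(-d2 / bandwidth)


def hsic(a, b):
    """Biased empirical HSIC; values near zero suggest independence."""
    n = len(a)
    centering = np.eye(n) - np.ones((n, n)) / n
    return np.trace(gaussian_gram(a) @ centering @ gaussian_gram(b) @ centering) / n**2


def residual_dependence(cause, effect, degree=3):
    """Fit effect = f(cause) + noise with a polynomial f (an illustrative
    choice), then score how strongly the residuals depend on the cause."""
    coeffs = np.polyfit(cause, effect, degree)
    residuals = effect - np.polyval(coeffs, cause)
    return hsic(cause, residuals)


def anm_direction(x, y, degree=3):
    """Return the direction whose residuals look more independent of the input,
    together with the dependence scores for both directions."""
    x = (x - x.mean()) / x.std()  # standardise so both directions are comparable
    y = (y - y.mean()) / y.std()
    forward = residual_dependence(x, y, degree)   # model y = f(x) + noise
    backward = residual_dependence(y, x, degree)  # model x = g(y) + noise
    return ("x->y" if forward < backward else "y->x"), forward, backward


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-2.0, 2.0, size=400)
    y = x**3 + rng.uniform(-1.0, 1.0, size=400)  # synthetic data where x causes y
    direction, forward, backward = anm_direction(x, y)
    print(direction, forward, backward)
```

A practical analysis would typically replace the raw score comparison with a proper independence test and a more flexible regressor; the polynomial fit is used here only to keep the sketch dependency-free beyond NumPy.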