Real-world data, for example in climate applications, often consists of spatially gridded time series data or data with comparable structure. While the underlying system is often believed to behave similarly at different points in space and time, those variations that do exist are relevant in two ways: they often encode important information in and of themselves, and they may negatively affect the stability/convergence and the reliability/validity of results of algorithms that assume stationarity or space-translation invariance. We study the information encoded in changes of the causal graph, with stability in mind. An analysis of this general task identifies two core challenges. We develop guiding principles to overcome these challenges, and provide a framework realizing these principles by modifying constraint-based causal discovery approaches on the level of independence testing. This leads to an extremely modular, easily extensible and widely applicable framework. It can leverage existing constraint-based causal discovery methods (demonstrated on the IID algorithms PC, PC-stable and FCI and the time series algorithms PCMCI, PCMCI+ and LPCMCI) with little to no modification. The built-in modularity makes it possible to systematically understand and improve upon an entire array of subproblems. By design, the framework can be extended by leveraging insights from change-point detection, clustering, independence testing and other well-studied related problems. The division into more accessible subproblems also simplifies the understanding of fundamental limitations, of hyperparameters controlling trade-offs, and of the statistical interpretation of results. An open-source implementation will be available soon.
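To make the idea of working "on the level of independence testing" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Fisher-z partial-correlation test, fixed contiguous blocks as a stand-in for change-point detection or clustering, and a naive per-block threshold with no multiple-testing correction. All names (`partial_corr_pvalue`, `blockwise_ci_test`, `make_demo_data`) are hypothetical. A constraint-based algorithm such as PC could, in principle, query such a wrapper in place of its usual test.

```python
import math
import numpy as np


def partial_corr_pvalue(x, y, z=None):
    """Two-sided p-value of a Fisher-z test for zero partial correlation of x and y given z."""
    n = len(x)
    if z is None or z.shape[1] == 0:
        rx, ry, k = x - x.mean(), y - y.mean(), 0
    else:
        # Regress out the conditioning set (with intercept) and correlate the residuals.
        design = np.column_stack([np.ones(n), z])
        rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
        ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
        k = z.shape[1]
    r = float(np.dot(rx, ry) / math.sqrt(np.dot(rx, rx) * np.dot(ry, ry)))
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    stat = math.sqrt(n - k - 3) * abs(math.atanh(r))
    return math.erfc(stat / math.sqrt(2))  # equals 2 * (1 - Phi(stat))


def blockwise_ci_test(x, y, z=None, n_blocks=4, alpha=0.01):
    """Run the test on each contiguous block and summarize the outcome.

    Returns "dependent" if independence is rejected in every block,
    "independent" if it is rejected in none, and "changing" otherwise.
    Assumption: blocks are fixed and equally sized; no multiple-testing correction.
    """
    rejections = []
    for idx in np.array_split(np.arange(len(x)), n_blocks):
        z_block = None if z is None else z[idx]
        rejections.append(partial_corr_pvalue(x[idx], y[idx], z_block) < alpha)
    if all(rejections):
        return "dependent"
    if not any(rejections):
        return "independent"
    return "changing"


def make_demo_data(seed=0, n=2000):
    """Synthetic series in which x drives y only during the first half."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    y = rng.standard_normal(n)
    y[: n // 2] += x[: n // 2]
    return x, y


x, y = make_demo_data()
print(blockwise_ci_test(x, y, n_blocks=4))
```

The three-valued answer is the point of the sketch: a link that exists in only some blocks is reported as such rather than being averaged into a single accept/reject decision. How blocks are chosen and how per-block results are aggregated are exactly the kind of subproblems the abstract describes as separable.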