Data-driven prediction is becoming increasingly widespread as the volume of available data grows and as algorithmic development keeps pace with this growth. The nature of the predictions made, and the manner in which they should be interpreted, depend crucially on the extent to which the variables chosen for prediction are Markovian, or approximately Markovian. Multiscale systems provide a framework in which this issue can be analyzed. In this work, kernel analog forecasting methods are studied from the perspective of data generated by multiscale dynamical systems. The problems chosen exhibit a variety of different Markovian closures, obtained using both averaging and homogenization; furthermore, settings in which scale separation is absent and the predicted variables are non-Markovian are also considered. The studies provide guidance for the interpretation of data-driven prediction methods when used in practice.