The development of data-informed predictive models for dynamical systems is of widespread interest in many disciplines. We present a unifying framework for blending mechanistic and machine-learning approaches to identify dynamical systems from noisy and partially observed data. We compare purely data-driven learning with hybrid models that incorporate imperfect domain knowledge. Our formulation is agnostic to the chosen machine-learning model, is presented in both continuous- and discrete-time settings, and is compatible both with model errors that exhibit substantial memory and with errors that are memoryless. First, we study memoryless model error that is linear in its parametric dependence from a learning-theory perspective, defining excess risk and generalization error. For ergodic continuous-time systems, we prove that both excess risk and generalization error are bounded above by terms that diminish with the square root of T, the length of the time interval over which training data are collected. Second, we study scenarios that benefit from modeling with memory, proving universal approximation theorems for two classes of continuous-time recurrent neural networks (RNNs): both can learn memory-dependent model error. In addition, we connect one class of RNNs to reservoir computing, thereby relating learning of memory-dependent error to recent work on supervised learning between Banach spaces using random features. Numerical results are presented (Lorenz '63 and Lorenz '96 multiscale systems) to compare purely data-driven and hybrid approaches, finding that hybrid methods require less data and are more parametrically efficient. Finally, we demonstrate numerically how data assimilation can be leveraged to learn hidden dynamics from noisy, partially observed data, and illustrate challenges that arise in representing memory with this approach and in training such models.
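The hybrid setup described above, in which a learned model-error term corrects an imperfect mechanistic model, can be illustrated with a minimal sketch on Lorenz '63. Here the mechanistic model is deliberately misspecified (a wrong value of rho, an illustrative choice), and a memoryless, linear-in-parameters error model is fit by least squares to residuals estimated from trajectory data. The polynomial feature dictionary and all function names are hypothetical simplifications, not the paper's exact construction (which uses, e.g., random features):

```python
import numpy as np

# True Lorenz '63 vector field.
def f_true(u, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = u
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

# Imperfect mechanistic model: same structure, wrong rho (illustrative).
def f_model(u):
    return f_true(u, rho=24.0)

def rk4_step(f, u, dt):
    k1 = f(u)
    k2 = f(u + 0.5 * dt * k1)
    k3 = f(u + 0.5 * dt * k2)
    k4 = f(u + dt * k3)
    return u + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# Training trajectory from the true system.
dt, n = 0.005, 4000
U = np.empty((n, 3))
U[0] = [1.0, 1.0, 1.0]
for i in range(n - 1):
    U[i + 1] = rk4_step(f_true, U[i], dt)

# Estimate du/dt by central differences, then form residual targets
# m(u) ~ du/dt - f_model(u): the model error to be learned from data.
dU = (U[2:] - U[:-2]) / (2.0 * dt)
X = U[1:-1]
R = dU - np.apply_along_axis(f_model, 1, X)

# Memoryless error model, linear in its parameters theta, over a
# small polynomial dictionary (an illustrative basis choice).
def features(u):
    x, y, z = u
    return np.array([1.0, x, y, z, x * y, x * z, y * z])

Phi = np.apply_along_axis(features, 1, X)          # (n - 2, 7)
theta, *_ = np.linalg.lstsq(Phi, R, rcond=None)    # least-squares fit

def f_hybrid(u):
    return f_model(u) + features(u) @ theta

# Compare vector-field error of the imperfect vs. hybrid model
# at a subsample of trajectory points.
test = U[::50]
err_model = np.mean([np.linalg.norm(f_true(u) - f_model(u)) for u in test])
err_hybrid = np.mean([np.linalg.norm(f_true(u) - f_hybrid(u)) for u in test])
print(f"imperfect model error: {err_model:.2f}, hybrid: {err_hybrid:.4f}")
```

Because the misspecification here is itself linear in the state, the learned correction nearly closes the gap; in the noisy, partially observed setting the abstract describes, the residual targets would instead come from data assimilation rather than direct differentiation of a fully observed trajectory.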