Statistical modeling and data-driven learning are two important fields that attract considerable attention. Statistical models aim to capture and interpret the relationships among variables, while data-driven learning attempts to extract information directly from the data without first processing it through complex models. Given the extensive work in both fields, a subtle issue is how to properly integrate data-driven methods with existing knowledge or models. In this paper, focusing on time series data, we propose two different directions for integrating the two: a decomposition-based method and a method that exploits statistical extraction of data features. The first decomposes the data into linear stable, nonlinear stable, and unstable parts; suitable statistical models are applied to the linear stable and nonlinear stable parts, while appropriate machine learning tools are applied to the unstable part. The second applies statistical models to extract statistical features of the data and feeds them as additional inputs into the machine learning platform for training. The most critical and challenging question is how to identify and extract valuable information from mathematical or statistical models to boost the performance of machine learning algorithms. We evaluate the proposed methods on time series data with varying degrees of stability. Performance results show that both methods can outperform existing schemes that use models and learning separately, with improvements that can exceed 60%. Both proposed methods are promising for bridging the gap between model-based and data-driven schemes and for integrating the two to provide higher overall learning performance.
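As a rough illustration of the second direction (statistical features fed to a learner as additional inputs), the following minimal Python sketch builds an augmented input matrix for one-step-ahead forecasting. The abstract does not say which statistics are extracted, so the choices here (window mean, population standard deviation, and last-minus-first change) are assumptions, and the function name `augment_with_statistics` is hypothetical rather than taken from the paper.

```python
from statistics import fmean, pstdev


def augment_with_statistics(series, window):
    """Build (inputs, targets) for one-step-ahead forecasting.

    Each input row holds the raw window of past values followed by
    three summary statistics of that window. The statistics are an
    assumed stand-in for the "statistical features" in the abstract.
    """
    if window < 2 or window >= len(series):
        raise ValueError("window must satisfy 2 <= window < len(series)")

    inputs, targets = [], []
    for end in range(window, len(series)):
        past = list(series[end - window:end])
        stats = [
            fmean(past),          # level of the window
            pstdev(past),         # spread of the window
            past[-1] - past[0],   # crude trend indicator
        ]
        inputs.append(past + stats)   # raw values + extra features
        targets.append(series[end])   # next value to predict
    return inputs, targets


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X, y = augment_with_statistics(series, window=3)
```

The resulting rows in `X` could then be passed to any regression learner in place of the raw windows alone; which learner and which statistics work best is exactly the question the paper investigates.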