Dynamic Temporal Reconciliation by Reinforcement Learning

Planning based on long- and short-term time series forecasts is common practice across many industries. In this context, temporal aggregation and reconciliation techniques have proven useful for improving forecasts, reducing model uncertainty, and providing coherent forecasts across time horizons. However, all of these techniques share an underlying assumption: that data are completely available at every level of the temporal hierarchy. This assumption offers mathematical convenience, but in practice the low-frequency data are usually only partially complete and therefore unavailable at forecasting time. High-frequency data, on the other hand, can change significantly in a scenario such as the COVID pandemic, and this change can be used to improve forecasts that would otherwise diverge significantly from the long-term actuals. We propose a dynamic reconciliation method in which the problem of informing low-frequency forecasts with high-frequency actuals is formulated as a Markov Decision Process (MDP), allowing for the fact that we do not have complete information about the dynamics of the process. This yields the best long-term estimates based on the most recent data available, even if the low-frequency cycles have only been partially completed. The MDP is solved using a Time Differenced Reinforcement Learning (TDRL) approach with customizable actions, and it improves the long-term forecasts dramatically compared with relying solely on historical low-frequency data. The result also underscores that, while low-frequency forecasts can improve high-frequency forecasts, as noted in the temporal reconciliation literature (on the assumption that low-frequency forecasts have a lower noise-to-signal ratio), high-frequency forecasts can also be used to inform low-frequency forecasts.
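
To make the MDP formulation concrete, the following toy sketch shows one way such a problem could be set up. It is not the authors' implementation: the state (months elapsed, a bucketed year-to-date deviation, the current adjustment level), the three actions (lower, keep, or raise a multiplier on the annual forecast), the reward (negative relative error of the adjusted annual forecast, available only when training on completed historical years), the simulated data, and all names and constants are assumptions made for illustration. Tabular Q-learning, a standard temporal-difference method, stands in for the TDRL approach named in the abstract.

```python
import random
from collections import defaultdict

LEVELS = [0.8, 0.9, 1.0, 1.1, 1.2]  # hypothetical multipliers on the base annual forecast
MOVES = [-1, 0, 1]                  # actions: lower, keep, or raise the multiplier one step
BASE_MONTHLY = 100.0                # assumed historical monthly forecast
MONTHS = 12
BASE_ANNUAL = BASE_MONTHLY * MONTHS


def deviation_bucket(observed, months_elapsed):
    """Discretise year-to-date actuals relative to the base forecast."""
    ratio = sum(observed) / (BASE_MONTHLY * months_elapsed)
    return sum(ratio > edge for edge in (0.85, 0.95, 1.05, 1.15))


def simulate_year(rng, shift=None):
    """Toy data: one year of monthly actuals with an unobserved level shift."""
    if shift is None:
        shift = rng.choice(LEVELS)
    return [BASE_MONTHLY * shift * (1 + rng.gauss(0, 0.02)) for _ in range(MONTHS)]


def run_episode(q, rng, epsilon, alpha=0.1, gamma=0.9, shift=None, learn=True):
    """Step through one partially observed year; return the final relative error."""
    actuals = simulate_year(rng, shift)
    annual = sum(actuals)  # known only in hindsight, so usable for training rewards
    level = LEVELS.index(1.0)
    error = abs(BASE_ANNUAL - annual) / BASE_ANNUAL
    for t in range(1, MONTHS):
        state = (t, deviation_bucket(actuals[:t], t), level)
        if rng.random() < epsilon:
            action = rng.randrange(len(MOVES))  # explore
        else:
            action = max(range(len(MOVES)), key=lambda a: q[(state, a)])  # exploit
        level = min(max(level + MOVES[action], 0), len(LEVELS) - 1)
        error = abs(LEVELS[level] * BASE_ANNUAL - annual) / BASE_ANNUAL
        reward = -error
        if learn:
            if t + 1 < MONTHS:
                nxt = (t + 1, deviation_bucket(actuals[:t + 1], t + 1), level)
                future = max(q[(nxt, a)] for a in range(len(MOVES)))
            else:
                future = 0.0  # terminal step: no bootstrapped value
            # Q-learning (temporal-difference) update
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
    return error


def train(episodes=10000, seed=0):
    """Learn a Q-table from simulated years with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        run_episode(q, rng, epsilon=0.1)
    return q


def mean_error(q, shift, trials=50, seed=1):
    """Average final relative error of the greedy policy for a given level shift."""
    rng = random.Random(seed)
    total = sum(run_episode(q, rng, epsilon=0.0, shift=shift, learn=False)
                for _ in range(trials))
    return total / trials
```

In this toy setting, a year whose actuals run 20% below the historical level leaves the unadjusted annual forecast with a relative error of about 0.2; calling `mean_error(train(), 0.8)` gives the corresponding error after the learned policy has adjusted the forecast month by month using only the high-frequency actuals observed so far.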
