We study the problem of out-of-distribution dynamics (OODD) detection, which involves detecting when the dynamics of a temporal process change relative to the training-distribution dynamics. This is relevant to applications in control, reinforcement learning (RL), and multivariate time series, where changes to test-time dynamics can impact the performance of learned controllers/predictors in unknown ways. This problem is particularly important in the context of deep RL, where learned controllers often overfit to the training environment. Currently, however, there is a lack of established OODD benchmarks for the types of environments commonly used in RL research. Our first contribution is to design a set of OODD benchmarks derived from common RL environments with varying types and intensities of OODD. Our second contribution is to design a strong OODD baseline approach based on recurrent implicit quantile networks (RIQN), which monitors autoregressive prediction errors for OODD detection. In addition to RIQN, we introduce and test three simpler baselines. Our final contribution is to evaluate our baseline approaches on the benchmarks to provide results for future comparison.
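To make the error-monitoring idea concrete, the sketch below flags a dynamics change when autoregressive one-step prediction errors exceed a threshold calibrated on in-distribution trajectories. This is a minimal illustration under assumed toy linear dynamics, not the RIQN detector described above; the predictor, error metric, quantile threshold, and all function names are illustrative assumptions.

```python
# Minimal sketch (not the paper's RIQN implementation): detect OODD by
# thresholding autoregressive one-step prediction errors. The predictor,
# error metric, and quantile-based threshold are illustrative assumptions.
import numpy as np

def calibrate_threshold(predictor, train_trajectories, quantile=0.99):
    """Collect one-step prediction errors on in-distribution trajectories
    and return a high quantile of them as the detection threshold."""
    errors = []
    for traj in train_trajectories:
        for t in range(len(traj) - 1):
            pred = predictor(traj[: t + 1])       # autoregressive prediction of s_{t+1}
            errors.append(np.linalg.norm(pred - traj[t + 1]))
    return np.quantile(errors, quantile)

def detect_oodd(predictor, trajectory, threshold):
    """Return the first time step whose prediction error exceeds the
    calibrated threshold, or None if the trajectory looks in-distribution."""
    for t in range(len(trajectory) - 1):
        pred = predictor(trajectory[: t + 1])
        if np.linalg.norm(pred - trajectory[t + 1]) > threshold:
            return t + 1
    return None

# Toy example: training dynamics s_{t+1} = 0.9 * s_t + noise; at test time
# the dynamics shift to s_{t+1} = 0.7 * s_t + noise (purely illustrative).
rng = np.random.default_rng(0)

def rollout(a=0.9, noise=0.01, horizon=50):
    states = [1.0]
    for _ in range(horizon - 1):
        states.append(a * states[-1] + rng.normal(0.0, noise))
    return np.array(states)

predictor = lambda history: 0.9 * history[-1]     # assumed known training dynamics
train = [rollout() for _ in range(20)]
threshold = calibrate_threshold(predictor, train)
shifted = rollout(a=0.7)                          # dynamics change at test time
print(detect_oodd(predictor, shifted, threshold)) # reports the first flagged step
```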