We study the problem of out-of-distribution dynamics (OODD) detection: detecting when the dynamics of a temporal process change relative to the training-distribution dynamics. This problem arises in applications such as control, reinforcement learning (RL), and multivariate time series, where changes to test-time dynamics can affect the performance of learned controllers/predictors in unknown ways. It is particularly important in deep RL, where learned controllers often overfit to the training environment. Currently, however, there is a lack of established OODD benchmarks for the types of environments commonly used in RL research. Our first contribution is to design a set of OODD benchmarks derived from common RL environments with varying types and intensities of OODD. Our second contribution is to design a strong OODD baseline approach based on recurrent implicit quantile networks (RIQNs), which monitors autoregressive prediction errors to detect OODD. Our final contribution is to evaluate the RIQN approach on the benchmarks, providing baseline results for future comparison.
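To make the error-monitoring idea concrete, the sketch below illustrates one plausible detection loop under our own assumptions, not the paper's exact RIQN implementation: a trained one-step autoregressive predictor produces per-step prediction errors on a trajectory, a threshold is calibrated from error quantiles on in-distribution data, and an alarm is raised when a test-time error exceeds it. The names `model.predict`, `calibrate_threshold`, and `detect_oodd` are hypothetical.

```python
import numpy as np

def prediction_errors(model, trajectory):
    """Per-step errors of a one-step autoregressive predictor.

    `model.predict(state)` is a hypothetical interface returning the
    predicted next state; `trajectory` is a (T, d) array of observations.
    """
    preds = np.stack([model.predict(s) for s in trajectory[:-1]])
    return np.linalg.norm(preds - trajectory[1:], axis=1)

def calibrate_threshold(model, train_trajectories, quantile=0.99):
    """Set the alarm threshold from error quantiles on in-distribution data."""
    errs = np.concatenate(
        [prediction_errors(model, t) for t in train_trajectories]
    )
    return np.quantile(errs, quantile)

def detect_oodd(model, trajectory, threshold):
    """Return the first time step whose error exceeds the threshold, or None."""
    errs = prediction_errors(model, trajectory)
    alarms = np.nonzero(errs > threshold)[0]
    return int(alarms[0]) + 1 if alarms.size else None
```

In the RIQN setting the predictor additionally outputs predictive quantiles, so the raw Euclidean residual above could be replaced by a quantile-aware error statistic; the calibrate-then-threshold logic stays the same.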