Reinforcement Learning (RL) based solutions are being adopted across a variety of domains, including robotics, health care, and industrial automation. Most attention is paid to the cases where these solutions work well, yet, like most machine learning models, RL policies fail when presented with out-of-distribution (OOD) inputs. OOD detection for RL is generally not well covered in the literature, and there is a lack of benchmarks for this task. In this work we propose a benchmark to evaluate OOD detection methods in a Reinforcement Learning setting, by modifying the physical parameters of non-visual standard environments or corrupting the state observations of visual environments. We discuss ways to generate custom RL environments that can produce OOD data, and we evaluate three uncertainty-based methods on the OOD detection task. Our results show that ensemble methods achieve the best OOD detection performance, with a lower standard deviation across multiple environments.
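As an illustration of the two perturbation modes mentioned above, the following is a minimal sketch of how OOD conditions could be produced in a Gym-style API: perturbing physical parameters for non-visual environments and corrupting observations for visual ones. The choice of CartPole-v1, the attribute names, and the noise scale are illustrative assumptions, not the benchmark's actual configuration.

```python
import gym
import numpy as np

# Sketch: two ways to put an RL policy in out-of-distribution conditions.
# Assumes Gym-style environments; the specific values are illustrative.

# 1) Non-visual environments: perturb the physical parameters of the dynamics.
env = gym.make("CartPole-v1")
env.unwrapped.length *= 2.0   # double the pole length
env.unwrapped.gravity *= 1.5  # increase gravity

# 2) Visual environments: corrupt the state observation itself.
class GaussianNoiseObservation(gym.ObservationWrapper):
    """Adds pixel noise so observations fall outside the training distribution."""
    def __init__(self, env, std=0.1):
        super().__init__(env)
        self.std = std

    def observation(self, obs):
        noisy = obs.astype(np.float32) + np.random.normal(0.0, self.std * 255, obs.shape)
        return np.clip(noisy, 0, 255).astype(obs.dtype)
```

A policy trained on the unmodified environment can then be rolled out under these shifted conditions, and an OOD detector is scored on how well it separates the perturbed rollouts from in-distribution ones.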