Out-of-distribution (OOD) detection is a well-studied topic in supervised learning. Extending these successes to the reinforcement learning (RL) setting, however, is difficult because of the data-generating process: RL agents actively query their environment for data, and the data they collect are a function of the policy they follow. An agent can therefore miss a shift in the environment if its policy never leads it to explore the aspect of the environment that shifted. To achieve safe and robust generalization in RL, there is thus an unmet need for OOD detection through active experimentation. Here, we attempt to fill this gap by first defining a causal framework for the OOD scenarios, or environments, that RL agents encounter in the wild. We then propose a novel task: Out-of-Task Distribution (OOTD) detection. We introduce an RL agent that actively experiments in a test environment and subsequently concludes whether that environment is OOTD or not. We name our method GalilAI, in honor of Galileo Galilei, as it discovers, among other causal processes, that gravitational acceleration is independent of the mass of a body. Finally, we propose a simple probabilistic neural network baseline for comparison, which extends existing Model-Based RL methods. We find that GalilAI significantly outperforms this baseline. Visualizations of our method are available at https://galil-ai.github.io/
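To make the probabilistic-baseline idea concrete, below is a minimal sketch, not the paper's implementation, of how a Model-Based RL style baseline could flag a test environment as OOTD: a Gaussian dynamics model is trained on in-distribution transitions, and the test environment is flagged when the model's predictive log-likelihood of observed transitions drops below a calibrated threshold. All class and function names (ProbabilisticDynamics, ootd_score) and the threshold are hypothetical illustrations.

```python
# A minimal sketch (assumed, not the paper's method) of a probabilistic
# dynamics-model baseline for OOTD detection.
import torch
import torch.nn as nn

class ProbabilisticDynamics(nn.Module):
    """Predicts a diagonal Gaussian over the next state given (state, action)."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean = nn.Linear(hidden, state_dim)
        self.log_std = nn.Linear(hidden, state_dim)

    def forward(self, state, action):
        h = self.net(torch.cat([state, action], dim=-1))
        # Clamp log-std for numerical stability.
        return self.mean(h), self.log_std(h).clamp(-5.0, 2.0)

def ootd_score(model, states, actions, next_states):
    """Average negative log-likelihood of observed transitions under the
    trained dynamics model; higher scores suggest the task is OOTD."""
    mean, log_std = model(states, actions)
    dist = torch.distributions.Normal(mean, log_std.exp())
    return -dist.log_prob(next_states).sum(dim=-1).mean()

# Hypothetical usage: declare the test environment OOTD when the score
# exceeds a threshold calibrated on held-out in-distribution rollouts.
# is_ootd = ootd_score(model, s, a, s_next) > threshold
```

In contrast to this passive check, GalilAI chooses which experiments to run, so the transitions it scores are the ones most informative about a potential task shift.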