Reinforcement learning agents perform well when their inputs fall within the distribution encountered during training. However, they cannot respond effectively to novel, out-of-distribution events until they have undergone additional training. This paper presents an online, data-driven emergency-response method that aims to give autonomous agents the ability to react to unexpected situations that differ substantially from those they have been trained or designed to address. In such situations, learned policies cannot be expected to perform appropriately, because the observations fall outside the distribution of inputs that the agent has been optimized to handle. The proposed approach devises a customized response to the unforeseen situation sequentially, by selecting actions that minimize the rate of increase of the reconstruction error from a variational auto-encoder. This optimization is performed online in a data-efficient manner (on the order of 30 data points) using a modified Bayesian optimization procedure. We demonstrate the potential of this approach in a simulated 3D car-driving scenario, in which the agent devises a response in under 2 seconds to avoid collisions with objects it has not seen during training.
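The action-selection idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `reconstruction_error` is a hypothetical stand-in for the error of a trained variational auto-encoder, `step` is a toy one-dimensional dynamics model, and a fixed grid of 30 candidate actions replaces the modified Bayesian optimization procedure, which is not specified here. It is not the paper's implementation; it only shows the loop of repeatedly picking the action whose resulting observation increases the reconstruction error the least.

```python
def reconstruction_error(obs):
    """Hypothetical stand-in for a VAE's reconstruction error:
    squared distance from the 'familiar' region around 0."""
    return obs * obs


def step(obs, action):
    """Toy 1-D dynamics (an assumption): a constant drift of 0.5 pushes
    the state away from the training distribution; the action counteracts it."""
    return obs + 0.5 + action


def error_rate(obs, action):
    """Change in reconstruction error caused by taking `action` from `obs`."""
    return reconstruction_error(step(obs, action)) - reconstruction_error(obs)


def choose_action(obs, budget=30, low=-1.0, high=1.0):
    """Pick the candidate with the smallest increase in reconstruction error.

    A plain grid search over `budget` evenly spaced actions is used here
    as a simple substitute for Bayesian optimization.
    """
    spacing = (high - low) / (budget - 1)
    candidates = [low + i * spacing for i in range(budget)]
    return min(candidates, key=lambda a: error_rate(obs, a))


def respond(obs, steps=5):
    """Build the response sequentially and record the error after each step."""
    trace = [reconstruction_error(obs)]
    for _ in range(steps):
        obs = step(obs, choose_action(obs))
        trace.append(reconstruction_error(obs))
    return trace


if __name__ == "__main__":
    # Start well outside the familiar region and watch the error shrink.
    print(respond(2.0))
```

In this toy setting, each query costs one model evaluation, so the budget of 30 mirrors the "order of 30 data points" mentioned above. A sample-efficient optimizer would matter far more when every evaluation requires interacting with a simulator or the real system.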