Planning agents are ill-equipped to act in novel situations in which their domain model no longer accurately represents the world. We introduce an approach for such agents operating in open worlds that detects the presence of novelties and effectively adapts their domain models and, consequently, their action selection. It uses observations of action execution, measuring their divergence from what the environment model predicts, to infer the existence of a novelty. It then revises the model through a heuristics-guided search over model changes. We report empirical evaluations on the CartPole problem, a standard Reinforcement Learning (RL) benchmark. The results show that our approach can handle a class of novelties quickly and in an interpretable fashion.
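To make the detection idea concrete, the sketch below illustrates divergence-based novelty detection on CartPole: the agent compares each observed transition against what its internal transition model predicts and flags a novelty when the difference exceeds a threshold. This is a minimal illustration, not the paper's implementation; it assumes the standard Gymnasium API, and the `model` callable and `threshold` value are placeholders for the agent's actual domain model and detection criterion.

```python
# Minimal sketch of divergence-based novelty detection (illustrative only).
# Assumes the Gymnasium API; `model` and `threshold` are hypothetical stand-ins
# for the agent's domain model and its expectation-violation criterion.
import numpy as np
import gymnasium as gym


def detect_novelty(env, policy, model, threshold=0.5, max_steps=500):
    """Flag a novelty when an observed transition diverges from the model's prediction."""
    state, _ = env.reset()
    for _ in range(max_steps):
        action = policy(state)
        predicted = model(state, action)          # model's expected next state
        next_state, _, terminated, truncated, _ = env.step(action)
        divergence = np.linalg.norm(np.asarray(next_state) - np.asarray(predicted))
        if divergence > threshold:
            return True                           # observations no longer match the model
        state = next_state
        if terminated or truncated:
            state, _ = env.reset()
    return False


if __name__ == "__main__":
    env = gym.make("CartPole-v1")
    random_policy = lambda s: env.action_space.sample()
    # Placeholder "persistence" model that predicts no state change;
    # in the approach described above this would be the agent's domain model.
    naive_model = lambda s, a: np.asarray(s, dtype=np.float64)
    print("Novelty detected:", detect_novelty(env, random_policy, naive_model))
```

Once a novelty is flagged in this way, the approach described in the abstract would search over candidate revisions to the model, guided by heuristics, until predictions again match observations.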