Open-world novelty, a sudden change in the mechanics or properties of an environment, is a common occurrence in the real world. Novelty adaptation is an agent's ability to improve its policy performance post-novelty. Most reinforcement learning (RL) methods assume that the world is a closed, fixed process; consequently, RL policies adapt inefficiently to novelties. To address this, we introduce WorldCloner, an end-to-end trainable neuro-symbolic world model for rapid novelty adaptation. WorldCloner learns an efficient symbolic representation of the pre-novelty environment transitions, and uses this transition model to detect novelty and adapt to it in a single-shot fashion. Additionally, WorldCloner augments the policy learning process using imagination-based adaptation, where the world model simulates transitions of the post-novelty environment to help the policy adapt. By blending "imagined" transitions with interactions in the post-novelty environment, performance can be recovered with fewer total environment interactions. Using environments designed for studying novelty in sequential decision-making problems, we show that the symbolic world model helps its neural policy adapt more efficiently than model-free and model-based neural-only reinforcement learning methods.
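The imagination-based adaptation described above can be sketched as mixing model-simulated transitions into the policy's training batches alongside real post-novelty experience. The following is a minimal illustrative sketch, not the paper's actual implementation; the function names, transition format, and the `imagined_frac` ratio are all assumptions introduced here.

```python
import random

# Hedged sketch of imagination-based adaptation: real post-novelty
# transitions are blended with transitions "imagined" by the learned
# world model, so the policy needs fewer real environment interactions.
def mixed_batch(real_buffer, world_model, policy, batch_size, imagined_frac=0.5):
    """Return a training batch of (state, action, reward, next_state)
    tuples, part sampled from real experience and part simulated."""
    n_imagined = int(batch_size * imagined_frac)
    n_real = batch_size - n_imagined

    # Sample real post-novelty transitions from the replay buffer.
    batch = random.sample(real_buffer, min(n_real, len(real_buffer)))

    # Roll the world model forward from previously visited states.
    for _ in range(n_imagined):
        state = random.choice(real_buffer)[0]            # a seen state
        action = policy(state)                           # current policy's choice
        next_state, reward = world_model(state, action)  # simulated step
        batch.append((state, action, reward, next_state))
    return batch
```

In this sketch the world model plays the role of a cheap simulator of the post-novelty environment; tuning the imagined-to-real ratio trades off sample efficiency against the risk of compounding model error.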