To deploy autonomous agents in digital interactive environments, the agents must be able to act robustly in situations they have not encountered before. The standard machine learning approach is to include as much variation as possible in the agents' training; the agents can then interpolate within their training distribution, but they cannot extrapolate much beyond it. This paper proposes a principled alternative in which a context module is coevolved with a skill module in the game. The context module recognizes temporal variation in the game and modulates the outputs of the skill module, so that action decisions remain robust even in previously unseen situations. The approach is evaluated in the Flappy Bird and LunarLander video games, as well as in the CARLA autonomous driving simulation. The Context+Skill approach leads to significantly more robust behavior in environments that require extrapolation beyond training. Such principled generalization is essential for deploying autonomous agents in real-world tasks, and can also serve as a foundation for continual adaptation.
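As a rough illustration of the architecture the abstract describes, the sketch below pairs a skill network that maps the current observation to raw action outputs with a context network that reads a short history of observations and multiplicatively modulates those outputs. The layer sizes, the window length, and the choice of multiplicative gating are assumptions for illustration, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, w1, b1, w2, b2):
    # Single-hidden-layer MLP with tanh activation.
    h = np.tanh(x @ w1 + b1)
    return h @ w2 + b2

# Dimensions are illustrative placeholders, not taken from the paper.
obs_dim, ctx_window, hidden, act_dim = 4, 8, 16, 2

# Skill module: maps the current observation to raw action outputs.
sw1 = rng.normal(0, 0.1, (obs_dim, hidden)); sb1 = np.zeros(hidden)
sw2 = rng.normal(0, 0.1, (hidden, act_dim)); sb2 = np.zeros(act_dim)

# Context module: maps a window of recent observations to per-action
# gains (multiplicative gating is one possible form of modulation).
cw1 = rng.normal(0, 0.1, (obs_dim * ctx_window, hidden)); cb1 = np.zeros(hidden)
cw2 = rng.normal(0, 0.1, (hidden, act_dim)); cb2 = np.zeros(act_dim)

def context_skill_action(obs, history):
    skill_out = mlp_forward(obs, sw1, sb1, sw2, sb2)
    gains = mlp_forward(history.ravel(), cw1, cb1, cw2, cb2)
    # Context modulates the skill outputs before the action decision.
    return np.tanh(skill_out * gains)

obs = rng.normal(size=obs_dim)
history = rng.normal(size=(ctx_window, obs_dim))
action = context_skill_action(obs, history)
print(action.shape)  # (2,)
```

In a coevolutionary setting, the weight vectors of both modules would be the genotypes being evolved together; the point of the sketch is only the dataflow, in which temporal context gates the skill policy's outputs.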