We consider the problem of generalization in reinforcement learning where visual aspects of the observations may differ, e.g., across different backgrounds or under changes in contrast, brightness, etc. We assume the agent has access to only a few of the MDPs from the MDP distribution during training. The agent's performance is then reported on new, unknown test domains drawn from the same distribution (e.g., unseen backgrounds). For this "zero-shot RL" task, we enforce invariance of the learned representations to visual domains via a domain-adversarial optimization process. We show empirically that this approach yields a significant generalization improvement on new, unseen domains.
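To make the idea concrete, below is a minimal sketch (not the paper's actual implementation) of domain-adversarial representation learning via a gradient reversal layer, in the style of DANN (Ganin & Lempitsky, 2015). It assumes a PyTorch setup; the module names (`Encoder`, `DomainClassifier`), network sizes, and the `lambd` coefficient are hypothetical placeholders. The RL objective itself (policy/value losses) is omitted.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates (and scales) gradients on the
    backward pass, so the encoder is trained to *fool* the domain classifier."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

class Encoder(nn.Module):
    """Maps image observations to a latent representation (hypothetical sizes)."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.LazyLinear(latent_dim)

    def forward(self, obs):
        return self.fc(self.conv(obs))

class DomainClassifier(nn.Module):
    """Predicts which of the training domains an observation came from."""
    def __init__(self, latent_dim=64, n_domains=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, n_domains),
        )

    def forward(self, z):
        return self.net(z)

# One adversarial training step: the domain loss flows through the gradient
# reversal layer, pushing the encoder toward domain-invariant features.
# In a full agent, the RL loss would also be computed on z and added here.
encoder, clf = Encoder(), DomainClassifier()
opt = torch.optim.Adam(list(encoder.parameters()) + list(clf.parameters()), lr=3e-4)
ce = nn.CrossEntropyLoss()

obs = torch.randn(8, 3, 64, 64)          # batch of image observations
domain_ids = torch.randint(0, 4, (8,))   # which training MDP each one came from

z = encoder(obs)
domain_logits = clf(grad_reverse(z, lambd=0.1))
loss = ce(domain_logits, domain_ids)      # + RL loss on z in a full agent
opt.zero_grad(); loss.backward(); opt.step()
```

The gradient reversal layer lets a single backward pass train both players of the adversarial game: the classifier descends on the domain loss while the encoder ascends on it, which is what encourages representations that carry no information about the visual domain.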