Current model-based reinforcement learning methods struggle when operating from complex visual scenes due to their inability to prioritize task-relevant features. To mitigate this problem, we propose learning Task Informed Abstractions (TIA) that explicitly separate reward-correlated visual features from distractors. To learn TIA, we introduce the formalism of the Task Informed MDP (TiMDP), which is realized by training two models that learn visual features via cooperative reconstruction, while one model is adversarially dissociated from the reward signal. Empirical evaluation shows that TIA leads to significant performance gains over state-of-the-art methods on many visual control tasks where natural and unconstrained visual distractions pose a formidable challenge.
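As a rough illustration of the training signal described above, the sketch below is not the authors' implementation but a minimal PyTorch example under simplifying assumptions (flattened observations, MLP encoders, a gradient-reversal layer for the adversarial dissociation); all class and variable names are illustrative.

```python
# Minimal sketch of the TIA idea: two latent models reconstruct the observation
# cooperatively, the task model predicts reward, and the distractor model is
# adversarially dissociated from the reward signal via gradient reversal.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates gradients in the backward pass,
    so the adversarial head learns to predict reward while the distractor
    encoder learns to remove reward information."""
    @staticmethod
    def forward(ctx, x):
        return x

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output


def mlp(inp, out, hidden=256):
    return nn.Sequential(nn.Linear(inp, hidden), nn.ReLU(), nn.Linear(hidden, out))


class TIASketch(nn.Module):
    def __init__(self, obs_dim, latent_dim=32):
        super().__init__()
        self.task_enc = mlp(obs_dim, latent_dim)      # reward-relevant features
        self.dist_enc = mlp(obs_dim, latent_dim)      # distractor features
        self.decoder = mlp(2 * latent_dim, obs_dim)   # cooperative reconstruction
        self.reward_head = mlp(latent_dim, 1)         # reward from task latent
        self.adv_reward_head = mlp(latent_dim, 1)     # adversarial head on distractor latent

    def loss(self, obs, reward):
        s_task, s_dist = self.task_enc(obs), self.dist_enc(obs)
        # Both latents cooperate to explain the full observation.
        recon = self.decoder(torch.cat([s_task, s_dist], dim=-1))
        recon_loss = F.mse_loss(recon, obs)
        # Only the task latent should carry reward information.
        reward_loss = F.mse_loss(self.reward_head(s_task).squeeze(-1), reward)
        # Adversarial dissociation: head minimizes this loss, encoder maximizes it.
        adv_loss = F.mse_loss(
            self.adv_reward_head(GradReverse.apply(s_dist)).squeeze(-1), reward)
        return recon_loss + reward_loss + adv_loss


if __name__ == "__main__":
    model = TIASketch(obs_dim=64)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    obs, reward = torch.randn(16, 64), torch.randn(16)  # dummy batch
    opt.zero_grad()
    model.loss(obs, reward).backward()
    opt.step()
```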