Despite the broad application of deep reinforcement learning (RL), transferring and adapting a policy to unseen but similar environments remains a significant challenge. Recently, language-conditioned policies have been proposed to facilitate policy transfer by learning a joint representation of observation and text that captures compact, invariant information across environments. Existing language-conditioned RL methods often learn the joint representation as a simple latent layer over the given instances (episode-specific observations and text), which inevitably includes noisy or irrelevant information and causes instance-dependent spurious correlations, hurting both generalization performance and training efficiency. To address this issue, we propose a conceptual reinforcement learning (CRL) framework that learns a concept-like joint representation for language-conditioned policies. The key insight is that concepts are compact and invariant representations in human cognition, formed by extracting similarities from numerous real-world instances. In CRL, we propose a multi-level attention encoder and two mutual information constraints for learning compact and invariant concepts. Verified in two challenging environments, RTFM and Messenger, CRL significantly improves training efficiency (by up to 70%) and generalization ability (by up to 30%) under new environment dynamics.
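The abstract names a multi-level attention encoder and two mutual information constraints but does not specify them. The sketch below is a minimal, hypothetical illustration of one plausible shape for these components: a stacked cross-attention encoder that fuses observation and text tokens into a pooled "concept" vector, and an InfoNCE-style lower bound on mutual information that could serve as one such constraint. All names (`ConceptEncoder`, `info_nce`), layer sizes, and the choice of InfoNCE are assumptions, not the paper's actual architecture.

```python
# Illustrative sketch only (NOT the paper's architecture): cross-attention
# fusion of observation and text, plus an InfoNCE mutual-information bound.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConceptEncoder(nn.Module):
    """Fuse observation and text tokens via stacked cross-attention levels."""

    def __init__(self, obs_dim=64, txt_dim=64, d_model=128, levels=2, heads=4):
        super().__init__()
        self.obs_proj = nn.Linear(obs_dim, d_model)
        self.txt_proj = nn.Linear(txt_dim, d_model)
        # One cross-attention block per "level"; deeper levels refine the fusion.
        self.attn = nn.ModuleList(
            [nn.MultiheadAttention(d_model, heads, batch_first=True)
             for _ in range(levels)]
        )
        self.norm = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(levels)])

    def forward(self, obs_tokens, txt_tokens):
        # obs_tokens: (B, N_obs, obs_dim); txt_tokens: (B, N_txt, txt_dim)
        q = self.obs_proj(obs_tokens)
        kv = self.txt_proj(txt_tokens)
        for attn, norm in zip(self.attn, self.norm):
            fused, _ = attn(q, kv, kv)   # observation attends to text
            q = norm(q + fused)          # residual connection + layer norm
        return q.mean(dim=1)             # pooled joint "concept" vector


def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE lower bound on I(z_a; z_b); matched rows are positive pairs."""
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature                 # (B, B) similarities
    labels = torch.arange(z_a.size(0), device=z_a.device)  # diagonal positives
    return F.cross_entropy(logits, labels)


# Usage with random tensors standing in for episode observations/instructions.
enc = ConceptEncoder()
obs = torch.randn(8, 10, 64)             # batch of 8 episodes, 10 obs tokens
txt = torch.randn(8, 12, 64)             # 12 text (instruction) tokens
concept = enc(obs, txt)                  # (8, 128) concept vectors
# Encourage invariance: the concept of a perturbed observation should share
# high mutual information with the concept of the original instance.
loss = info_nce(concept, enc(obs + 0.05 * torch.randn_like(obs), txt))
```

Pulling the constraint toward agreement across perturbed instances is one way a representation could become "compact and invariant" in the abstract's sense; the paper's actual two constraints may differ.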