Imitation learning and instruction-following are two common approaches to communicating a user's intent to a learning agent. However, as task complexity grows, it can be beneficial to use both demonstrations and language to communicate with an agent. In this work, we propose a novel setting in which an agent is given both a demonstration and a description, and must combine information from both modalities. Specifically, given a demonstration of a task (the source task) and a natural language description of the differences between the demonstrated task and a related but different task (the target task), our goal is to train an agent to complete the target task in a zero-shot setting, that is, without any demonstrations of the target task. To this end, we introduce Language-Aided Reward and Value Adaptation (LARVA) which, given a source demonstration and a linguistic description of how the target task differs, learns to output a reward / value function that accurately describes the target task. Our experiments show that, across a diverse set of adaptations, our approach completes more than 95% of target tasks when using template-based descriptions, and more than 70% when using free-form natural language.