Imitation learning addresses the challenge of learning from an expert's demonstrations without access to reward signals from the environment. Behavioral cloning (BC) formulates imitation learning as a supervised learning problem and learns from sampled state-action pairs. Despite its simplicity, BC often fails to capture the temporal structure of the task and the global information in the expert demonstrations. This work augments BC by using diffusion models to model expert behaviors and by designing a learning objective that leverages the learned diffusion models to guide policy learning. To this end, we propose diffusion model-augmented behavioral cloning (Diffusion-BC), which combines our diffusion model-guided learning objective with the BC objective so that the two complement each other. Our method outperforms the baselines or achieves competitive performance in various continuous control domains, including navigation, robot arm manipulation, and locomotion. Ablation studies justify our design choices and investigate the effect of balancing the BC objective against the proposed diffusion model objective.
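To make the idea of balancing the two objectives concrete, the following is a minimal sketch and not the paper's implementation. It assumes the BC term is a mean squared error on actions, the diffusion term is a standard DDPM-style noise-prediction error evaluated on states paired with the policy's actions, and the two are mixed by a hypothetical weight `lam`. The noise predictor `zero_eps_model`, the linear noise schedule, and all function names are illustrative placeholders.

```python
import numpy as np

def bc_loss(pred_actions, expert_actions):
    """Mean squared error between policy actions and expert actions."""
    return float(np.mean((pred_actions - expert_actions) ** 2))

def diffusion_loss(eps_model, states, actions, alphas_bar, rng):
    """DDPM-style noise-prediction error on concatenated state-action pairs."""
    x0 = np.concatenate([states, actions], axis=1)
    # Draw one diffusion step and one Gaussian noise sample per pair.
    t = rng.integers(0, len(alphas_bar), size=x0.shape[0])
    eps = rng.standard_normal(x0.shape)
    a = alphas_bar[t][:, None]
    # Forward (noising) process: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps.
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return float(np.mean((eps - eps_model(x_t, t)) ** 2))

def combined_loss(pred_actions, expert_actions, states,
                  eps_model, alphas_bar, rng, lam=1.0):
    """Weighted sum of the two terms; `lam` is an assumed balancing weight."""
    l_bc = bc_loss(pred_actions, expert_actions)
    l_dm = diffusion_loss(eps_model, states, pred_actions, alphas_bar, rng)
    return l_bc + lam * l_dm, l_bc, l_dm

# Toy data standing in for expert demonstrations (not real results).
rng = np.random.default_rng(0)
states = rng.standard_normal((8, 3))
expert_actions = rng.standard_normal((8, 2))
pred_actions = expert_actions + 0.1  # a policy that is slightly off

# Assumed linear noise schedule and its cumulative product.
betas = np.linspace(1e-4, 0.02, 50)
alphas_bar = np.cumprod(1.0 - betas)

def zero_eps_model(x_t, t):
    """Placeholder for a trained noise predictor; always predicts zero noise."""
    return np.zeros_like(x_t)

lam = 0.5
total, l_bc, l_dm = combined_loss(pred_actions, expert_actions, states,
                                  zero_eps_model, alphas_bar, rng, lam=lam)
print(f"BC term: {l_bc:.4f}, diffusion term: {l_dm:.4f}, total: {total:.4f}")
```

Setting `lam` to zero in this sketch recovers plain BC, and larger values put more weight on the diffusion term. In practice, `zero_eps_model` would be replaced by a noise predictor trained on expert state-action pairs.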