Diffusion models have emerged as powerful generative models in the text-to-image domain. This paper studies their application as observation-to-action models for imitating human behaviour in sequential environments. Human behaviour is stochastic and multimodal, with structured correlations between action dimensions. Meanwhile, standard modelling choices in behaviour cloning are limited in their expressiveness and may introduce bias into the cloned policy. We begin by pointing out the limitations of these choices. We then propose that diffusion models are an excellent fit for imitating human behaviour, since they learn an expressive distribution over the joint action space. We introduce several innovations to make diffusion models suitable for sequential environments: designing suitable architectures, investigating the role of guidance, and developing reliable sampling strategies. Experimentally, diffusion models closely match human demonstrations in a simulated robotic control task and a modern 3D gaming environment.
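To make the observation-to-action idea concrete, the sketch below shows a DDPM-style ancestral sampling loop that draws a joint action vector conditioned on an observation. This is a hypothetical illustration, not the paper's method: the noise predictor `eps_model` is a fixed stand-in (a random linear map through a `tanh`) rather than a trained network, and the schedule lengths and dimensions are arbitrary assumptions.

```python
import numpy as np

T = 50                                  # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))             # stand-in weights mapping obs (3,) -> action (2,)

def eps_model(a_t, obs, t):
    """Stand-in observation-conditioned noise predictor (NOT a trained model)."""
    return a_t - np.tanh(obs @ W)       # pulls samples toward an obs-dependent mode

def sample_action(obs):
    """DDPM-style ancestral sampling over the joint action space."""
    a = rng.normal(size=2)              # start from pure Gaussian noise
    for t in reversed(range(T)):
        eps = eps_model(a, obs, t)
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        a = (a - coef * eps) / np.sqrt(alphas[t])
        if t > 0:
            a += np.sqrt(betas[t]) * rng.normal(size=2)  # inject noise except at the final step
    return a

obs = np.array([0.5, -1.0, 0.2])
action = sample_action(obs)
print(action.shape)
```

Because sampling starts from fresh noise each call, repeated calls with the same observation yield different actions, which is precisely the stochastic, multimodal behaviour the abstract argues a cloned policy should be able to express.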