This paper introduces Diffusion Policy, a new way of generating robot behavior by representing a robot's visuomotor policy as a conditional denoising diffusion process. We benchmark Diffusion Policy across 11 different tasks from 4 different robot manipulation benchmarks and find that it consistently outperforms existing state-of-the-art robot learning methods with an average improvement of 46.9%. Diffusion Policy learns the gradient of the action-distribution score function and iteratively optimizes with respect to this gradient field during inference via a series of stochastic Langevin dynamics steps. We find that the diffusion formulation yields powerful advantages when used for robot policies, including gracefully handling multimodal action distributions, being suitable for high-dimensional action spaces, and exhibiting impressive training stability. To fully unlock the potential of diffusion models for visuomotor policy learning on physical robots, this paper presents a set of key technical contributions including the incorporation of receding horizon control, visual conditioning, and the time-series diffusion transformer. We hope this work will help motivate a new generation of policy learning techniques that are able to leverage the powerful generative modeling capabilities of diffusion models. Code, data, and training details will be publicly available.
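To make the inference procedure described above concrete, here is a minimal sketch of a DDPM-style iterative denoising loop over an action sequence, conditioned on an observation embedding. The network `eps_model`, the schedule parameters, and the dimensions are illustrative assumptions, not the paper's actual API.

```python
# Minimal sketch, assuming a noise-prediction network `eps_model(a, k, obs_cond)`
# and a standard linear beta schedule. Names and shapes are hypothetical.
import torch

def denoise_actions(eps_model, obs_cond, action_dim=7, horizon=16, steps=100):
    """Sample an action sequence by iteratively refining Gaussian noise,
    conditioned on an observation embedding `obs_cond`."""
    betas = torch.linspace(1e-4, 0.02, steps)       # noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    # Start from pure noise over the prediction horizon.
    a = torch.randn(1, horizon, action_dim)
    for k in reversed(range(steps)):
        # Predict the noise component at step k (the learned score direction).
        eps = eps_model(a, k, obs_cond)
        # Standard DDPM mean update: step against the predicted noise ...
        coef = betas[k] / torch.sqrt(1.0 - alpha_bars[k])
        mean = (a - coef * eps) / torch.sqrt(alphas[k])
        # ... plus injected noise on all but the final step (Langevin-like).
        noise = torch.randn_like(a) if k > 0 else torch.zeros_like(a)
        a = mean + torch.sqrt(betas[k]) * noise
    return a  # denoised action sequence

# In a receding-horizon scheme, only a short prefix of the returned sequence
# would be executed before re-planning from the next observation.
```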