Imitation learning offers a promising way to learn directly from data without requiring explicit models, simulations, or detailed task definitions. During inference, actions are sampled from the learned distribution and executed on the robot. However, sampled actions may fail for various reasons, and simply resampling until a successful action is found can be inefficient. In this work, we propose an enhanced sampling strategy that refines the sampling distribution to avoid previously unsuccessful actions. We show that, using only data from successful demonstrations, our method can infer recovery actions without additional exploratory behavior or a high-level controller. Furthermore, we leverage the concept of diffusion model decomposition to break the primary problem, which may require a long-horizon history to manage failures, into smaller, more manageable sub-problems for learning, data collection, and inference, enabling the system to adapt to a variable number of failures. Our approach yields a low-level controller that dynamically adjusts its sampling space to improve efficiency when prior samples fall short. We validate our method on several tasks, including door opening with unknown opening directions, object manipulation, and button-searching scenarios, and demonstrate that it outperforms traditional baselines.
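The failure-aware resampling loop described above can be sketched minimally. This is a toy illustration only: the Gaussian stands in for the learned diffusion policy, and the 2-D action space, the `min_dist` exclusion radius, and the stubbed success check are all assumptions for illustration, not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_action(failed_actions, n_candidates=256, min_dist=0.1):
    """Draw candidates from a stand-in learned distribution (here a
    Gaussian; the paper uses a diffusion policy) and reject any that
    lie too close to previously unsuccessful actions."""
    for _ in range(n_candidates):
        a = rng.normal(loc=0.0, scale=1.0, size=2)  # placeholder 2-D action
        if all(np.linalg.norm(a - f) >= min_dist for f in failed_actions):
            return a
    raise RuntimeError("no admissible action found")

# Failure-aware loop: each unsuccessful attempt shrinks the sampling space,
# so later samples avoid regions that already failed.
failed = []
for attempt in range(3):
    a = sample_action(failed)
    success = False  # stand-in for executing the action on the robot
    if success:
        break
    failed.append(a)
```

Each retry here draws from a distribution with previously failed actions excluded, which is the intuition behind refining the sampling distribution rather than blindly resampling.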