Shared autonomy is an operational concept in which a user and an autonomous agent collaboratively control a robotic system. In many settings it offers advantages over the extremes of full teleoperation and full autonomy. Traditional approaches to shared autonomy rely on knowledge of the environment dynamics, a discrete space of user goals that is known a priori, or knowledge of the user's policy -- assumptions that are unrealistic in many domains. Recent works relax some of these assumptions by formulating shared autonomy with model-free deep reinforcement learning (RL). In particular, they no longer need knowledge of the goal space (e.g., that the goals are discrete or constrained) or environment dynamics. However, they need knowledge of a task-specific reward function to train the policy, and such reward specification can be a difficult and brittle process. Moreover, these formulations inherently rely on human-in-the-loop training, which requires preparing a policy that mimics users' behavior. In this paper, we present a new approach to shared autonomy that employs a modulation of the forward and reverse diffusion process of diffusion models. Our approach does not assume known environment dynamics or the space of user goals, and in contrast to previous work, it requires neither reward feedback nor access to the user's policy during training. Instead, our framework learns a distribution over a space of desired behaviors, and then employs a diffusion model to translate the user's actions to a sample from this distribution. Crucially, we show that it is possible to carry out this process in a manner that preserves the user's control authority. We evaluate our framework on a series of challenging continuous control tasks, and analyze its ability to effectively correct user actions while maintaining their autonomy.
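To make the mechanism described above concrete, the sketch below illustrates the general idea of partially diffusing a user's action toward noise and then denoising it with a diffusion model trained on desired behaviors, where a single ratio controls how much of the user's action is preserved. This is a minimal, hypothetical sketch: the DDPM-style epsilon-prediction denoiser, the linear noise schedule, and the `gamma` parameter are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch (assumed details, not the authors' code): noise a user's
# action part-way through the forward diffusion process, then run the reverse
# process with a denoiser trained on desired behaviors. gamma in [0, 1] sets
# how far the action is diffused, i.e., how much control authority the user
# retains (gamma=0: action unchanged; gamma=1: fully resampled).
import torch
import torch.nn as nn

T = 50                                   # total diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)    # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

class Denoiser(nn.Module):
    """Toy epsilon-prediction network conditioned on (state, noisy action, step)."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )
    def forward(self, state, action, t):
        t_embed = t.float().unsqueeze(-1) / T
        return self.net(torch.cat([state, action, t_embed], dim=-1))

@torch.no_grad()
def shared_autonomy_action(denoiser, state, user_action, gamma=0.4):
    """Translate a user action into a sample from the learned behavior distribution."""
    k = int(gamma * (T - 1))
    if k == 0:
        return user_action
    # Forward process: diffuse the user's action to step k.
    noise = torch.randn_like(user_action)
    a_t = torch.sqrt(alpha_bars[k]) * user_action + torch.sqrt(1 - alpha_bars[k]) * noise
    # Reverse process: denoise from step k back to a clean action.
    for t in range(k, 0, -1):
        t_batch = torch.full(user_action.shape[:-1], t)
        eps = denoiser(state, a_t, t_batch)
        mean = (a_t - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        a_t = mean
        if t > 1:
            a_t = a_t + torch.sqrt(betas[t]) * torch.randn_like(a_t)
    return a_t

# Usage: in practice the denoiser would be trained on desired (expert) actions.
denoiser = Denoiser(state_dim=8, action_dim=2)
state = torch.zeros(1, 8)
user_action = torch.tensor([[0.9, -0.3]])
print(shared_autonomy_action(denoiser, state, user_action, gamma=0.4))
```

Under these assumptions, the choice of `gamma` is what trades off correcting the user's action against preserving their control authority.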