Shared autonomy is an operational concept in which a user and an autonomous agent collaboratively control a robotic system. In many settings, it offers advantages over the extremes of full teleoperation and full autonomy. Traditional approaches to shared autonomy rely on knowledge of the environment dynamics, a discrete space of user goals that is known a priori, or knowledge of the user's policy -- assumptions that are unrealistic in many domains. Recent works relax some of these assumptions by formulating shared autonomy with model-free deep reinforcement learning (RL). In particular, they no longer require knowledge of the goal space (e.g., that the goals are discrete or constrained) or of the environment dynamics. However, they still require a task-specific reward function to train the policy, and such reward specification can be a difficult and brittle process. Moreover, these formulations inherently rely on human-in-the-loop training, which in turn requires preparing a policy that mimics users' behavior. In this paper, we present a new approach to shared autonomy that modulates the forward and reverse processes of a diffusion model. Our approach does not assume known environment dynamics or a known space of user goals, and, in contrast to previous work, it requires neither reward feedback nor access to the user's policy during training. Instead, our framework learns a distribution over a space of desired behaviors and employs a diffusion model to translate the user's actions into samples from this distribution. Crucially, we show that this process can be carried out in a manner that preserves the user's control authority. We evaluate our framework on a series of challenging continuous control tasks and analyze its ability to effectively correct user actions while maintaining their autonomy.
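To make the core mechanism concrete, the following is a minimal sketch (not the authors' implementation) of how a user action could be translated into a sample from a learned behavior distribution by partially running the forward diffusion process and then denoising with a trained model. The denoiser network, its signature, and the mixing ratio `gamma` that trades off user control authority against correction strength are all illustrative assumptions; a standard DDPM-style epsilon-prediction model and linear noise schedule are assumed.

```python
import torch

T = 1000                                   # total diffusion steps (assumed)
betas = torch.linspace(1e-4, 2e-2, T)      # standard linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # cumulative product \bar{alpha}_t

def share_control(denoiser, state, user_action, gamma=0.3):
    """Map a user action to a sample from the learned behavior distribution.

    `denoiser(state, noisy_action, t)` is a hypothetical network assumed to
    predict the added noise (epsilon), as in DDPM. `gamma` in [0, 1] sets the
    fraction of the chain that is applied: 0 returns the user's action
    unchanged (full control authority), 1 ignores it entirely (full autonomy).
    """
    k = int(gamma * T)                     # how far to diffuse the user's action
    if k == 0:
        return user_action

    # Forward process: jump directly to step k using the closed form
    # q(a_k | a_0) = N(sqrt(abar_k) a_0, (1 - abar_k) I).
    a_bar_k = alpha_bars[k - 1]
    noise = torch.randn_like(user_action)
    a_t = torch.sqrt(a_bar_k) * user_action + torch.sqrt(1.0 - a_bar_k) * noise

    # Reverse process: denoise from step k back to 0 with the learned model,
    # pulling the partially noised action toward the desired-behavior distribution.
    for t in reversed(range(k)):
        eps = denoiser(state, a_t, torch.tensor([t]))
        a_bar_t = alpha_bars[t]
        mean = (a_t - betas[t] / torch.sqrt(1.0 - a_bar_t) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            a_t = mean + torch.sqrt(betas[t]) * torch.randn_like(a_t)
        else:
            a_t = mean
    return a_t
```

Under this reading, the diffusion model is trained offline on desired behaviors (no reward signal and no user in the loop), and the partial-diffusion depth is the single knob that balances correcting the user's action against preserving their intent.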