We can make it easier for disabled users to control assistive robots by mapping the user's low-dimensional joystick inputs to high-dimensional, complex actions. Prior works learn these mappings from human demonstrations: a non-disabled human either teleoperates or kinesthetically guides the robot arm through a variety of motions, and the robot learns to reproduce the demonstrated behaviors. But this framework is often impractical -- disabled users will not always have access to external demonstrations! Here we instead learn diverse teleoperation mappings without either human demonstrations or pre-defined tasks. Under our unsupervised approach the robot first optimizes for object state entropy: i.e., the robot autonomously learns to push, pull, open, close, or otherwise change the state of nearby objects. We then embed these diverse, object-oriented behaviors into a latent space for real-time control: now pressing the joystick causes the robot to perform dexterous motions like pushing or opening. We experimentally show that -- with a best-case human operator -- our unsupervised approach actually outperforms the teleoperation mappings learned from human demonstrations, particularly if those demonstrations are noisy or imperfect. But user study results are less clear-cut: although our approach enables participants to complete tasks with multiple objects more quickly, the unsupervised mapping also learns motions that the human does not need, and these additional behaviors may confuse the human.
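To make the latent-space control idea concrete, here is a minimal, purely illustrative sketch of the mapping described above: a decoder takes the robot's state and the user's low-dimensional joystick input (the latent `z`) and outputs a high-dimensional joint action. All names, dimensions, and the random linear "decoder" are hypothetical stand-ins for a network trained on the unsupervised, entropy-maximizing behaviors; this is not the paper's implementation.

```python
import numpy as np

# Illustrative stand-in for a learned latent-space teleoperation mapping.
# In the actual approach, the decoder is trained on diverse object-oriented
# behaviors discovered by maximizing object state entropy; here a fixed
# random linear map plays that role so the interface is runnable.

rng = np.random.default_rng(0)

STATE_DIM = 14   # assumed: joint positions + object pose
LATENT_DIM = 2   # 2-axis joystick input
ACTION_DIM = 7   # assumed: 7-DoF arm action

# Hypothetical decoder weights (would be learned in practice).
W_state = rng.normal(size=(ACTION_DIM, STATE_DIM)) * 0.1
W_latent = rng.normal(size=(ACTION_DIM, LATENT_DIM))

def decode(state: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Map robot state and joystick latent z to a bounded joint-space action."""
    return np.tanh(W_state @ state + W_latent @ z)

# Example: the user pushes the joystick along one axis and the decoder
# expands that 2-D input into a full 7-DoF motion.
state = np.zeros(STATE_DIM)
z = np.array([1.0, 0.0])
action = decode(state, z)
```

The key property this sketch captures is the dimensionality gap: every 2-D joystick deflection selects a full high-dimensional behavior, so which behaviors the decoder embeds (and whether they match what the user actually needs) determines how intuitive the mapping feels.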