Commercial and industrial deployments of robot fleets at Amazon, Nimble, Plus One, Waymo, and Zoox query remote human teleoperators when robots are at risk or unable to make task progress. With continual learning, interventions from the remote pool of humans can also be used to improve the robot fleet control policy over time. A central question is how to effectively allocate limited human attention. Prior work addresses this in the single-robot, single-human setting; we formalize the Interactive Fleet Learning (IFL) setting, in which multiple robots interactively query and learn from multiple human supervisors. We propose Return on Human Effort (ROHE) as a new metric and Fleet-DAgger, a family of IFL algorithms. We present an open-source IFL benchmark suite of GPU-accelerated Isaac Gym environments for standardized evaluation and development of IFL algorithms. We compare a novel Fleet-DAgger algorithm to 4 baselines with 100 robots in simulation. We also perform a physical block-pushing experiment with 4 ABB YuMi robot arms and 2 remote humans. Experiments suggest that the allocation of humans to robots significantly affects the performance of the fleet, and that the novel Fleet-DAgger algorithm can achieve up to 8.8x higher ROHE than baselines. See https://tinyurl.com/fleet-dagger for supplemental material.
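The abstract names the Return on Human Effort (ROHE) metric but does not define it. As a rough illustration only, the sketch below assumes ROHE is the fleet's average return normalized by (1 + weighted human supervision per robot), so a policy that achieves the same return with less supervision scores higher; the function name `rohe`, the parameter `effort_weight`, and the binary intervention encoding are all hypothetical, not taken from the paper.

```python
import numpy as np

def rohe(returns, human_actions, effort_weight=1.0):
    """Sketch of a return-on-human-effort style metric (assumed form).

    returns: array of shape (N,), total return of each of the N robots.
    human_actions: array of shape (T, N), 1 if a human supervised robot j
        at timestep t, else 0.
    effort_weight: relative cost of one unit of human supervision (assumed).
    """
    returns = np.asarray(returns, dtype=float)
    human_actions = np.asarray(human_actions, dtype=float)
    n_robots = returns.shape[0]
    avg_return = returns.mean()
    # Total human timesteps, amortized over the fleet.
    effort_per_robot = human_actions.sum() / n_robots
    return avg_return / (1.0 + effort_weight * effort_per_robot)

# Example: 100 robots, 1000 timesteps, sparse (~1%) human supervision.
rng = np.random.default_rng(0)
returns = rng.uniform(0.0, 1.0, size=100)
interventions = rng.random((1000, 100)) < 0.01
print(f"ROHE: {rohe(returns, interventions):.4f}")
```

Under this assumed form, comparing two allocation algorithms at equal human budget reduces to comparing their average fleet returns, which matches the abstract's framing of allocating limited human attention.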