State-of-the-art human-in-the-loop robot grasping suffers greatly from robustness issues in Electromyography (EMG) inference. As a workaround, researchers have been integrating EMG with other signals, often in an ad hoc manner. In this paper, we present a method for end-to-end training of a human-in-the-loop robot grasping policy on real reaching trajectories. To this end, we use Reinforcement Learning (RL) and Imitation Learning (IL) in DEXTRON (DEXTerity enviRONment), a stochastic simulation environment with real human trajectories that are augmented and selected using a Monte Carlo (MC) simulation method. We also offer a success model which, once trained on expert policy data and RL policy rollout transitions, provides transparency into how the deep policy works and when it is likely to fail.
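The Monte Carlo augmentation-and-selection step can be illustrated with a minimal sketch, assuming a simple scheme in which each recorded reaching trajectory is randomly time-scaled and offset, rolled out in simulation, and kept only if the grasp succeeds. All names, parameter ranges, and the `simulate_success` callable below are hypothetical stand-ins, not the paper's actual pipeline:

```python
import numpy as np

def augment(trajectory, rng):
    """Randomly time-scale and spatially offset a recorded reaching trajectory.

    `trajectory` is a (T, 3) array of hand positions; the scaling and offset
    ranges below are illustrative assumptions only.
    """
    time_scale = rng.uniform(0.7, 1.3)           # slow down / speed up
    offset = rng.uniform(-0.05, 0.05, size=3)    # shift the path (meters)
    n = max(2, int(len(trajectory) * time_scale))
    # Resample each coordinate at the new length via linear interpolation.
    t_old = np.linspace(0.0, 1.0, len(trajectory))
    t_new = np.linspace(0.0, 1.0, n)
    resampled = np.stack(
        [np.interp(t_new, t_old, trajectory[:, k]) for k in range(3)], axis=1
    )
    return resampled + offset

def monte_carlo_select(trajectories, simulate_success, n_samples=1000, seed=0):
    """Keep only augmented trajectories for which a simulated grasp succeeds.

    `simulate_success` stands in for rolling out an expert policy in the
    simulator and reporting whether the grasp attempt succeeded.
    """
    rng = np.random.default_rng(seed)
    accepted = []
    for _ in range(n_samples):
        base = trajectories[rng.integers(len(trajectories))]
        traj = augment(base, rng)
        if simulate_success(traj):
            accepted.append(traj)
    return accepted
```

The rejection step is what makes the environment's trajectory distribution both stochastic and feasible: only augmented samples on which grasping is achievable at all survive into training.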