Combined visual and force feedback plays an essential role in contact-rich robotic manipulation tasks. Current methods focus on building feedback control around a single modality while underrating the synergy among sensors. Fusing different sensor modalities is necessary but remains challenging. A key challenge is to achieve an effective multi-modal control scheme that generalizes to novel objects with precision. This paper proposes a practical multi-modal sensor fusion mechanism based on hierarchical policy learning. First, we use a self-supervised encoder to extract multi-view visual features and a hybrid motion/force controller to regulate force behaviors. Next, multi-modality fusion is simplified by hierarchically integrating vision, force, and proprioceptive data in the reinforcement learning (RL) algorithm. Moreover, with hierarchical policy learning, the control scheme can exploit the limits of visual feedback and explore the contribution of each individual modality in precise tasks. Experiments show that a robot with the proposed control scheme can assemble objects with 0.25 mm clearance in simulation. The system generalizes to widely varied initial configurations and new shapes, and experiments validate that the simulated system transfers robustly to reality without fine-tuning.
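To make the fusion scheme concrete, the following is a minimal sketch of how vision, force, and proprioceptive inputs might be combined into a single policy network. It is an illustrative assumption, not the paper's implementation: the PyTorch module, the feature dimensions, and the name `HierarchicalFusionPolicy` are hypothetical, and the visual features are assumed to come from a pre-trained self-supervised encoder.

```python
import torch
import torch.nn as nn

class HierarchicalFusionPolicy(nn.Module):
    """Hypothetical fusion of visual, force, and proprioceptive features.

    Visual features are assumed to be produced by a frozen self-supervised
    encoder; the wrench and joint state are raw sensor readings. The policy
    output is interpreted as a reference for a hybrid motion/force controller.
    """

    def __init__(self, vis_dim=128, force_dim=6, proprio_dim=7, action_dim=6):
        super().__init__()
        # Low-level heads compress each modality before fusion.
        self.vis_head = nn.Sequential(nn.Linear(vis_dim, 64), nn.ReLU())
        self.force_head = nn.Sequential(nn.Linear(force_dim, 32), nn.ReLU())
        self.proprio_head = nn.Sequential(nn.Linear(proprio_dim, 32), nn.ReLU())
        # High-level policy acts on the fused multi-modal representation.
        self.policy = nn.Sequential(
            nn.Linear(64 + 32 + 32, 128), nn.ReLU(),
            nn.Linear(128, action_dim), nn.Tanh(),
        )

    def forward(self, vis_feat, wrench, joint_state):
        fused = torch.cat(
            [self.vis_head(vis_feat),
             self.force_head(wrench),
             self.proprio_head(joint_state)], dim=-1)
        return self.policy(fused)


# Example forward pass with random stand-in observations.
policy = HierarchicalFusionPolicy()
action = policy(torch.randn(1, 128), torch.randn(1, 6), torch.randn(1, 7))
print(action.shape)  # torch.Size([1, 6])
```

In this sketch the per-modality heads play the role of the low-level feature extraction, while the shared policy head corresponds to the high-level integration step; how the actual method splits these levels is defined in the paper itself.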