Imitation learning for robot dexterous manipulation, especially with a real robot setup, typically requires a large number of demonstrations. In this paper, we present a data-efficient learning from demonstration framework which exploits the use of rich tactile sensing data and achieves fine bimanual pinch grasping. Specifically, we employ a convolutional autoencoder network that can effectively extract and encode high-dimensional tactile information. Further, We develop a framework that achieves efficient multi-sensor fusion for imitation learning, allowing the robot to learn contact-aware sensorimotor skills from demonstrations. Our comparision study against the framework without using encoded tactile features highlighted the effectiveness of incorporating rich contact information, which enabled dexterous bimanual grasping with active contact searching. Extensive experiments demonstrated the robustness of the fine pinch grasp policy directly learned from few-shot demonstration, including grasping of the same object with different initial poses, generalizing to ten unseen new objects, robust and firm grasping against external pushes, as well as contact-aware and reactive re-grasping in case of dropping objects under very large perturbations. Furthermore, the saliency map analysis method is used to describe weight distribution across various modalities during pinch grasping, confirming the effectiveness of our framework at leveraging multimodal information.
翻译:暂无翻译