When humans grasp objects in the real world, we often move our arms to hold the object in a different pose where we can use it. In contrast, typical lab settings only study the stability of a grasp immediately after lifting, without any subsequent re-positioning of the arm. However, grasp stability can vary widely with the object's holding pose, since the gravitational torque and the gripper contact forces may change completely. To facilitate the study of how holding poses affect grasp stability, we present PoseIt, a novel multi-modal dataset that contains visual and tactile data collected over a full cycle of grasping an object, re-positioning the arm to one of a set of sampled poses, and shaking the object. Using data from PoseIt, we formulate and tackle the task of predicting whether a grasped object is stable in a particular held pose. We train an LSTM classifier that achieves 85% accuracy on the proposed task. Our experimental results show that multi-modal models trained on PoseIt achieve higher accuracy than models that use vision or tactile data alone, and that our classifiers also generalize to unseen objects and held poses.
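As a rough illustration of the kind of classifier described above, the sketch below shows a minimal sequence model that fuses per-timestep visual and tactile features and outputs a binary stability prediction. This is not the authors' implementation: the module name, feature dimensions, and fusion-by-concatenation choice are assumptions made only for illustration.

```python
# Minimal sketch (assumed architecture, not the PoseIt authors' code):
# an LSTM that classifies grasp stability from fused visual + tactile features.
import torch
import torch.nn as nn

class GraspStabilityLSTM(nn.Module):
    def __init__(self, visual_dim=128, tactile_dim=64, hidden_dim=256):
        super().__init__()
        # Fuse per-timestep visual and tactile embeddings by concatenation.
        self.lstm = nn.LSTM(visual_dim + tactile_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)  # binary stable / unstable logit

    def forward(self, visual_seq, tactile_seq):
        # visual_seq: (batch, T, visual_dim); tactile_seq: (batch, T, tactile_dim)
        fused = torch.cat([visual_seq, tactile_seq], dim=-1)
        _, (h_n, _) = self.lstm(fused)
        return self.head(h_n[-1])  # predict from the final hidden state

# Example usage: a batch of 8 sequences with 20 timesteps each.
model = GraspStabilityLSTM()
logits = model(torch.randn(8, 20, 128), torch.randn(8, 20, 64))
probs = torch.sigmoid(logits)  # probability that the held pose is stable
```

Restricting the input to either stream alone (e.g. zeroing the tactile features) mirrors the vision-only and tactile-only baselines that the multi-modal model is compared against.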