Tactile sensing is critical for humans to perform everyday tasks. While significant progress has been made in analyzing object grasping from vision, it remains unclear how tactile sensing can be used to reason about and model the dynamics of hand-object interactions. In this work, we employ a high-resolution tactile glove to perform four different interactive activities on a diverse set of objects. We build our model on a cross-modal learning framework, generating labels with a visual processing pipeline to supervise the tactile model, which can then operate on its own at test time. The tactile model predicts the 3D locations of both the hand and the object purely from touch data by combining a predictive model with a contrastive learning module. This framework can reason about interaction patterns from tactile data, hallucinate changes in the environment, estimate the uncertainty of its predictions, and generalize to unseen objects. We also provide detailed ablation studies of different system designs, as well as visualizations of the predicted trajectories. This work takes a step toward dynamics modeling of hand-object interactions from dense tactile sensing, opening the door to future applications in activity learning, human-computer interaction, and imitation learning for robotics.
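To make the cross-modal supervision concrete, below is a minimal PyTorch sketch of one possible training step: a tactile-only model regresses 3D hand and object keypoints against pseudo-labels produced by the visual pipeline, while an InfoNCE contrastive term aligns tactile and visual embeddings. The taxel count, keypoint layout, encoder architecture, and equal loss weighting are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TactilePoseModel(nn.Module):
    """Hypothetical tactile-only model: encodes a tactile sequence and
    regresses 3D keypoint locations for the hand and the object."""
    def __init__(self, n_taxels=548, emb_dim=128, n_keypoints=21 + 8):
        super().__init__()
        # Per-frame tactile encoder followed by a temporal aggregator.
        self.encoder = nn.Sequential(
            nn.Linear(n_taxels, 256), nn.ReLU(),
            nn.Linear(256, emb_dim),
        )
        self.temporal = nn.GRU(emb_dim, emb_dim, batch_first=True)
        self.pose_head = nn.Linear(emb_dim, n_keypoints * 3)

    def forward(self, tactile_seq):
        # tactile_seq: (batch, time, n_taxels) pressure readings.
        z = self.encoder(tactile_seq)
        _, h = self.temporal(z)
        emb = h[-1]                           # (batch, emb_dim)
        pose = self.pose_head(emb)            # (batch, n_keypoints * 3)
        return pose.view(len(pose), -1, 3), emb

def info_nce(tactile_emb, visual_emb, temperature=0.07):
    """Standard InfoNCE loss: pull each tactile embedding toward its
    paired visual embedding and away from the other pairs in the batch."""
    t = F.normalize(tactile_emb, dim=-1)
    v = F.normalize(visual_emb, dim=-1)
    logits = t @ v.t() / temperature
    targets = torch.arange(len(t), device=t.device)
    return F.cross_entropy(logits, targets)

# One training step: the visual pipeline supplies keypoint pseudo-labels
# `pose_gt` and embeddings `visual_emb`; the tactile model is supervised
# by a pose-regression term plus the contrastive term.
model = TactilePoseModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
tactile_seq = torch.randn(32, 16, 548)        # dummy tactile batch
pose_gt = torch.randn(32, 29, 3)              # dummy visual pseudo-labels
visual_emb = torch.randn(32, 128)             # dummy visual embeddings

pred_pose, tactile_emb = model(tactile_seq)
loss = F.mse_loss(pred_pose, pose_gt) + info_nce(tactile_emb, visual_emb)
opt.zero_grad()
loss.backward()
opt.step()
```

At test time only the tactile branch is needed: the visual pipeline exists solely to generate training labels, so `model(tactile_seq)` alone yields the predicted 3D locations.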