Learning how humans manipulate objects requires machines to acquire knowledge from two perspectives: one for understanding object affordances and the other for learning how humans interact with objects based on those affordances. Although both knowledge bases are crucial, we find that current databases lack a comprehensive awareness of them. In this work, we propose OakInk, a multimodal, richly annotated knowledge repository for visual and cognitive understanding of hand-object interactions. We first collect 1,800 common household objects and annotate their affordances to construct the first knowledge base: Oak. Given the affordances, we record rich human interactions with 100 selected objects in Oak. Finally, we transfer the interactions recorded on these 100 objects to their virtual counterparts through a novel method, Tink. The recorded and transferred hand-object interactions together constitute the second knowledge base: Ink. As a result, OakInk contains 50,000 distinct affordance-aware and intent-oriented hand-object interactions. We benchmark OakInk on pose estimation and grasp generation tasks. Moreover, we propose two practical applications of OakInk: intent-based interaction generation and handover generation. Our datasets and source code are publicly available at https://github.com/lixiny/OakInk.
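
To make the two-part structure concrete, the minimal Python sketch below illustrates how an affordance-aware, intent-oriented interaction record in Ink could link back to an object entry in Oak. All class and field names here (ObjectEntry, InteractionRecord, intent, hand_pose, is_transferred) are illustrative assumptions for exposition only, not the dataset's actual schema or toolkit API; refer to https://github.com/lixiny/OakInk for the released format.

    # Hypothetical sketch of the Oak/Ink linkage; names are assumptions, not the real API.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class ObjectEntry:
        """One of the 1,800 household objects in the Oak knowledge base."""
        object_id: str
        category: str                                          # e.g., "mug", "knife"
        affordances: List[str] = field(default_factory=list)   # e.g., ["hold", "pour"]

    @dataclass
    class InteractionRecord:
        """One of the ~50,000 hand-object interactions in the Ink knowledge base."""
        object_id: str              # links back to an ObjectEntry in Oak
        intent: str                 # e.g., "use", "hold", "hand over"
        hand_pose: List[float]      # hand pose parameters (assumed MANO-style axis-angle)
        is_transferred: bool = False  # True if produced by Tink rather than directly recorded

    # Usage: a recorded "hold" interaction on a mug, annotated with its intent.
    mug = ObjectEntry("oak_0001", "mug", affordances=["hold", "pour"])
    grasp = InteractionRecord("oak_0001", intent="hold", hand_pose=[0.0] * 48)
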