Object manipulation in cluttered scenes is a difficult and important problem in robotics. To manipulate objects efficiently, a robot must understand their surroundings, especially when multiple objects are stacked on top of one another and prevent effective grasping. We present DUQIM-Net, a decision-making approach for object manipulation in a setting of stacked objects. In DUQIM-Net, the hierarchical stacking relationship is assessed using Adj-Net, a model that extends existing Transformer encoder-decoder object detectors with an adjacency head. The output of this head is used to probabilistically infer the underlying hierarchical structure of the objects in the scene. We utilize the properties of the resulting adjacency matrix in DUQIM-Net to perform decision making and assist with object-grasping tasks. Our experimental results show that Adj-Net surpasses the state of the art in object-relationship inference on the Visual Manipulation Relationship Dataset (VMRD), and that DUQIM-Net outperforms comparable approaches in bin-clearing tasks.
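
To make the role of the adjacency matrix concrete, the following is a minimal sketch of how a probabilistic stacking matrix could drive a grasp decision. It is not the DUQIM-Net decision rule. The convention that `adj[i, j]` is the probability that object `i` rests on object `j`, the independence of entries, and the helper names `free_probability` and `pick_next_object` are all assumptions made for illustration.

```python
import numpy as np


def free_probability(adj):
    """Probability that each object has nothing resting on it.

    Assumption (not from the paper): adj[i, j] is the probability that
    object i rests on object j, and entries are treated as independent.
    """
    adj = np.asarray(adj, dtype=float).copy()
    np.fill_diagonal(adj, 0.0)  # an object cannot rest on itself
    # Object j is free if no object i rests on it: product over column j.
    return np.prod(1.0 - adj, axis=0)


def pick_next_object(adj):
    """Index of the object most likely to be unobstructed."""
    return int(np.argmax(free_probability(adj)))


# Hypothetical scene: object 0 on object 1, object 1 on object 2.
adj = np.array([
    [0.0, 0.9, 0.1],
    [0.0, 0.0, 0.8],
    [0.0, 0.0, 0.0],
])

print(free_probability(adj))
print(pick_next_object(adj))
```

Under these assumptions, a bin-clearing loop would grasp the selected object, delete its row and column from the matrix (or re-run detection), and repeat until no objects remain.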