This work presents a novel active visuo-tactile framework for robotic systems to accurately estimate the pose of objects in densely cluttered environments. The scene representation is derived using a novel declutter graph (DG), which describes the relationships among objects in the scene by leveraging semantic segmentation and grasp affordance networks. The graph formulation allows robots to efficiently declutter the workspace by autonomously selecting the next best object to remove and the optimal action (prehensile or non-prehensile) to perform. Furthermore, we propose a novel translation-invariant Quaternion filter (TIQF) for active vision and active tactile based pose estimation. Both active visual and active tactile points are selected by maximizing the expected information gain. We evaluate our proposed framework on a system with two robots coordinating on randomized scenes of densely cluttered objects, and perform ablation studies using static vision and active vision based estimation, prior to and after decluttering, as baselines. Our proposed active visuo-tactile interactive perception framework shows up to 36% improvement in pose accuracy compared to the active vision baseline.