Recent 3D-based manipulation methods either directly predict the grasp pose with 3D neural networks, or solve for the grasp pose using similar objects retrieved from shape databases. However, the former faces generalization challenges when tested with new robot arms or unseen objects, and the latter assumes that similar objects exist in the databases. We hypothesize that recent 3D modeling methods provide a path toward building a digital replica of the evaluation scene that affords physical simulation and supports robust manipulation-algorithm learning. We propose to reconstruct high-quality meshes from real-world point clouds using a state-of-the-art neural surface reconstruction method (the Real2Sim step). Because most simulators take meshes as input for fast simulation, the reconstructed meshes enable grasp-pose label generation without human effort. The generated labels can train a grasp network that performs robustly in the real evaluation scene (the Sim2Real step). In synthetic and real experiments, we show that the Real2Sim2Real pipeline performs better than both baseline grasp networks trained on a large dataset and a grasp sampling method using retrieval-based reconstruction. The benefit of the Real2Sim2Real pipeline comes from 1) decoupling scene modeling and grasp sampling into sub-problems, and 2) solving both sub-problems with sufficiently high quality using recent 3D learning algorithms and mesh-based physical simulation techniques.
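For concreteness, the sketch below illustrates the Real2Sim half of such a pipeline: reconstruct a mesh from a captured point cloud, then label sampled grasps by simulating them against that mesh. Open3D's Poisson surface reconstruction is used here only as a stand-in for the neural surface reconstruction method referenced above, and `sampler`/`simulator` are hypothetical hooks for the grasp sampler and the mesh-based physics rollout; this is a minimal sketch, not the paper's implementation.

```python
# Real2Sim sketch: real-world point cloud -> mesh -> simulated grasp labels.
# Open3D's Poisson reconstruction stands in for the neural surface
# reconstruction method described in the abstract (assumption).
import open3d as o3d


def reconstruct_mesh(pcd_path: str) -> o3d.geometry.TriangleMesh:
    """Reconstruct a mesh from a real-world point cloud (Real2Sim step)."""
    pcd = o3d.io.read_point_cloud(pcd_path)
    pcd.estimate_normals()  # Poisson reconstruction requires oriented normals
    mesh, _densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        pcd, depth=9
    )
    return mesh


def generate_grasp_labels(mesh, sampler, simulator, n_grasps=500):
    """Label sampled grasp poses by rolling them out in a mesh-based simulator.

    `sampler(mesh, n)` and `simulator(mesh, grasp)` are hypothetical hooks for
    an antipodal grasp sampler and a physics engine that consumes triangle
    meshes for fast rigid-body simulation.
    """
    labels = []
    for grasp in sampler(mesh, n_grasps):
        success = simulator(mesh, grasp)  # True if the object stays in the gripper
        labels.append((grasp, success))
    return labels
```

The resulting (grasp, success) pairs would then serve as the automatically generated supervision used to train the grasp network in the Sim2Real step.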