Object manipulation from 3D visual inputs poses many challenges for building generalizable perception and policy models. However, the 3D assets in existing benchmarks mostly lack the shape diversity needed to reflect real-world intra-class complexity in topology and geometry. We propose the SAPIEN Manipulation Skill Benchmark (ManiSkill) to benchmark manipulation skills over diverse objects in a full-physics simulator. The 3D assets in ManiSkill exhibit large intra-class topological and geometric variations, and the tasks are chosen to cover distinct types of manipulation challenges. Motivated by recent progress in 3D vision, we also tailor the benchmark so that the challenge is inviting to researchers working on 3D deep learning: we simulate a moving panoramic camera that returns ego-centric point clouds or RGB-D images. In addition, we want ManiSkill to serve a broad set of researchers interested in manipulation. Besides supporting policy learning from interactions, we support learning-from-demonstrations (LfD) methods by providing a large number of high-quality demonstrations (~36,000 successful trajectories, ~1.5M point cloud/RGB-D frames in total). We provide baselines using 3D deep learning and LfD algorithms. All code of our benchmark (simulator, environment, SDK, and baselines) is open-sourced, and a challenge aimed at interdisciplinary researchers will be held based on the benchmark.
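To make the relation between the two observation modes concrete, the following is a minimal sketch of how an ego-centric point cloud can be derived from a depth image under the standard pinhole camera model. It is not ManiSkill's API: the function name `depth_to_point_cloud` and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative assumptions, and the points are expressed in the camera frame.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def depth_to_point_cloud(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Back-project a depth image into camera-frame 3D points.

    Hypothetical helper (not part of ManiSkill). Assumes a pinhole
    camera with focal lengths (fx, fy) and principal point (cx, cy),
    and that depth[v][u] holds the z-distance at pixel (u, v).
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:
                # Treat non-positive depth as "no return" and skip it.
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Tiny 2x2 depth image; the pixel with depth 0.0 is dropped.
depth = [[1.0, 2.0], [0.0, 4.0]]
cloud = depth_to_point_cloud(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

With a moving camera, each frame's points would additionally need to be transformed by that frame's camera pose before frames can be compared or merged; that step is omitted here.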