Generalizable manipulation skills, which can be composed to tackle long-horizon and complex daily chores, are one of the cornerstones of Embodied AI. However, existing benchmarks, mostly composed of a suite of simulatable environments, are insufficient to push cutting-edge research because they lack object-level topological and geometric variations, are not based on fully dynamic simulation, or lack native support for multiple types of manipulation tasks. To this end, we present ManiSkill2, the next generation of the SAPIEN ManiSkill benchmark, to address critical pain points often encountered by researchers when using benchmarks for generalizable manipulation skills. ManiSkill2 includes 20 manipulation task families with 2000+ object models and 4M+ demonstration frames, which cover stationary/mobile-base, single/dual-arm, and rigid/soft-body manipulation tasks with 2D/3D-input data simulated by fully dynamic engines. It defines a unified interface and evaluation protocol to support a wide range of algorithms (e.g., classic sense-plan-act, RL, IL), visual observations (point cloud, RGBD), and controllers (e.g., action type and parameterization). Moreover, it enables fast learning from visual input: a CNN-based policy can collect samples at about 2000 FPS with 1 GPU and 16 processes on a regular workstation. It implements a render server infrastructure that shares rendering resources across all environments, thereby significantly reducing memory usage. We open-source all code of our benchmark (simulator, environments, and baselines) and host an online challenge open to interdisciplinary researchers.