Multi-embodiment grasping focuses on developing approaches that exhibit generalist behavior across diverse gripper designs. Existing methods often learn the kinematic structure of the robot implicitly and face challenges due to the difficulty of sourcing the required large-scale data. In this work, we present a data-efficient, flow-based, equivariant grasp synthesis architecture that can handle gripper types with variable degrees of freedom and successfully exploit the underlying kinematic model, deducing all necessary information solely from the gripper and scene geometry. Unlike previous equivariant grasping methods, we reimplemented all modules from the ground up in JAX and provide a model with batching capabilities over scenes, grippers, and grasps, resulting in smoother learning, improved performance, and faster inference. Our dataset encompasses grippers ranging from humanoid hands to parallel-jaw grippers and includes 25,000 scenes and 20 million grasps.
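To make the batching claim concrete, below is a minimal JAX sketch of how nested `jax.vmap` transforms can lift a single-grasp function to batches over scenes, grippers, and grasps. The function `score_grasp`, its inputs, and all shapes are hypothetical placeholders, not the paper's actual model or API.

```python
import jax
import jax.numpy as jnp

# Hypothetical stand-in for a per-grasp module: consumes one scene
# embedding, one gripper embedding, and one grasp parameter vector
# (e.g. a flattened pose plus joint values) and returns a scalar.
def score_grasp(scene_emb, gripper_emb, grasp):
    return jnp.tanh(jnp.sum(scene_emb) + jnp.sum(gripper_emb) + jnp.sum(grasp))

# Nested vmap lifts the single-grasp function to batches:
# innermost over grasps, then over grippers, outermost over scenes.
batched = jax.vmap(                                         # over scenes
    jax.vmap(                                               # over grippers
        jax.vmap(score_grasp, in_axes=(None, None, 0)),     # over grasps
        in_axes=(None, 0, 0),
    ),
    in_axes=(0, None, 0),
)

scenes = jnp.ones((4, 128))        # 4 scene embeddings (shapes assumed)
grippers = jnp.ones((3, 64))       # 3 gripper embeddings
grasps = jnp.ones((4, 3, 16, 23))  # 16 grasp candidates per (scene, gripper)

scores = jax.jit(batched)(scenes, grippers, grasps)
print(scores.shape)  # (4, 3, 16)
```

Because `vmap` composes with `jit`, the batched function compiles to a single fused program, which is one plausible source of the faster inference reported above.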