Generalist robot learning remains constrained by data: large-scale, diverse, high-quality interaction data are expensive to collect in the real world. While simulation has emerged as a promising avenue for scaling up data collection, the surrounding tasks, including simulation task design, task-aware scene generation, expert demonstration synthesis, and sim-to-real transfer, still demand substantial human effort. We present AnyTask, an automated framework that pairs massively parallel GPU simulation with foundation models to design diverse manipulation tasks and synthesize robot data. We introduce three AnyTask agents that generate expert demonstrations with the aim of solving as many tasks as possible: 1) ViPR, a novel task and motion planning agent with VLM-in-the-loop Parallel Refinement; 2) ViPR-Eureka, a reinforcement learning agent with generated dense rewards and LLM-guided contact sampling; 3) ViPR-RL, a hybrid planning-and-learning agent that produces high-quality demonstrations from only sparse rewards. We train behavior cloning policies on the generated data, validate them in simulation, and deploy them directly on real robot hardware. The policies generalize to novel object poses, achieving 44% average success across a suite of real-world pick-and-place, drawer-opening, contact-rich pushing, and long-horizon manipulation tasks. Our project website is at https://anytask.rai-inst.com.
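To make the ViPR loop concrete, below is a minimal sketch, not the authors' implementation, of a VLM-in-the-loop parallel refinement cycle: many candidate plans are rolled out in parallel simulation, successful rollouts are kept as demonstrations, and a VLM critiques failures to steer the next round of proposals. All function names here (propose_plans, rollout_parallel, vlm_critique) are hypothetical placeholders standing in for the planner, the GPU-parallel simulator, and the VLM call; they are not AnyTask APIs.

```python
# Hypothetical sketch of a VLM-in-the-loop parallel refinement loop
# in the spirit of ViPR. Placeholder functions stand in for the real
# planner, simulator, and VLM; none of these are AnyTask APIs.
from dataclasses import dataclass, field
import random

@dataclass
class Rollout:
    plan_id: int
    success: bool
    frames: list = field(default_factory=list)  # frames a VLM could inspect

def propose_plans(task: str, n: int, feedback: str = "") -> list[int]:
    """Stand-in for a TAMP planner sampling n candidate plans,
    conditioned on any critique from the previous iteration."""
    return list(range(n))

def rollout_parallel(plans: list[int]) -> list[Rollout]:
    """Stand-in for massively parallel GPU simulation of each plan."""
    return [Rollout(p, success=random.random() < 0.3) for p in plans]

def vlm_critique(failures: list[Rollout], task: str) -> str:
    """Stand-in for a VLM that inspects failed rollouts and returns hints."""
    return f"{len(failures)} plans failed on '{task}'; adjust grasp poses."

def vipr_loop(task: str, n_envs: int = 64, max_iters: int = 5) -> list[Rollout]:
    feedback = ""
    for _ in range(max_iters):
        plans = propose_plans(task, n_envs, feedback)
        rollouts = rollout_parallel(plans)
        successes = [r for r in rollouts if r.success]
        if successes:                  # keep successful rollouts as demos
            return successes
        feedback = vlm_critique(rollouts, task)  # close the VLM loop
    return []

if __name__ == "__main__":
    demos = vipr_loop("put the mug in the drawer")
    print(f"collected {len(demos)} expert demonstrations")
```

Under this reading, the same outer loop accommodates the paper's RL variants by swapping the rollout step: ViPR-Eureka would optimize a generated dense reward inside each iteration, while ViPR-RL would use planner rollouts plus sparse success signals.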