Quadruped mobile manipulators offer strong potential for agile loco-manipulation but remain difficult to control and transfer reliably from simulation to reality. Reinforcement learning (RL) shows promise for whole-body control, yet most frameworks are proprietary and hard to reproduce on real hardware. We present an open pipeline for training, benchmarking, and deploying RL-based controllers on the Unitree B1 quadruped with a Z1 arm. The framework unifies sim-to-sim and sim-to-real transfer through ROS, re-implementing a policy trained in Isaac Gym, extending it to MuJoCo via a hardware abstraction layer, and deploying the same controller on physical hardware. Sim-to-sim experiments expose discrepancies between Isaac Gym and MuJoCo contact models that influence policy behavior, while real-world teleoperated object-picking trials show that coordinated whole-body control extends reach and improves manipulation over floating-base baselines. The pipeline provides a transparent, reproducible foundation for developing and analyzing RL-based loco-manipulation controllers and will be released open source to support future research.
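To make the hardware-abstraction-layer idea mentioned above concrete, below is a minimal sketch, under our own assumptions rather than from the paper, of how a single policy could drive interchangeable backends (e.g. a MuJoCo simulation or the real B1+Z1 via ROS). All class and function names here (`RobotInterface`, `MuJoCoBackend`, `control_loop`) are illustrative, not the authors' API; only the `mujoco` Python bindings calls are real library functions.

```python
# Illustrative sketch of a hardware abstraction layer (assumption, not the
# authors' code): the policy only sees RobotInterface, while Isaac Gym,
# MuJoCo, or the physical robot plug in behind it.
from abc import ABC, abstractmethod
import numpy as np


class RobotInterface(ABC):
    """Backend-agnostic access to joint states and commands."""

    @abstractmethod
    def read_state(self) -> np.ndarray:
        """Return current joint positions and velocities as one vector."""

    @abstractmethod
    def send_command(self, joint_targets: np.ndarray) -> None:
        """Apply desired joint targets for the next control step."""


class MuJoCoBackend(RobotInterface):
    """Simulation backend using the official `mujoco` Python bindings."""

    def __init__(self, xml_path: str):
        import mujoco  # requires the `mujoco` package
        self._mujoco = mujoco
        self.model = mujoco.MjModel.from_xml_path(xml_path)
        self.data = mujoco.MjData(self.model)

    def read_state(self) -> np.ndarray:
        return np.concatenate([self.data.qpos, self.data.qvel])

    def send_command(self, joint_targets: np.ndarray) -> None:
        self.data.ctrl[:] = joint_targets            # actuator targets
        self._mujoco.mj_step(self.model, self.data)  # advance the simulation


def control_loop(robot: RobotInterface, policy, steps: int = 1000) -> None:
    """Run the same policy against any backend (sim-to-sim or sim-to-real)."""
    for _ in range(steps):
        obs = robot.read_state()
        robot.send_command(policy(obs))
```

In this pattern, swapping the simulator or the real robot only means swapping the `RobotInterface` implementation; the trained policy and the control loop stay unchanged, which is the property the pipeline relies on for sim-to-sim and sim-to-real comparisons.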