This paper proposes a novel control method for an autonomous wheel loader, enabling time-efficient navigation to an arbitrary goal pose. Unlike prior works that combine high-level trajectory planners with Model Predictive Control (MPC), we directly enhance the planning capabilities of MPC by integrating a cost function derived from Actor-Critic Reinforcement Learning (RL). Specifically, we train an RL agent to solve the pose reaching task in simulation, then incorporate the trained neural network critic as both the stage and terminal cost of an MPC. We show through comprehensive simulations that the resulting MPC inherits the time-efficient behavior of the RL agent, generating trajectories that compare favorably against those found using trajectory optimization. We also deploy our method on a real wheel loader, where we successfully navigate to various goal poses.
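As a rough sketch of what using the critic as both the stage and terminal cost could look like, the optimal control problem might be written as below. All notation here is assumed for illustration and is not taken from the paper: $x_k$ is the vehicle state, $u_k$ the control input, $N$ the horizon length, $f$ a discrete-time vehicle model, $\hat{x}$ the current state estimate, $g$ the goal pose, $\mathcal{U}$ the admissible input set, and $V_\phi$ the trained critic network. The sketch further assumes that the critic is a state-value function estimating the return-to-go (higher is better), so it enters the cost with a negative sign; the actual critic may instead depend on the action or be scaled differently.

```latex
% Hypothetical formulation: symbols and sign convention are assumptions.
\begin{aligned}
\min_{u_0,\dots,u_{N-1}} \quad & \sum_{k=0}^{N-1} \ell(x_k, u_k) + \ell_N(x_N) \\
\text{s.t.} \quad & x_{k+1} = f(x_k, u_k), \quad k = 0,\dots,N-1, \\
& x_0 = \hat{x}, \qquad u_k \in \mathcal{U}, \\
\text{with} \quad & \ell(x_k, u_k) = -V_\phi(x_k; g), \qquad \ell_N(x_N) = -V_\phi(x_N; g).
\end{aligned}
```

Under these assumptions, the planner needs no separate reference trajectory: the learned value landscape alone steers the predicted states toward the goal pose.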