Reinforcement learning has gained significant traction in robotic navigation. A persistent challenge, however, is its sample inefficiency, which stems largely from the difficulty of encouraging exploration: during training, the mobile agent must explore as much as possible to efficiently learn optimal behaviors. We introduce Ada-NAV, a novel adaptive trajectory length scheme designed to improve the training sample efficiency of reinforcement learning algorithms in robotic navigation tasks. Unlike traditional approaches that treat trajectory length as a fixed hyperparameter, Ada-NAV dynamically adjusts it based on the entropy of the underlying navigation policy. We empirically validate Ada-NAV with two popular policy gradient methods: REINFORCE and Proximal Policy Optimization (PPO). In both simulated and real-world robotic experiments, Ada-NAV outperforms conventional methods that use constant or randomly sampled trajectory lengths. Specifically, for a fixed sample budget, Ada-NAV achieves an 18% increase in navigation success rate, a 20-38% reduction in navigation path length, and a 9.32% decrease in elevation costs. Furthermore, we demonstrate the versatility of Ada-NAV by deploying it on a Clearpath Husky robot, illustrating its applicability in complex outdoor environments.
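
To make the idea of an entropy-dependent trajectory length concrete, the following is a minimal sketch and not the Ada-NAV update rule itself. It assumes a discrete action distribution, a linear mapping from normalized Shannon entropy to length, and that the length grows as entropy falls; the abstract specifies neither the form nor the direction of the mapping. The function names and the bounds `min_len` and `max_len` are hypothetical.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def adaptive_trajectory_length(probs, min_len=50, max_len=500):
    """Map policy entropy to a trajectory length (hypothetical linear rule).

    Assumption: the length grows as entropy falls. min_len and max_len are
    illustrative values, not taken from the paper.
    """
    if len(probs) < 2:
        return max_len  # a single action has zero entropy
    max_entropy = math.log(len(probs))  # entropy of the uniform policy
    ratio = shannon_entropy(probs) / max_entropy
    ratio = min(max(ratio, 0.0), 1.0)  # guard against rounding error
    return round(max_len - ratio * (max_len - min_len))


if __name__ == "__main__":
    for policy in ([0.25] * 4, [0.7, 0.1, 0.1, 0.1], [1.0, 0.0, 0.0, 0.0]):
        print(policy, adaptive_trajectory_length(policy))
```

Under these assumptions, a uniform policy is assigned the shortest trajectory and a deterministic policy the longest, with intermediate policies falling in between. In a training loop, the returned value would be recomputed from the current policy before each rollout instead of being fixed in advance.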