The advent of deep learning has inspired research into end-to-end learning for a variety of problem domains in robotics. For navigation, the resulting methods may lack the desired generalization properties, let alone match the performance of traditional methods. Instead of learning a navigation policy, we explore learning an adaptive policy in the parameter space of an existing navigation module. Having adaptive parameters provides the navigation module with a family of policies that can be dynamically reconfigured based on the local scene structure, and counters the common assertion in machine learning that engineered solutions are inflexible. Of the methods tested, reinforcement learning (RL) is shown to provide a significant performance boost to a modern navigation method by reducing the sensitivity of its success rate to environmental clutter. The outcomes indicate that RL as a meta-policy learner, or dynamic parameter tuner, effectively robustifies algorithms that are sensitive to external, measurable nuisance factors.
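The idea of an RL meta-policy over a planner's parameter space can be sketched with a toy example. Everything below is hypothetical and not the paper's actual system: the "navigation module" is a stub whose trial success depends on a single tunable parameter (an inflation-radius-like value) and on the measured local clutter, and the tuner is a simple epsilon-greedy contextual bandit that learns which parameter setting to deploy in each clutter regime.

```python
# Hypothetical sketch of RL-based dynamic parameter tuning: a contextual
# bandit learns one navigation-parameter setting per discretized clutter level.
import random

random.seed(0)

CLUTTER_LEVELS = 3               # discretized local clutter: low / medium / high
PARAM_CHOICES = [0.2, 0.4, 0.6]  # hypothetical inflation radii (meters)

def simulated_trial(clutter, radius):
    """Toy stand-in for one navigation run: high clutter favors a small
    inflation radius, open space tolerates a larger one. Returns 1 on success."""
    best = PARAM_CHOICES[CLUTTER_LEVELS - 1 - clutter]
    p_success = 0.9 - 0.8 * abs(radius - best)
    return 1.0 if random.random() < p_success else 0.0

# Q[context][action]: running-mean success estimate of each parameter choice
Q = [[0.0] * len(PARAM_CHOICES) for _ in range(CLUTTER_LEVELS)]
N = [[0] * len(PARAM_CHOICES) for _ in range(CLUTTER_LEVELS)]

for episode in range(10000):
    clutter = random.randrange(CLUTTER_LEVELS)     # observe local scene structure
    if random.random() < 0.2:                      # epsilon-greedy exploration
        a = random.randrange(len(PARAM_CHOICES))
    else:                                          # exploit current estimate
        a = max(range(len(PARAM_CHOICES)), key=lambda i: Q[clutter][i])
    reward = simulated_trial(clutter, PARAM_CHOICES[a])
    N[clutter][a] += 1
    Q[clutter][a] += (reward - Q[clutter][a]) / N[clutter][a]  # incremental mean

# The learned meta-policy: one parameter setting per clutter level, which the
# navigation module would switch between at run time as clutter changes.
policy = [PARAM_CHOICES[max(range(len(PARAM_CHOICES)), key=lambda i: Q[c][i])]
          for c in range(CLUTTER_LEVELS)]
print(policy)
```

The key design point the sketch illustrates is that the learner never replaces the planner; it only selects among parameterizations of it, so the family of policies it can express is exactly the family the engineered module already supports.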