Outcome-directed Reinforcement Learning by Uncertainty & Temporal Distance-Aware Curriculum Goal Generation

Current reinforcement learning (RL) methods often struggle with challenging exploration problems in which the desired outcomes or high rewards are rarely observed. Curriculum RL, a framework that solves complex tasks by proposing a sequence of surrogate tasks, shows reasonable results, but most previous works still have difficulty proposing curricula because they lack a mechanism for obtaining calibrated guidance toward the desired outcome states without any prior domain knowledge. To address this, we propose an uncertainty & temporal distance-aware curriculum goal generation method for outcome-directed RL that works by solving a bipartite matching problem. It not only provides precisely calibrated curriculum guidance toward the desired outcome states but also achieves much better sample efficiency and geometry-agnostic curriculum goal proposals than previous curriculum RL methods. We demonstrate, both quantitatively and qualitatively, that our algorithm significantly outperforms these prior methods on a variety of challenging navigation and robotic manipulation tasks.
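
The abstract says curriculum goals are generated by solving a bipartite matching problem but does not spell out the matching cost. The Python sketch below illustrates only the matching step, under stated assumptions: each desired-outcome sample is paired one-to-one with a distinct candidate state (for example, one drawn from a replay buffer), and the pairing cost is a hypothetical `temporal_dist - beta * uncertainty` combination of precomputed scores. The names `pairing_cost`, `match_curriculum_goals`, and `beta` are illustrative and are not taken from the paper; this is not the authors' cost function or implementation. The exhaustive search is exact but only practical for a handful of samples. At realistic sizes a Hungarian-algorithm solver such as `scipy.optimize.linear_sum_assignment` would be the usual choice.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def pairing_cost(temporal_dist: float, uncertainty: float, beta: float = 1.0) -> float:
    """Hypothetical cost (assumption, not the paper's formula).

    Lower is better: candidates that are temporally close to the desired
    outcome and have high uncertainty (e.g. frontier states) cost less.
    """
    return temporal_dist - beta * uncertainty


def match_curriculum_goals(cost: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Minimum-cost bipartite matching by exhaustive search.

    cost[i][j] is the cost of assigning candidate state j to desired-outcome
    sample i. Requires at least as many candidates as outcome samples.
    Returns, for each outcome i, the index of its matched candidate, plus the
    total cost of the matching.
    """
    n_rows = len(cost)
    n_cols = len(cost[0]) if n_rows else 0
    if n_rows > n_cols:
        raise ValueError("need at least as many candidates as outcome samples")

    best_assignment: List[int] = []
    best_total = float("inf")
    # Each ordered choice of n_rows distinct columns is one one-to-one matching.
    for cols in permutations(range(n_cols), n_rows):
        total = sum(cost[i][j] for i, j in enumerate(cols))
        if total < best_total:
            best_total = total
            best_assignment = list(cols)
    return best_assignment, best_total


if __name__ == "__main__":
    # Made-up scores: 2 desired-outcome samples x 3 candidate states.
    temporal_dist = [[1.0, 4.0, 2.0],
                     [3.0, 1.0, 2.5]]
    uncertainty = [0.2, 0.9, 0.5]  # one score per candidate state
    cost = [[pairing_cost(d, u) for d, u in zip(row, uncertainty)]
            for row in temporal_dist]
    goals, total = match_curriculum_goals(cost)
    print(goals, round(total, 3))
```

With these made-up scores, outcome sample 0 is matched to candidate 0 and outcome sample 1 to candidate 1, the one-to-one pairing with the lowest total cost. Enforcing a one-to-one matching, rather than picking the cheapest candidate for each outcome independently, keeps the proposed curriculum goals distinct from each other.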
