Multi-goal reaching is an important problem in reinforcement learning needed to achieve algorithmic generalization. Despite recent advances in this field, current algorithms suffer from three major challenges: high sample complexity, learning only a single way of reaching the goals, and difficulties in solving complex motion planning tasks. In order to address these limitations, we introduce the concept of cumulative accessibility functions, which measure the reachability of a goal from a given state within a specified horizon. We show that these functions obey a recurrence relation, which enables learning from offline interactions. We also prove that optimal cumulative accessibility functions are monotonic in the planning horizon. Additionally, our method can trade off speed and reliability in goal-reaching by suggesting multiple paths to a single goal depending on the provided horizon. We evaluate our approach on a set of multi-goal discrete and continuous control tasks. We show that our method outperforms state-of-the-art goal-reaching algorithms in success rate, sample complexity, and path optimality. Our code is available at https://github.com/layer6ai-labs/CAE, and additional visualizations can be found at https://sites.google.com/view/learning-cae/.
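To make the quantities above concrete, one plausible way to write a cumulative accessibility function and its recurrence is sketched below. The notation $C^*(s, g, t)$ for the probability of reaching goal $g$ from state $s$ within at most $t$ steps under an optimal policy, and the exact form of the update, are illustrative assumptions rather than definitions taken from this abstract:

\[
  C^*(s, g, t) \;=\; \max_{a}\, \mathbb{E}_{s' \sim p(\cdot \mid s, a)}\Big[\max\big(\mathbb{1}[s' = g],\; C^*(s', g, t-1)\big)\Big],
  \qquad
  C^*(s, g, 0) \;=\; \mathbb{1}[s = g].
\]

Under a form like this, bootstrapping the right-hand side from stored transitions is what permits learning from offline interactions, and the claimed monotonicity reads $C^*(s, g, t) \le C^*(s, g, t+1)$: any goal reachable within $t$ steps is also reachable within $t+1$.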