Path planning, the problem of efficiently discovering high-reward trajectories, often requires optimizing a high-dimensional and multimodal reward function. Popular approaches like CEM and CMA-ES greedily focus on promising regions of the search space and may get trapped in local maxima. DOO and VOOT balance exploration and exploitation, but use space-partitioning strategies that are independent of the reward function being optimized. Recently, LaMCTS empirically learns to partition the search space in a reward-sensitive manner for black-box optimization. In this paper, we develop a novel formal regret analysis of when and why such an adaptive region-partitioning scheme works. We also propose a new path planning method, PlaLaM, which improves the function-value estimation within each sub-region and uses a latent representation of the search space. Empirically, PlaLaM outperforms existing path planning methods on 2D navigation tasks, especially in the presence of difficult-to-escape local optima, and shows benefits when plugged into model-based RL methods with planning components such as PETS. These gains transfer to highly multimodal real-world tasks, where we outperform strong baselines in compiler phase ordering by up to 245% and in molecular design by up to 0.4 on properties measured on a 0-1 scale. Code is available at https://github.com/yangkevin2/plalam.