We investigate improving Monte Carlo Tree Search (MCTS)-based solvers for Partially Observable Markov Decision Processes (POMDPs) when applied to adaptive sampling problems. We propose improvements in rollout allocation, the action exploration algorithm, and plan commitment. The first allocates a different number of rollouts depending on how many actions the agent has taken in an episode. We find that rollouts are more valuable after some initial information is gained about the environment. Thus, a linear increase in the number of rollouts, i.e., allocating a fixed number at each step, is not appropriate for adaptive sampling tasks. The second alters which actions the agent chooses to explore when building the planning tree. We find that by using knowledge of the number of rollouts allocated, the agent can more effectively choose actions to explore. The third improvement determines how many actions the agent should take from a single plan. Typically, an agent takes only the first action from the planning tree and then calls the planner again from the new state. Using statistical techniques, we show that it is possible to greatly reduce the number of rollouts by increasing the number of actions taken from a single planning tree without affecting the agent's final reward. Finally, we demonstrate experimentally, on simulated and real aquatic data from an underwater robot, that these improvements can be combined, leading to better adaptive sampling. The code for this work is available at https://github.com/uscresl/AdaptiveSamplingPOMCP.
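As a minimal illustrative sketch only (not the allocation rule from the paper), the idea of spending fewer rollouts early in an episode and more once some information about the environment has been gathered could be expressed as a per-step rollout schedule; the function name, the ramp shape, and all parameters below are hypothetical:

```python
import numpy as np

def rollout_schedule(total_budget, num_steps, ramp_fraction=0.3):
    """Hypothetical non-uniform rollout allocation: fewer rollouts early in the
    episode, more once initial information has been gathered.

    Illustrative sketch only; the paper's actual allocation scheme may differ.
    """
    # Ramp the per-step weight up over the first `ramp_fraction` of the episode,
    # then keep it at full weight for the remaining steps.
    ramp_steps = max(1, int(ramp_fraction * num_steps))
    weights = np.concatenate([
        np.linspace(0.2, 1.0, ramp_steps),       # early steps: reduced share
        np.full(num_steps - ramp_steps, 1.0),    # later steps: full share
    ])
    # Normalize so the per-step rollout counts sum to the total budget.
    counts = np.floor(weights / weights.sum() * total_budget).astype(int)
    counts[-1] += total_budget - counts.sum()    # assign any remainder to the last step
    return counts

# Example: spread 10,000 rollouts over a 20-step episode.
print(rollout_schedule(10_000, 20))
```

Any such schedule contrasts with the uniform (linear-in-steps) allocation that the abstract argues is ill-suited to adaptive sampling.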