In many practical scenarios, a flying insect must search for the source of an emitted cue which is advected by the atmospheric wind. On the macroscopic scales of interest, turbulence tends to mix the cue into patches of relatively high concentration over a background of very low concentration, so that the insect will only detect the cue intermittently and cannot rely on chemotactic strategies which simply climb the concentration gradient. In this work, we cast this search problem in the language of a partially observable Markov decision process (POMDP) and use the Perseus algorithm to compute strategies that are near-optimal with respect to the arrival time. We test the computed strategies on a large two-dimensional grid, present the resulting trajectories and arrival time statistics, and compare these to the corresponding results for several heuristic strategies, including (space-aware) infotaxis, Thompson sampling, and QMDP. We find that the near-optimal policy found by our implementation of Perseus outperforms all heuristics we test by several measures. We use the near-optimal policy to study how the search difficulty depends on the starting location. We discuss additionally the choice of initial belief and the robustness of the policies to changes in the environment. Finally, we present a detailed and pedagogical discussion about the implementation of the Perseus algorithm, including the benefits -- and pitfalls -- of employing a reward shaping function.
翻译:在许多实际情况下, 飞虫必须寻找被大气风所吸引的导线源。 在宏观利益尺度上, 波动往往将导线混入高度集中于极低集中背景下、 从而昆虫只能间歇地探测导线, 并且不能依赖仅仅攀升浓度梯度的化学战略。 在这项工作中, 我们用部分可见的马可夫决策程序( POMDP) 的语言来解释这一搜索问题, 并使用 Perseus 算法来计算在到达时间上接近最佳的战略。 我们用大型二维电网测试计算出来的策略, 展示由此产生的轨迹和到达时间统计, 并将它们与若干超常战略的相应结果进行比较, 包括( 空间觉察) 肠胃、 汤普森取样和 QMDP。 我们发现, 我们执行 Perseus 的定型的定型的定值所发现的接近最佳的政策, 超越了我们通过几种措施测试的所有超常数的策略。 我们使用接近最佳的政策来测试, 我们使用近最佳的二维的电网格网格, 展示由此产生的轨策略, 以及最终的搜索环境环境的难度是如何决定。 我们如何选择, 。