The olfactory search POMDP (partially observable Markov decision process) is a sequential decision-making problem designed to mimic the task faced by insects searching for a source of odor in turbulence, and its solutions have applications to sniffer robots. As exact solutions are out of reach, the challenge is to find the best possible approximate solutions while keeping the computational cost reasonable. We quantitatively benchmark a solver based on deep reinforcement learning against traditional approximate POMDP solvers. We show that deep reinforcement learning is a competitive alternative to standard methods, in particular for generating lightweight policies suitable for robots.