Real-world decision-making problems are often partially observable, and many can be formulated as partially observable Markov decision processes (POMDPs). When reinforcement learning (RL) algorithms are applied to a POMDP, reasonable estimation of the hidden states helps solve the problem. Furthermore, explainable decision-making is preferable given applications to real-world tasks such as autonomous driving. We propose an RL algorithm that estimates the hidden states through end-to-end training and visualizes the estimates as a state-transition graph. Experimental results demonstrate that the proposed algorithm can solve simple POMDP problems and that the visualization makes the agent's behavior interpretable to humans.