Robots operating in real-world environments must reason about the possible outcomes of stochastic actions and make decisions based on partial observations of the true world state. A major challenge in making accurate and robust action predictions is confounding, which, if left untreated, can lead to prediction errors. The partially observable Markov decision process (POMDP) is a widely used framework for modelling such stochastic, partially observable decision-making problems. However, because POMDPs lack explicit causal semantics, POMDP planning methods are prone to confounding bias and may therefore produce underperforming policies in the presence of unobserved confounders. This paper presents a novel, causally informed extension of the "anytime regularized determinized sparse partially observable tree" (AR-DESPOT), a modern anytime online POMDP planner, that uses causal modelling and inference to eliminate errors caused by unmeasured confounder variables. We further propose a method to learn, offline and from ground-truth model data, the partial parameterisation of the causal model used for planning. We evaluate our methods on a toy problem with an unobserved confounder and show that the learned causal model is highly accurate, while our planning method is more robust to confounding and produces higher-performing policies overall than AR-DESPOT.
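To make "confounding bias" concrete, the sketch below uses a small hypothetical model (all probabilities are invented for illustration and are not taken from the paper): a hidden variable `U` influences both which action `X` was logged and whether the outcome `Y` succeeds. Estimating action values from logged data conditions on `X` and yields P(Y | X), whereas a planner that chooses `X` itself needs the interventional quantity P(Y | do(X)). Computing the latter here assumes access to the full model including `U`, as a ground-truth simulator would provide; this is a generic illustration of confounding, not the paper's causal AR-DESPOT planner or its learning method.

```python
from fractions import Fraction as F

# Hypothetical toy model (not from the paper): hidden confounder U, action X, success Y.
P_U1 = F("0.5")                             # P(U = 1)
P_X1_GIVEN_U = {0: F("0.1"), 1: F("0.9")}   # behaviour policy that produced the logged data
P_Y1_GIVEN_XU = {                           # P(Y = 1 | X = x, U = u)
    (1, 1): F("0.3"), (1, 0): F("0.9"),
    (0, 1): F("0.2"), (0, 0): F("0.6"),
}


def p_u(u):
    """Prior probability of the hidden confounder value u."""
    return P_U1 if u == 1 else 1 - P_U1


def p_x_given_u(x, u):
    """Probability that the behaviour policy chose action x when U = u."""
    p1 = P_X1_GIVEN_U[u]
    return p1 if x == 1 else 1 - p1


def observational(x):
    """P(Y=1 | X=x): conditioning on the logged action reweights U by P(u | x)."""
    joint = sum(p_u(u) * p_x_given_u(x, u) * P_Y1_GIVEN_XU[(x, u)] for u in (0, 1))
    marginal = sum(p_u(u) * p_x_given_u(x, u) for u in (0, 1))
    return joint / marginal


def interventional(x):
    """P(Y=1 | do(X=x)): the planner sets X, so U keeps its prior P(u)."""
    return sum(p_u(u) * P_Y1_GIVEN_XU[(x, u)] for u in (0, 1))


for x in (0, 1):
    print(f"x={x}  P(Y|X)={float(observational(x)):.2f}  "
          f"P(Y|do(X))={float(interventional(x)):.2f}")
```

Running it prints `P(Y|X)` of 0.56 and 0.36 for `x=0` and `x=1`, but `P(Y|do(X))` of 0.40 and 0.60: a planner that trusted the confounded estimate would pick the worse action, which is the kind of underperforming policy the abstract attributes to confounding bias.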