Robot data collected in complex real-world scenarios are often biased due to safety concerns, human preferences, and mission or platform constraints. Consequently, learning from such observational data makes accurate parameter estimation challenging. We propose a principled causal inference framework for robots to learn the parameters of a stochastic motion model from observational data. Specifically, we leverage the de-biasing capability of two methods from the potential-outcome causal inference framework, Inverse Propensity Weighting (IPW) and the Doubly Robust (DR) method, to obtain better parameter estimates for the robot's stochastic motion model. IPW is a re-weighting approach that ensures unbiased estimation, and the DR approach further combines two estimators so that the result remains unbiased even if one of them is biased. We then develop an approximate policy iteration algorithm that uses the de-biased estimate of the state transition function. We validate our framework in both simulation and real-world experiments, and the results show that the proposed causal inference-based navigation and control framework can correctly and efficiently learn the parameters from biased observational data.
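To make the re-weighting idea concrete, the following is a minimal sketch of the textbook IPW and doubly robust estimators for a population mean, not the paper's implementation. The scenario is hypothetical: a robot's displacement is logged only for some trials, and the logging probability (the propensity) depends on a binary terrain label. The function names `ipw_mean` and `dr_mean`, the terrain variable, and all numbers are assumptions made for illustration.

```python
import numpy as np


def ipw_mean(y, observed, propensity):
    """Inverse Propensity Weighting estimate of the population mean of y.

    Each observed outcome is divided by its probability of being observed;
    unobserved entries contribute zero.
    """
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    propensity = np.asarray(propensity, dtype=float)
    weighted = np.where(observed, y, 0.0) / propensity
    return weighted.mean()


def dr_mean(y, observed, propensity, y_model):
    """Doubly robust (augmented IPW) estimate of the population mean of y.

    Starts from an outcome-model prediction and adds an IPW-weighted
    correction computed from the residuals of the observed samples.
    """
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    propensity = np.asarray(propensity, dtype=float)
    y_model = np.asarray(y_model, dtype=float)
    residual = np.where(observed, y - y_model, 0.0)
    return (y_model + residual / propensity).mean()


# Hypothetical toy data (assumption): 8 trials, 4 per terrain type.
# True displacement is 1.0 on terrain 0 and 3.0 on terrain 1, so the
# true population mean is 2.0. Terrain 1 is logged less often.
terrain = np.array([0, 0, 0, 0, 1, 1, 1, 1])
observed = np.array([1, 1, 0, 0, 1, 0, 0, 0], dtype=bool)
nan = np.nan  # unobserved outcomes are never used by the estimators
y = np.array([1.0, 1.0, nan, nan, 3.0, nan, nan, nan])

true_propensity = np.where(terrain == 0, 0.5, 0.25)
true_outcome_model = np.where(terrain == 0, 1.0, 3.0)

naive = y[observed].mean()  # biased toward the over-sampled terrain
ipw = ipw_mean(y, observed, true_propensity)
# DR with a wrong outcome model (all zeros) but the correct propensity.
dr_bad_outcome = dr_mean(y, observed, true_propensity, np.zeros(8))
# DR with a wrong propensity (0.5 everywhere) but the correct outcome model.
dr_bad_propensity = dr_mean(y, observed, np.full(8, 0.5), true_outcome_model)

print(naive, ipw, dr_bad_outcome, dr_bad_propensity)
```

In this toy data the naive average of the logged samples is 5/3, while IPW and both DR variants recover the true mean of 2.0. The two DR calls illustrate the "doubly robust" property: the estimate stays correct when either the outcome model or the propensity model is wrong, as long as the other is right.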