Robots that learn from real-world observations using inverse reinforcement learning (IRL) may encounter objects or agents in the environment, other than the expert, that produce nuisance observations during the demonstration. In fully controlled settings such as virtual simulations or laboratories, these confounding elements are typically removed. When complete removal is impossible, the nuisance observations must be filtered out. However, identifying the source of each observation is difficult when a large number of observations are made. To address this, we present a hierarchical Bayesian model that incorporates both the expert's and the confounding elements' observations, thereby explicitly modeling the diverse observations a robot may receive. We extend an existing IRL algorithm, originally designed to work under partial occlusion of the expert, to account for these diverse observations. We demonstrate the model's effectiveness in a simulated robotic sorting domain containing both occlusion and confounding elements. In particular, our technique outperforms several comparison methods and is second only to having perfect knowledge of the subject's trajectory.