"Monkey see monkey do" is an age-old adage, referring to na\"ive imitation without a deep understanding of a system's underlying mechanics. Indeed, if a demonstrator has access to information unavailable to the imitator (monkey), such as a different set of sensors, then no matter how perfectly the imitator models its perceived environment (See), attempting to reproduce the demonstrator's behavior (Do) can lead to poor outcomes. Imitation learning in the presence of a mismatch between demonstrator and imitator has been studied in the literature under the rubric of causal imitation learning (Zhang et al., 2020), but existing solutions are limited to single-stage decision-making. This paper investigates the problem of causal imitation learning in sequential settings, where the imitator must make multiple decisions per episode. We develop a graphical criterion that is necessary and sufficient for determining the feasibility of causal imitation, providing conditions when an imitator can match a demonstrator's performance despite differing capabilities. Finally, we provide an efficient algorithm for determining imitability and corroborate our theory with simulations.