The imitation learning research community has recently made significant progress toward enabling artificial agents to imitate behaviors from video demonstrations alone. However, current state-of-the-art approaches to this problem exhibit high sample complexity due, in part, to the high-dimensional nature of video observations. Toward addressing this issue, we introduce here a new algorithm called Visual Generative Adversarial Imitation from Observation using a State Observer (VGAIfO-SO). At its core, VGAIfO-SO seeks to address sample inefficiency using a novel, self-supervised state observer, which provides estimates of lower-dimensional proprioceptive state representations from high-dimensional images. We show experimentally in several continuous control environments that VGAIfO-SO is more sample efficient than other IfO algorithms at learning from video-only demonstrations, and can sometimes even achieve performance close to that of the Generative Adversarial Imitation from Observation (GAIfO) algorithm, which has privileged access to the demonstrator's proprioceptive state information.
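To make the role of the state observer concrete, the following is a minimal sketch of the kind of mapping it performs: a learned function that takes a high-dimensional image observation and returns a low-dimensional proprioceptive state estimate. The network architecture, dimensions, and parameterization below are illustrative assumptions for exposition only, not the paper's actual design or training objective.

```python
import numpy as np

# Hypothetical dimensions (assumptions, not from the paper):
# a 64x64 grayscale frame mapped down to an 8-dim proprioceptive estimate.
IMG_DIM, HIDDEN, STATE_DIM = 64 * 64, 128, 8

rng = np.random.default_rng(0)

# Illustrative two-layer MLP standing in for the learned state observer;
# in practice this would be trained (self-supervised, per the abstract).
W1 = rng.standard_normal((IMG_DIM, HIDDEN)) * 0.01
W2 = rng.standard_normal((HIDDEN, STATE_DIM)) * 0.01

def state_observer(frame: np.ndarray) -> np.ndarray:
    """Map a high-dimensional image observation to a low-dim state estimate."""
    h = np.maximum(frame.reshape(-1) @ W1, 0.0)  # ReLU hidden layer
    return h @ W2

frame = rng.random((64, 64))        # stand-in for one video observation
estimate = state_observer(frame)
print(estimate.shape)               # (8,)
```

The point of the sketch is the dimensionality reduction itself: downstream imitation components can then operate on the compact state estimate rather than on raw pixels, which is what the abstract credits for the improved sample efficiency.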