Visual imitation learning enables reinforcement learning agents to learn behaviors from expert visual demonstrations, such as videos or image sequences, without explicit, well-defined rewards. Previous research either adopts supervised learning techniques or induces simple and coarse scalar rewards from pixels, neglecting the dense information contained in image demonstrations. In this work, we propose to measure the expertise of various local regions of image samples, called \textit{patches}, and recover multi-dimensional \textit{patch rewards} accordingly. The patch reward is a more precise characterization of the reward that serves as a fine-grained expertise measurement and a visual explainability tool. Specifically, we present Adversarial Imitation Learning with Patch Rewards (PatchAIL), which employs a patch-based discriminator to measure the expertise of different local parts of given images and provide patch rewards. The patch-based knowledge is also used to regularize the aggregated reward and stabilize training. We evaluate our method on the DeepMind Control Suite and Atari tasks. The experimental results demonstrate that PatchAIL outperforms baseline methods and provides valuable interpretations of visual demonstrations.
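As a rough illustration of the core idea (not the authors' implementation), the sketch below assumes a PatchGAN-style fully convolutional discriminator in PyTorch that emits one logit per local patch; the class names, layer sizes, input channel count, and the mean aggregation of patch logits into a scalar reward are illustrative assumptions.

```python
import torch
import torch.nn as nn


class PatchDiscriminator(nn.Module):
    """Fully convolutional discriminator that outputs a grid of per-patch
    logits instead of a single scalar, so each spatial location scores the
    'expertise' of the image region in its receptive field."""

    def __init__(self, in_channels: int = 9):  # e.g. 3 stacked RGB frames (assumed)
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 1, kernel_size=3, stride=1, padding=1),  # 1 logit per patch
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (B, C, H, W) -> patch logits: (B, 1, H', W')
        return self.net(obs)


def patch_reward(disc: PatchDiscriminator, obs: torch.Tensor) -> torch.Tensor:
    """Turn per-patch logits into a scalar reward per observation.
    A GAIL-style reward -log(1 - sigmoid(logit)) averaged over patches is
    used here purely for illustration; the paper's aggregation and
    regularization scheme may differ."""
    logits = disc(obs)
    per_patch = -torch.log(1.0 - torch.sigmoid(logits) + 1e-8)
    return per_patch.mean(dim=(1, 2, 3))  # (B,)
```

The spatial map of patch logits is what enables both the fine-grained reward signal and the visual explanations: the same grid can be upsampled and overlaid on the input frame as an expertise heatmap.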