We investigate the possibility of using animal videos to improve the efficiency and performance of Reinforcement Learning (RL). From a theoretical perspective, we motivate the use of weighted policy optimization for off-policy RL, describe the main challenges of learning from videos, and propose solutions. We test our ideas in both offline and online RL and show encouraging results on a series of 2D navigation tasks.