Complex physical tasks entail a sequence of object interactions, each with its own preconditions, which can be difficult for robotic agents to learn efficiently solely through their own experience. We introduce an approach to discover activity-context priors from in-the-wild egocentric video captured with human-worn cameras. For a given object, an activity-context prior represents the set of other compatible objects that are required for activities to succeed (e.g., a knife and cutting board brought together with a tomato are conducive to cutting). We encode our video-based prior as an auxiliary reward function that encourages an agent to bring compatible objects together before attempting an interaction. In this way, our model translates everyday human experience into embodied agent skills. We demonstrate our idea using egocentric EPIC-Kitchens video of people performing unscripted kitchen activities to benefit virtual household robotic agents performing various complex tasks in AI2-iTHOR, significantly accelerating agent learning. Project page: http://vision.cs.utexas.edu/projects/ego-rewards/
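To make the reward-shaping idea concrete, below is a minimal sketch, not the authors' implementation. It assumes the activity-context prior has already been distilled from egocentric video into a compatibility matrix over object classes; the matrix here is random placeholder data, and all names, the averaging rule, and the shaping weight are illustrative assumptions.

```python
import numpy as np

# Hypothetical activity-context prior: prior[o, c] scores how often object
# class c co-occurs with object o just before successful interactions in
# egocentric video. Random placeholder values stand in for learned scores.
NUM_OBJECTS = 50
rng = np.random.default_rng(0)
prior = rng.random((NUM_OBJECTS, NUM_OBJECTS))
prior /= prior.sum(axis=1, keepdims=True)  # normalize per target object

def aux_reward(target_obj: int, nearby_objs: set[int]) -> float:
    """Auxiliary reward: how conducive the current configuration of nearby
    objects is to interacting with `target_obj`, under the video prior."""
    if not nearby_objs:
        return 0.0
    # Average compatibility of the objects the agent has brought together.
    return float(np.mean([prior[target_obj, c] for c in nearby_objs]))

def shaped_reward(env_reward: float, target_obj: int,
                  nearby_objs: set[int], weight: float = 0.1) -> float:
    """Environment reward plus the prior-based shaping bonus."""
    return env_reward + weight * aux_reward(target_obj, nearby_objs)

# Example: the bonus grows as the agent gathers compatible objects
# (object indices here are arbitrary).
print(shaped_reward(0.0, target_obj=3, nearby_objs={7, 12}))
```

The key design point this sketch illustrates is that the shaping term rewards assembling compatible objects before the interaction is attempted, rather than only rewarding task completion, which is what accelerates exploration.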