Despite its significant success in enabling robots with autonomous behaviors, which makes deep reinforcement learning a promising approach for the robotic object search task, deep reinforcement learning suffers severely from the naturally sparse reward setting of this task. To tackle this challenge, we present a novel policy learning paradigm for the object search task, based on hierarchical and interpretable modeling with an intrinsic-extrinsic reward setting. More specifically, we explore the environment efficiently through a proxy low-level policy that is driven by intrinsically rewarded sub-goals. We further learn our hierarchical policy from this efficient exploration experience, optimizing both the high-level and low-level policies toward the extrinsically rewarded goal so that the object search task is performed well. Experiments conducted on the House3D environment validate our approach and show that a robot trained with our model performs the object search task in a more optimal and interpretable way.
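To make the intrinsic-extrinsic decomposition concrete, the sketch below illustrates one plausible way such a hierarchical rollout could be organized: a high-level policy proposes sub-goals and is credited with the sparse extrinsic reward, while a low-level policy acts toward the current sub-goal and receives a dense intrinsic reward. This is a minimal illustration under assumed names (HighLevelPolicy, LowLevelPolicy, intrinsic_reward, a gym-style env), not the authors' actual implementation.

```python
# Hypothetical sketch of a hierarchical intrinsic-extrinsic rollout.
# All class/function names and the gym-style env interface are assumptions.
import numpy as np


class HighLevelPolicy:
    """Selects a sub-goal (e.g., a semantic target or region) every k steps."""

    def __init__(self, n_subgoals):
        self.n_subgoals = n_subgoals

    def select_subgoal(self, observation):
        # Placeholder: a learned policy would map the observation to a sub-goal.
        return np.random.randint(self.n_subgoals)


class LowLevelPolicy:
    """Outputs primitive actions conditioned on the current sub-goal."""

    def __init__(self, n_actions):
        self.n_actions = n_actions

    def act(self, observation, subgoal):
        # Placeholder: a learned policy would condition on (observation, subgoal).
        return np.random.randint(self.n_actions)


def intrinsic_reward(observation, subgoal):
    """Dense reward for progress toward the sub-goal (assumed shaping)."""
    return 0.0  # e.g., +1 when the sub-goal is reached, shaped otherwise


def rollout(env, high, low, horizon=100, k=10):
    """Collect one episode. The high-level transitions carry the sparse
    extrinsic reward; the low-level transitions carry the intrinsic reward."""
    obs = env.reset()
    high_transitions, low_transitions = [], []
    subgoal = high.select_subgoal(obs)
    for t in range(horizon):
        if t % k == 0:  # re-select a sub-goal at a fixed interval (assumed schedule)
            subgoal = high.select_subgoal(obs)
        action = low.act(obs, subgoal)
        next_obs, extrinsic_r, done, _ = env.step(action)
        low_transitions.append((obs, subgoal, action,
                                intrinsic_reward(next_obs, subgoal)))
        high_transitions.append((obs, subgoal, extrinsic_r))
        obs = next_obs
        if done:
            break
    return high_transitions, low_transitions
```

Each policy would then be updated from its own transition buffer, the low-level one on the dense intrinsic signal and the high-level one on the sparse extrinsic search reward.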