First-person object-interaction tasks in high-fidelity, 3D simulated environments such as the AI2Thor virtual home environment pose significant sample-efficiency challenges for reinforcement learning (RL) agents learning from sparse task rewards. To alleviate these challenges, prior work has provided extensive supervision via a combination of reward shaping, ground-truth object information, and expert demonstrations. In this work, we show that one can learn object-interaction tasks from scratch without such supervision by learning an attentive object-model as an auxiliary task during task learning with an object-centric relational RL agent. Our key insight is that learning an object-model that incorporates object attention into forward prediction provides a dense learning signal for unsupervised representation learning of both objects and their relationships. This, in turn, enables faster policy learning for an object-centric relational RL agent. We demonstrate our agent by introducing a set of challenging object-interaction tasks in the AI2Thor environment, where learning with our attentive object-model is key to strong performance. Specifically, we compare our agent and relational RL agents with alternative auxiliary tasks to a relational RL agent equipped with ground-truth object information, and show that learning with our object-model best closes the performance gap in terms of both learning speed and maximum success rate. Additionally, we find that incorporating object attention into an object-model's forward predictions is key to learning representations that capture object category and object state.
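To make the key insight concrete, the sketch below illustrates one plausible form of an attentive object-model used as an auxiliary prediction loss alongside RL. It is not the authors' implementation: the module names, dimensions, attention scheme, and loss target are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's code): an action-conditioned attention
# over object slots is folded into a forward prediction of next-step object features,
# giving a dense auxiliary loss that can be added to the RL objective.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentiveObjectModel(nn.Module):
    """Predicts next-step object features from attended current objects and the action."""

    def __init__(self, obj_dim: int = 64, act_dim: int = 8, hidden: int = 128):
        super().__init__()
        self.query = nn.Linear(act_dim, obj_dim)   # action-conditioned attention query
        self.forward_net = nn.Sequential(          # forward model over the attended object
            nn.Linear(obj_dim + act_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, obj_dim),
        )

    def forward(self, objects: torch.Tensor, action: torch.Tensor):
        # objects: (batch, num_objects, obj_dim); action: (batch, act_dim)
        q = self.query(action).unsqueeze(1)                  # (B, 1, D)
        attn = F.softmax((q * objects).sum(-1), dim=-1)      # (B, N) attention over objects
        attended = (attn.unsqueeze(-1) * objects).sum(1)     # (B, D) attended object summary
        pred_next = self.forward_net(torch.cat([attended, action], dim=-1))
        return pred_next, attn


def object_model_loss(model, objects_t, action_t, objects_tp1):
    """Auxiliary loss: predict the attended next-step object features (illustrative target)."""
    pred_next, attn = model(objects_t, action_t)
    target = (attn.detach().unsqueeze(-1) * objects_tp1).sum(1)
    return F.mse_loss(pred_next, target)


if __name__ == "__main__":
    model = AttentiveObjectModel()
    objs_t = torch.randn(4, 10, 64)     # 4 transitions, 10 object slots
    objs_tp1 = torch.randn(4, 10, 64)
    acts = torch.randn(4, 8)
    aux_loss = object_model_loss(model, objs_t, acts, objs_tp1)
    aux_loss.backward()                 # in practice, summed with the RL loss
    print(float(aux_loss))
```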