Humans and animals explore their environment and acquire useful skills even in the absence of clear goals, exhibiting intrinsic motivation. The study of intrinsic motivation in artificial agents is concerned with the following question: what is a good general-purpose objective for an agent? We study this question in dynamic partially-observed environments, and argue that a compact and general learning objective is to minimize the entropy of the agent's state visitation, estimated using a latent state-space model. This objective induces an agent both to gather information about its environment, corresponding to reducing uncertainty, and to gain control over its environment, corresponding to reducing the unpredictability of future world states. We instantiate this approach as a deep reinforcement learning agent equipped with a deep variational Bayes filter. We find that our agent learns to discover, represent, and exercise control over dynamic objects in a variety of partially-observed environments sensed through visual observations, without any extrinsic reward.
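To make the objective concrete: minimizing the entropy of state visitation can be approached by rewarding the agent with the log-density of each visited state under a density model fit to the states seen so far, since the visitation entropy is estimated by the negative mean log-density. The paper's agent uses a deep variational Bayes filter over latent states; the sketch below substitutes a simple diagonal Gaussian density over raw states purely for illustration (all function names here are hypothetical, not from the paper).

```python
import numpy as np

def gaussian_logpdf(x, mean, var):
    # Log-density of state vector x under a diagonal Gaussian.
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def intrinsic_rewards(states):
    """Illustrative intrinsic reward: for each step t, fit a diagonal
    Gaussian to the states visited so far and return log p(s_t).
    Maximizing this reward pushes down -mean_t log p(s_t), a sample
    estimate of the visitation entropy, so the agent prefers
    trajectories that settle into predictable, controllable states."""
    rewards = []
    for t, s in enumerate(states):
        seen = np.asarray(states[: t + 1])
        mean = seen.mean(axis=0)
        var = seen.var(axis=0) + 1e-3  # variance floor for stability
        rewards.append(gaussian_logpdf(np.asarray(s), mean, var))
    return rewards

# A trajectory that concentrates in a small region of state space
# earns a higher late-stage reward than one that keeps dispersing.
concentrated = [[0.0], [0.01], [0.0], [0.01]]
dispersed = [[0.0], [1.0], [2.0], [3.0]]
```

In the actual method, the density is maintained over the belief state of a latent state-space model rather than over observations, which is what lets the same objective cover both information gathering (sharpening beliefs) and control (making future states predictable).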