Biological agents have adopted the principle of attention to limit the rate of incoming information from the environment. This raises a question: if an artificial agent has access to only a limited view of its surroundings, how can it control its attention to solve tasks effectively? We propose an approach for learning to control a hard attention window by maximizing the mutual information between the environment state and the attention location at each step. The agent employs an internal world model to make predictions about its state and directs attention to where those predictions may be wrong. Attention is trained jointly with a dynamic memory architecture that stores partial observations and keeps track of the unobserved state. We demonstrate that our approach is effective at predicting the full state from a sequence of partial observations. We also show that the agent's internal representation of its surroundings, a live mental map, can be used for control in two partially observable reinforcement learning tasks. Videos of the trained agent can be found at https://sites.google.com/view/hard-attention-control.
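The idea of attending where the world model's predictions may be wrong can be illustrated with a small sketch. This is not the learned, mutual-information-based policy described above; it is a greedy stand-in that assumes a per-cell uncertainty grid is already available. The names `select_glimpse`, `write_glimpse`, `uncertainty`, and `memory` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Grid = List[List[float]]


def select_glimpse(uncertainty: Grid, window: int) -> Tuple[int, int]:
    """Return the top-left (row, col) of the window with the largest summed uncertainty.

    Assumption: `uncertainty` is a hypothetical per-cell estimate of how likely
    the internal prediction is to be wrong. Ties keep the first window found.
    """
    rows, cols = len(uncertainty), len(uncertainty[0])
    best_loc, best_score = (0, 0), float("-inf")
    for r in range(rows - window + 1):
        for c in range(cols - window + 1):
            score = sum(
                uncertainty[r + i][c + j]
                for i in range(window)
                for j in range(window)
            )
            if score > best_score:
                best_loc, best_score = (r, c), score
    return best_loc


def write_glimpse(memory: Grid, state: Grid, loc: Tuple[int, int], window: int) -> None:
    """Copy the observed patch of the true state into the memory map, in place."""
    r, c = loc
    for i in range(window):
        for j in range(window):
            memory[r + i][c + j] = state[r + i][c + j]


# Toy 4x4 environment: an object occupies the bottom-right corner.
state = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
# The memory map starts empty; the uncertainty grid is hand-written for this example.
memory = [[0.0] * 4 for _ in range(4)]
uncertainty = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.9, 0.8],
    [0.1, 0.1, 0.7, 0.9],
]

loc = select_glimpse(uncertainty, window=2)
write_glimpse(memory, state, loc, window=2)
```

In this toy case the 2x2 window lands on the high-uncertainty corner, and only that patch of the memory map is updated; the rest remains the agent's unobserved estimate. In the approach described above, the attention location is learned rather than found by exhaustive search.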