A person walking along a city street who tries to model all aspects of the world would quickly be overwhelmed by a multitude of shops, cars, and people moving in and out of view, each following their own complex and inscrutable dynamics. Yet exploration and navigation in such an environment are everyday tasks that require no vast exertion of mental resources. Is it possible to turn this fire hose of sensory information into a minimal latent state that is necessary and sufficient for an agent to act successfully in the world? We formulate this question concretely and propose the Agent-Controllable State Discovery algorithm (AC-State), which has theoretical guarantees and is practically demonstrated to discover the \textit{minimal controllable latent state}: a state that contains all of the information necessary for controlling the agent while fully discarding all irrelevant information. The algorithm consists of a multi-step inverse model (predicting actions from distant observations) with an information bottleneck. AC-State enables localization, exploration, and navigation without reward or demonstrations. We demonstrate the discovery of controllable latent state in three domains: localizing a robot arm with distractions (e.g., changing lighting conditions and background), exploring in a maze alongside other agents, and navigating in the Matterport house simulator.
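As a rough illustration of the multi-step inverse model with an information bottleneck, the sketch below scores one training example. It is a toy under stated assumptions, not the authors' implementation: the encoder is a hypothetical linear map followed by an argmax onto a small set of discrete codes (a crude stand-in for the bottleneck), the action classifier is a hypothetical lookup table, and the predicted action is assumed to be the first action taken between the two observations. All names and sizes are invented for the example.

```python
import numpy as np

def encode(obs, W_enc):
    """Hypothetical encoder: linear logits, then argmax onto one of K discrete codes.
    Restricting the latent to K codes is a crude stand-in for the information bottleneck."""
    return int(np.argmax(W_enc @ obs))

def multistep_inverse_loss(obs_t, obs_tk, k, action, W_enc, table):
    """Cross-entropy for predicting the (assumed) first action a_t from the
    encodings of x_t and x_{t+k}, conditioned on the step gap k."""
    z_t, z_tk = encode(obs_t, W_enc), encode(obs_tk, W_enc)
    logits = table[z_t, z_tk, k - 1]           # hypothetical tabular classifier
    shifted = logits - logits.max()            # subtract the max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[action]

# Toy usage with made-up sizes: 3-dim observations, K = 3 codes,
# step gaps k in {1, 2}, and 4 possible actions.
W_enc = np.eye(3)
table = np.zeros((3, 3, 2, 4))                 # untrained classifier: uniform over actions
x_t = np.array([0.1, 0.9, 0.0])
x_tk = np.array([0.0, 0.2, 0.7])
loss = multistep_inverse_loss(x_t, x_tk, k=2, action=1, W_enc=W_enc, table=table)
```

With an untrained (all-zero) table the classifier is uniform over the four actions, so the loss equals log 4. In a real system both the encoder and the classifier would be learned, and the number of codes would be kept as small as the prediction task allows.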