Human learning and intelligence work differently from the supervised pattern recognition approach adopted in most deep learning architectures. Humans seem to learn rich representations through exploration and imitation, build causal models of the world, and use both to flexibly solve new tasks. We propose a simple but effective unsupervised model which develops such characteristics. The agent learns to represent the dynamical physical properties of its environment through intrinsically motivated exploration, and performs inference on this representation to reach goals. To this end, a set of self-organizing maps representing state-action pairs is combined with a causal model for sequence prediction. The proposed system is evaluated in the cartpole environment. After an initial phase of playful exploration, the agent can execute kinematic simulations of the environment's future and use them for action planning. We demonstrate its performance on a set of related but distinct one-shot imitation tasks, which the agent solves flexibly in an active-inference style.
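As a rough illustration of one component of the described architecture, the sketch below trains a small self-organizing map over state-action vectors, as might arise during the exploration phase. All names, dimensions, and hyperparameters (the 8x8 grid, learning rate, neighborhood width, and the cartpole-like 4D state plus scalar action) are illustrative assumptions, not the paper's exact setup.

```python
# Hedged sketch: a self-organizing map (SOM) over state-action pairs.
# Dimensions and hyperparameters are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

# Cartpole-like state (position, velocity, angle, angular velocity)
# plus one scalar action: each input vector has 5 components.
STATE_DIM, ACTION_DIM = 4, 1
GRID = 8  # 8x8 map of units (assumed size)

# One prototype vector per grid unit, randomly initialized.
weights = rng.normal(size=(GRID, GRID, STATE_DIM + ACTION_DIM))

def best_matching_unit(x, w):
    """Return (row, col) of the unit whose prototype is closest to x."""
    dists = np.linalg.norm(w - x, axis=-1)
    return np.unravel_index(np.argmin(dists), dists.shape)

def som_update(x, w, lr=0.5, sigma=1.5):
    """Move prototypes toward x, weighted by grid distance to the BMU."""
    bmu = best_matching_unit(x, w)
    rows, cols = np.meshgrid(np.arange(GRID), np.arange(GRID), indexing="ij")
    grid_dist2 = (rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2
    # Gaussian neighborhood kernel over the map grid.
    h = np.exp(-grid_dist2 / (2 * sigma**2))[..., None]
    return w + lr * h * (x - w)

# "Playful exploration" stand-in: train on random state-action samples.
for _ in range(500):
    sample = rng.normal(size=STATE_DIM + ACTION_DIM)
    weights = som_update(sample, weights)

# After training, any state-action pair maps to a discrete map location,
# which a downstream causal model could use for sequence prediction.
bmu = best_matching_unit(np.zeros(STATE_DIM + ACTION_DIM), weights)
```

In the full system described above, the indices returned by `best_matching_unit` would serve as the discrete symbols on which the causal sequence model operates; here they are only computed to show the interface.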