Despite the potential of active inference for vision-based control, learning the model and the preferences (priors) while interacting with the environment is challenging. Here, we study the performance of a deep active inference (dAIF) agent on OpenAI's car racing benchmark, where there is no access to the car's state. The agent learns to encode the world's state from high-dimensional pixel input through unsupervised representation learning. State inference and control are learned end-to-end by optimizing the expected free energy. Results show that our model achieves performance comparable to deep Q-learning. However, vanilla dAIF does not reach state-of-the-art performance compared to other world-model approaches. Hence, we discuss the limitations of the current model implementation and potential architectures to overcome them.