In this work we derive and present evolutionary reinforcement learning dynamics in which the agents are irreducibly uncertain about the current state of the environment. We evaluate the dynamics across different classes of partially observable agent-environment systems and find that irreducible environmental uncertainty can lead to better learning outcomes faster, stabilize the learning process, and overcome social dilemmas. However, as expected, we also find that partial observability may cause worse learning outcomes, for example in the form of a catastrophic limit cycle. Compared to fully observant agents, learning with irreducible environmental uncertainty often requires more exploration and less weight on future rewards to obtain the best learning outcomes. Furthermore, we find a range of dynamical effects induced by partial observability, e.g., a critical slowing down of the learning processes between reward regimes and the separation of the learning dynamics into fast and slow directions. The presented dynamics are a practical tool for researchers in biology, social science, and machine learning to systematically investigate the evolutionary effects of environmental uncertainty.