Capturing and simulating intelligent adaptive behaviours within spatially explicit individual-based models remains an ongoing challenge for researchers. While an ever-increasing abundance of real-world behavioural data is collected, few approaches exist that can quantify and formalise key individual behaviours and how they change over space and time. Consequently, commonly used agent decision-making frameworks, such as event-condition-action rules, are often restricted to a narrow range of behaviours. We argue that these behavioural frameworks often do not reflect real-world scenarios and fail to capture how behaviours can develop in response to stimuli. In recent years there has been growing interest in Machine Learning methods and their potential to simulate intelligent adaptive behaviours. One method that is beginning to gain traction in this area is Reinforcement Learning (RL). This paper explores how RL can be applied to create emergent agent behaviours using a simple predator-prey Agent-Based Model (ABM). Running a series of simulations, we demonstrate that agents trained using the Proximal Policy Optimisation (PPO) algorithm behave in ways that exhibit properties of real-world intelligent adaptive behaviours, such as hiding, evading and foraging.
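The PPO algorithm mentioned above trains agents by maximising a clipped surrogate objective that limits how far the policy can move in a single update. A minimal sketch of that objective is shown below; the function name and parameters are illustrative and not taken from the paper's implementation:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective from PPO (Schulman et al., 2017).

    ratio     -- probability ratio pi_new(a|s) / pi_old(a|s)
    advantage -- estimated advantage of taking action a in state s
    eps       -- clipping range; 0.2 is a commonly used default
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the element-wise minimum removes the incentive for the
    # policy to move far outside the [1 - eps, 1 + eps] trust region.
    return np.minimum(unclipped, clipped)
```

For example, if the updated policy over-weights a beneficial action (ratio 1.5, advantage +1), the objective is capped at 1.2 rather than 1.5, which is what keeps each policy update conservative and training stable in settings like the predator-prey ABM described here.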