Agent-based methods allow simple rules to be defined that generate complex group behaviors. The governing rules of such models are typically set a priori, and parameters are tuned from observed behavior trajectories. Instead of making simplifying assumptions across all anticipated scenarios, inverse reinforcement learning provides inference on the short-term (local) rules governing long-term behavior policies by using properties of a Markov decision process. We use the computationally efficient linearly-solvable Markov decision process to learn the local rules governing collective movement for a simulation of the self-propelled particle (SPP) model and a data application for a captive guppy population. The estimation of the behavioral decision costs is done in a Bayesian framework with basis function smoothing. We recover the true costs in the SPP simulation and find that the guppies value collective movement more than targeted movement toward shelter.
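For context, the linearly-solvable Markov decision process referenced above follows the standard Todorov-style formulation, in which the immediate cost combines a state cost with a Kullback-Leibler penalty against passive dynamics. The sketch below restates that general (first-exit) formulation rather than the specific model fit in this paper, and the basis expansion in the final line is only an illustrative assumption about how basis function smoothing could enter the state cost.

% Sketch of the standard linearly-solvable MDP (first-exit formulation);
% the basis expansion of q(x) is an illustrative assumption.
\[
  \ell(x, u) \;=\; q(x) \;+\; \mathrm{KL}\!\left( u(\cdot \mid x) \,\middle\|\, p(\cdot \mid x) \right),
\]
where $p(\cdot \mid x)$ denotes the passive (uncontrolled) dynamics and $q(x)$ the state cost. Writing the desirability function as $z(x) = \exp\{-v(x)\}$ for the value function $v$, the Bellman equation becomes linear in $z$,
\[
  z(x) \;=\; \exp\{-q(x)\} \sum_{x'} p(x' \mid x)\, z(x'),
  \qquad
  u^{*}(x' \mid x) \;=\; \frac{p(x' \mid x)\, z(x')}{\sum_{x''} p(x'' \mid x)\, z(x'')},
\]
and a smooth state cost can be represented with a basis expansion, e.g. $q(x) \approx \sum_{j=1}^{J} \beta_j \phi_j(x)$, with priors placed on the coefficients $\beta_j$ for Bayesian estimation.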