Multi-agent reinforcement learning (MARL) is a powerful framework for studying emergent behavior in complex agent-based simulations. However, RL agents are often assumed to be rational and to behave optimally, which does not fully reflect human behavior. Here, we study more human-like RL agents that incorporate an established model of human irrationality, the Rational Inattention (RI) model. RI models the cost of cognitive information processing using mutual information. Our RIRL framework generalizes prior work and is more flexible, allowing for multi-timestep dynamics and information channels with heterogeneous processing costs. We evaluate RIRL in Principal-Agent problem settings (specifically, manager-employee relations) of varying complexity, where RI models information asymmetry (e.g., it may be costly for the manager to observe certain information about the employees). We show that RIRL yields a rich spectrum of new equilibrium behaviors that differ from those found under rationality assumptions. For instance, some forms of a Principal's inattention can increase Agent welfare due to increased compensation, while other forms of inattention can decrease Agent welfare by encouraging extra work effort. Additionally, strategies emerge that do not arise under rationality assumptions, e.g., Agents are incentivized to increase work effort. These results suggest RIRL is a powerful tool for building AI agents that can mimic real human behavior.
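To make the RI cost concrete, the sketch below (an illustrative assumption, not the paper's implementation) computes the mutual information between a hidden state and a noisy observation of it, and scales it by a hypothetical per-nat attention price `lam`. A fully uninformative channel incurs zero cost, matching the intuition that total inattention is free.

```python
import numpy as np

def mutual_information(p_x, channel):
    """I(X;Y) in nats for a discrete channel p(y|x) (rows: x, cols: y)."""
    p_xy = p_x[:, None] * channel            # joint p(x, y)
    p_y = p_xy.sum(axis=0)                   # marginal p(y)
    indep = p_x[:, None] * p_y[None, :]      # product of marginals
    mask = p_xy > 0
    return float((p_xy[mask] * np.log(p_xy[mask] / indep[mask])).sum())

# A Principal observing a binary employee-effort state through a noisy channel.
p_x = np.array([0.5, 0.5])                   # prior over hidden effort levels
noisy = np.array([[0.9, 0.1],                # p(signal | effort): mostly accurate
                  [0.1, 0.9]])
blind = np.array([[0.5, 0.5],                # uninformative channel: full inattention
                  [0.5, 0.5]])

lam = 0.1                                    # hypothetical cost per nat of attention
cost_attentive = lam * mutual_information(p_x, noisy)
cost_blind = lam * mutual_information(p_x, blind)   # zero: no information processed
```

In an RIRL setting, a cost term like `cost_attentive` would be subtracted from the observing agent's reward, so the agent trades off decision quality against the price of paying attention.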