Multi-agent simulations provide a scalable environment for learning policies that interact with rational agents. However, such policies may fail to generalize to the real world, where agents may differ from their simulated counterparts due to unmodeled irrationality and misspecified reward functions. We introduce Epsilon-Robust Multi-Agent Simulation (ERMAS), a robust optimization framework for learning AI policies that are robust to such multi-agent sim-to-real gaps. While existing notions of multi-agent robustness concern perturbations in the actions of agents, we address a novel robustness objective concerning perturbations in the reward functions of agents. ERMAS provides this robustness by anticipating suboptimal behaviors from other agents, formalized as the worst-case epsilon-equilibrium. We show empirically that ERMAS yields robust policies for repeated bimatrix games and optimal taxation problems in economic simulations. In particular, in the two-level RL problem posed by the AI Economist (Zheng et al., 2020), ERMAS learns tax policies that are robust to changes in agent risk aversion, improving social welfare by up to 15% in complex spatiotemporal simulations.
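As a rough sketch (our notation, not drawn from the abstract itself), the worst-case epsilon-equilibrium objective described above can be read as a max-min problem: the planner policy $\theta$ is optimized against the worst joint agent behavior $\pi$ that stays within $\epsilon$ of every agent's best response.

% Hedged sketch only; J_p (planner return), J_i (agent i's return), and E_eps are assumed notation.
\begin{equation*}
\max_{\theta} \;\; \min_{\pi \in \mathcal{E}_{\epsilon}(\theta)} J_{p}(\theta, \pi),
\qquad
\mathcal{E}_{\epsilon}(\theta) = \Bigl\{ \pi \;:\; J_{i}(\pi_{i}, \pi_{-i}; \theta) \;\ge\; \max_{\pi_{i}'} J_{i}(\pi_{i}', \pi_{-i}; \theta) - \epsilon \;\; \forall i \Bigr\},
\end{equation*}

where $\mathcal{E}_{\epsilon}(\theta)$ collects the epsilon-equilibria of the agent population under planner policy $\theta$; all symbols here are illustrative assumptions rather than the paper's own formalization.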