Hindsight rationality is an approach to playing multi-agent, general-sum games that prescribes no-regret learning dynamics and describes jointly rational behavior with mediated equilibria. We explore the space of deviation types in extensive-form games (EFGs) and discover powerful types that are efficient to compute in games with moderate lengths. Specifically, we identify four new types of deviations that subsume previously studied types within a broader class we call partial sequence deviations. Integrating the idea of time selection regret minimization into counterfactual regret minimization (CFR), we introduce the extensive-form regret minimization (EFR) algorithm that is hindsight rational for a general and natural class of deviations in EFGs. We provide instantiations and regret bounds for EFR that correspond to each partial sequence deviation type. In addition, we present a thorough empirical analysis of EFR's performance with different deviation types in common benchmark games. As theory suggests, instantiating EFR with stronger deviations leads to behavior that tends to outperform that of weaker deviations.
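As a point of reference for the no-regret learning dynamics mentioned above, the following is a minimal sketch of regret matching, the local learner that CFR runs at each information set. It is not EFR and does not implement time selection or partial sequence deviations. It only tracks regret for external deviations (switching to one fixed action) in rock-paper-scissors. The payoff table, the fixed opponent mixture, and the names `regret_matching` and `run` are illustrative assumptions, not taken from the paper.

```python
# Row player's payoffs in rock-paper-scissors
# (rows: own action, columns: opponent action; order is rock, paper, scissors).
PAYOFF = [
    [0.0, -1.0, 1.0],
    [1.0, 0.0, -1.0],
    [-1.0, 1.0, 0.0],
]


def regret_matching(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total > 0.0:
        return [p / total for p in positive]
    # No action has positive regret yet: fall back to the uniform strategy.
    n = len(cum_regret)
    return [1.0 / n] * n


def run(opponent, steps):
    """Learn against a fixed opponent mixture (an assumption of this sketch).

    Returns the learner's final strategy and its average external regret.
    """
    cum_regret = [0.0, 0.0, 0.0]
    for _ in range(steps):
        strategy = regret_matching(cum_regret)
        # Expected payoff of each pure action against the opponent mixture.
        action_values = [
            sum(PAYOFF[a][b] * opponent[b] for b in range(3)) for a in range(3)
        ]
        # Expected payoff of the strategy actually played this round.
        expected = sum(s * v for s, v in zip(strategy, action_values))
        # Regret for the external deviation "always play action a instead".
        cum_regret = [r + v - expected for r, v in zip(cum_regret, action_values)]
    return regret_matching(cum_regret), max(cum_regret) / steps


if __name__ == "__main__":
    final_strategy, avg_regret = run([0.5, 0.3, 0.2], steps=1000)
    print(final_strategy, avg_regret)
```

Against an opponent who overplays rock, the learner shifts its probability onto paper, and its average regret shrinks as the number of rounds grows. Stronger deviation types, such as those the abstract describes, would compare against a richer set of alternatives than the single-action switches used here.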