Monopoly Deal: A Benchmark Environment for Bounded One-Sided Response Games

Card games are widely used to study sequential decision-making under uncertainty, with real-world analogues in negotiation, finance, and cybersecurity. By flow of control, these games typically fall into three categories: strictly sequential (players alternate single actions), deterministic response (some actions trigger a fixed outcome), and unbounded reciprocal response (alternating counterplays are permitted). A less-explored but strategically rich structure is the bounded one-sided response, in which a player's action briefly transfers control to the opponent, who must satisfy a fixed condition through one or more moves before the turn resolves. We term games featuring this mechanism Bounded One-Sided Response Games (BORGs). We introduce a modified version of Monopoly Deal as a benchmark environment that isolates this dynamic: a Rent action forces the opponent to choose which assets to pay with. In this environment, the gold-standard algorithm, Counterfactual Regret Minimization (CFR), converges to effective strategies without novel algorithmic extensions. A lightweight full-stack research platform unifies the environment, a parallelized CFR runtime, and a human-playable web interface. The trained CFR agent and source code are available at https://monopolydeal.ai.
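
The bounded one-sided response described above can be illustrated with a minimal Python sketch. It is not taken from the released source code: the names `Player`, `resolve_rent`, `pay_smallest_first`, and `pay_single_cover` are hypothetical, assets are reduced to integer cash values, and overpayment is assumed to be forfeited rather than refunded. The point is only the control flow: one Rent action hands control to the responder, who makes one or more payment choices until a fixed condition holds, after which control returns with no counterplay.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Player:
    """Hypothetical player holding assets, each reduced to an integer cash value."""
    name: str
    assets: List[int] = field(default_factory=list)


def resolve_rent(actor: Player, responder: Player, rent: int,
                 choose: Callable[[List[int], int], int]) -> List[int]:
    """Resolve one Rent action as a bounded one-sided response.

    Control passes to `responder`, who picks one asset at a time until the
    rent is covered or no assets remain. The loop runs at most
    len(responder.assets) times, and `actor` never acts in between.
    Assumption: overpayment is not refunded.
    """
    owed = rent
    paid: List[int] = []
    while owed > 0 and responder.assets:
        pick = choose(list(responder.assets), owed)  # responder's decision point
        responder.assets.remove(pick)
        actor.assets.append(pick)
        paid.append(pick)
        owed -= pick
    return paid


def pay_smallest_first(options: List[int], owed: int) -> int:
    """Illustrative policy: always give up the lowest-value asset."""
    return min(options)


def pay_single_cover(options: List[int], owed: int) -> int:
    """Illustrative policy: pay the cheapest asset that covers the debt alone,
    falling back to the largest asset when none does."""
    covering = [v for v in options if v >= owed]
    return min(covering) if covering else max(options)
```

With a responder holding assets worth 1, 2, and 5 and a rent of 3, `pay_smallest_first` gives up two assets (1 and 2), while `pay_single_cover` gives up one asset worth 5. The same forced condition therefore admits several response sequences with different costs, which is the kind of choice a learned strategy has to weigh.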
