Monopoly Deal: A Benchmark Environment for Bounded One-Sided Response Games

Card games are widely used to study sequential decision-making under uncertainty, with real-world analogues in negotiation, finance, and cybersecurity. Typically, these games fall into three categories based on the flow of control: strictly sequential (where players alternate single actions), deterministic-response (where some actions trigger a fixed outcome), and unbounded reciprocal-response (where alternating counterplays are permitted). A less explored but strategically rich structure also exists: the bounded one-sided response. This dynamic occurs when a player's action briefly transfers control to the opponent, who must satisfy a fixed condition through one or more sequential moves before the turn resolves. We term games featuring this mechanism Bounded One-Sided Response Games (BORGs). We introduce a modified version of Monopoly Deal as a benchmark environment that specifically isolates the BORG dynamic: a Rent action forces the opponent to choose payment assets one at a time. We demonstrate that the gold-standard algorithm, Counterfactual Regret Minimization (CFR), converges on effective strategies for this domain without requiring novel algorithmic extensions. To support efficient, reproducible experimentation, we present a lightweight, full-stack research platform that unifies the environment, a parallelized CFR runtime, and a human-playable web interface, all runnable on a single workstation. This system provides a practical foundation for exploring state representation and policy learning in bounded one-sided response settings. The trained CFR agent and source code are available at https://monopolydeal.ai.
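To make the bounded one-sided response concrete, the following is a minimal Python sketch of how a Rent action might hand control to the payer until the debt is covered. It is an illustration under stated assumptions, not the environment's actual implementation: the names `Player`, `resolve_rent`, and `greedy_choice` are hypothetical, assets are reduced to integer cash values, and overpayment is assumed to return no change.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Player:
    """Hypothetical player: assets are simplified to integer cash values."""
    name: str
    assets: List[int] = field(default_factory=list)


def greedy_choice(assets: List[int], remaining: int) -> int:
    """Illustrative payer policy: return the index of the cheapest asset that
    covers the remaining debt, or of the largest asset if none covers it."""
    covering = [i for i, value in enumerate(assets) if value >= remaining]
    if covering:
        return min(covering, key=lambda i: assets[i])
    return max(range(len(assets)), key=lambda i: assets[i])


def resolve_rent(
    payer: Player,
    collector: Player,
    amount_due: int,
    choose: Callable[[List[int], int], int],
) -> int:
    """Bounded one-sided response: only the payer acts, one asset per step,
    until the debt is covered or the payer has nothing left.

    The loop runs at most len(payer.assets) times (bounded), and the
    collector makes no decisions while it runs (one-sided).
    """
    paid = 0
    while paid < amount_due and payer.assets:
        index = choose(payer.assets, amount_due - paid)
        card = payer.assets.pop(index)
        collector.assets.append(card)
        paid += card  # assumption: overpayment is kept, no change is given
    return paid  # control returns to the collector's turn here
```

Each iteration of the loop is a separate decision point for the payer, which is what distinguishes this structure from a deterministic response (no choice) and from an unbounded reciprocal response (the collector could counter). A learning agent would replace `greedy_choice` with its own policy.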
