Stochastic games with discounted payoff, introduced by Shapley, model adversarial interactions in stochastic environments where two players try to optimize a discounted sum of rewards. In this model, long-term weights are geometrically attenuated based on the delay in their occurrence. We propose a temporally dual notion -- called past-discounting -- where agents have geometrically decaying memory of the rewards encountered during a play of the game. We study objective functions based on past-discounted weight sequences and examine the corresponding stochastic games with liminf, discounted, and mean payoffs. For objectives specified as the limit inferior of past-discounted reward sequences, we show that positional determinacy fails and that optimal strategies may require unbounded memory. To overcome this obstacle, we study an approximate windowed objective based on the idea of using sliding windows of finite length to examine infinite plays. On the other hand, for objectives specified as the discounted and average limits of past-discounted reward sequences we establish determinacy in mixed stationary strategies in the setting of concurrent stochastic games and show how the values of these games may be computed via reductions to standard discounted and mean-payoff games.
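As an illustration of past-discounting, the following minimal Python sketch computes a past-discounted weight sequence from a finite prefix of a play. The abstract does not state the formula, so the definition used here, p_n = sum over i <= n of lam^(n-i) * w_i, is an assumed reading of "geometrically decaying memory". The function names, the parameter `lam`, and the particular sliding-window variant are likewise hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def past_discounted(rewards: Sequence[float], lam: float) -> List[float]:
    """Past-discounted sequence under the ASSUMED definition
    p_n = sum_{i=0..n} lam**(n - i) * rewards[i].

    Older rewards are attenuated geometrically, so the most recent
    reward always carries weight 1.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    out: List[float] = []
    acc = 0.0
    for r in rewards:
        # Recurrence equivalent to the sum above: p_n = lam * p_{n-1} + w_n.
        acc = lam * acc + r
        out.append(acc)
    return out


def windowed_past_discounted(
    rewards: Sequence[float], lam: float, window: int
) -> List[float]:
    """Hypothetical windowed variant: only the last `window` rewards
    contribute to each entry; anything older is forgotten entirely."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out: List[float] = []
    for n in range(len(rewards)):
        start = max(0, n - window + 1)
        out.append(
            sum(lam ** (n - i) * rewards[i] for i in range(start, n + 1))
        )
    return out
```

For example, with constant reward 1 and `lam = 0.5`, the unwindowed sequence begins 1, 1.5, 1.75 and approaches 1 / (1 - lam) = 2, whereas a window of length 2 caps every entry after the first at 1.5.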