In many multi-player interactions, players incur strictly positive costs each time they execute actions e.g. 'menu costs' or transaction costs in financial systems. Since acting at each available opportunity would accumulate prohibitively large costs, the resulting decision problem is one in which players must make strategic decisions about when to execute actions in addition to their choice of action. This paper analyses a discrete-time stochastic game (SG) in which players face minimally bounded positive costs for each action and influence the system using impulse controls. We prove SGs of two-sided impulse control have a unique value and characterise the saddle point equilibrium in which the players execute actions at strategically chosen times in accordance with Markovian strategies. We prove the game respects a dynamic programming principle and that the Markov perfect equilibrium can be computed as a limit point of a sequence of Bellman operations. We then introduce a new Q-learning variant which we show converges almost surely to the value of the game enabling solutions to be extracted in unknown settings. Lastly, we extend our results to settings with budgetory constraints.
翻译:暂无翻译