This paper considers a two-player game in which each player chooses a resource from a finite collection of options without knowing the opponent's choice and without any form of feedback. Each resource yields a random reward. Both players have statistical information about the rewards of each resource. In addition, there is an information asymmetry: each player knows the reward realizations of a different subset of the resources. If both players choose the same resource, the reward is divided equally between them, whereas if they choose different resources, each player gains the full reward of the resource they chose. We first implement the iterative best response algorithm to find an $\epsilon$-approximate Nash equilibrium for this game. This method of finding a Nash equilibrium is impractical when players do not trust each other and place no assumptions on the incentives of the opponent. To handle this case, we solve the problem of maximizing the worst-case expected utility of the first player. The solution leads to counter-intuitive insights in certain special cases. To solve the general version of the problem, we develop an efficient algorithmic solution that combines online convex optimization and the drift-plus-penalty technique.
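As a rough illustration of the reward-sharing rule and the worst-case objective, the following Python sketch works under a deliberately simplified setting that is an assumption of this example, not the paper's model: neither player observes any reward realization, so only the mean rewards matter. The names `payoff`, `worst_case_utility`, `p`, and `mean_rewards` are hypothetical. Because the first player's expected utility is linear in the opponent's mixed strategy, it is enough to check the opponent's pure choices.

```python
from typing import Sequence


def payoff(own: int, other: int, rewards: Sequence[float]) -> float:
    """Reward to a player who picks `own` while the opponent picks `other`."""
    # Same resource: the reward is split equally; otherwise the full reward is kept.
    return rewards[own] / 2 if own == other else rewards[own]


def worst_case_utility(p: Sequence[float], mean_rewards: Sequence[float]) -> float:
    """Worst-case expected utility of mixed strategy `p` (simplified, no private information)."""
    # Expected utility if the opponent never collided with player 1.
    base = sum(pk * mk for pk, mk in zip(p, mean_rewards))
    # An opponent who picks resource j costs player 1 half of p[j] * mean_rewards[j];
    # the worst case is the opponent choice that maximizes this loss.
    loss = max(pj * mj / 2 for pj, mj in zip(p, mean_rewards))
    return base - loss
```

For example, with mean rewards `[4.0, 2.0]` and the uniform strategy `[0.5, 0.5]`, the opponent's most damaging choice is the first resource. The general problem in the paper additionally accounts for the asymmetric knowledge of reward realizations, which this sketch omits.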