Understanding the strategic behavior of miners in a blockchain is of great importance for its proper operation. A common model for mining games considers an infinite time horizon, with players optimizing asymptotic average objectives. Implicitly, this assumes that the asymptotic behavior is realized at human-scale times; otherwise, current models would be invalid. We study the mining game using Markov decision processes. Our approach allows us to describe the asymptotic behavior of the game in terms of the stationary distribution of the induced Markov chain. We focus on a model with two players under immediate release, assuming two different objectives: the (asymptotic) average reward per turn and the (asymptotic) percentage of obtained blocks. Using tools from Markov chain analysis, we show that there exists a strategy whose mixing time is slow, namely exponential in the policy parameters. This result underscores the need to understand convergence rates in mining games in order to validate the standard models. Towards this end, we provide upper bounds for the mixing time of certain meaningful classes of strategies. These bounds yield criteria for establishing that long-term averaged functions are coherent as payoff functions. Moreover, by studying hitting times, we provide a criterion to validate the common simplification of considering finite-state models. For both objective functions considered, we provide explicit formulae depending on the stationary distribution of the underlying Markov chain. In particular, this shows that the two objectives are not equivalent. Finally, we perform a market share case study in a particular regime of the game. More precisely, we show that a strategic player with sufficiently large processing power can impose negative revenue on honest players.
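To make the notions of stationary distribution, long-run average reward, and mixing time concrete, the following is a minimal sketch on a toy two-state Markov chain. It is not the mining game studied here: the chain `two_state`, the reward vector, and all function names are hypothetical and chosen only for illustration. The sketch assumes an irreducible, aperiodic chain, so that power iteration converges, and it uses the conventional total-variation threshold of 1/4 for the mixing time.

```python
# Toy illustration only (NOT the mining game): all names are hypothetical.
# Assumes the chain is irreducible and aperiodic so power iteration converges.

def step(dist, P):
    """One step of the chain: returns the row vector dist * P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def stationary(P, tol=1e-12, max_iter=1_000_000):
    """Approximate the stationary distribution by power iteration."""
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = step(dist, P)
        if max(abs(a - b) for a, b in zip(nxt, dist)) < tol:
            return nxt
        dist = nxt
    return dist

def total_variation(p, q):
    """Total variation distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def mixing_time(P, eps=0.25, max_t=1_000_000):
    """Smallest t such that every start state is within eps of stationarity."""
    n = len(P)
    pi = stationary(P)
    # rows[x] holds the distribution at time t when started from state x.
    rows = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for t in range(max_t + 1):
        if max(total_variation(r, pi) for r in rows) <= eps:
            return t
        rows = [step(r, P) for r in rows]
    raise ValueError("chain did not mix within max_t steps")

def average_reward(P, reward):
    """Long-run average reward per step: sum over states of pi[s] * reward[s]."""
    pi = stationary(P)
    return sum(p * r for p, r in zip(pi, reward))

def two_state(p):
    """Hypothetical symmetric chain that switches state with probability p."""
    return [[1.0 - p, p], [p, 1.0 - p]]

if __name__ == "__main__":
    for p in (0.1, 0.01, 0.001):
        print(p, mixing_time(two_state(p)))
    print(average_reward(two_state(0.1), [1.0, 0.0]))
```

In this toy chain the stationary distribution, and hence the average reward, is the same for every switching probability `p`, while the mixing time grows as `p` shrinks. This is the kind of gap between asymptotic values and the time needed to reach them that motivates bounding convergence rates.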