When Blockchain Meets AI: Optimal Mining Strategy Achieved By Machine Learning

This work applies reinforcement learning (RL), a branch of machine learning, to derive an optimal mining strategy for Bitcoin-like blockchains without knowing the details of the blockchain network model. Previously, the most profitable mining strategy was believed to be honest mining, as encoded in the default blockchain protocol. It was later shown that a miner can earn more mining rewards by deviating from honest mining. In particular, the mining problem can be formulated as a Markov Decision Process (MDP), which can be solved to give the optimal mining strategy. However, solving the mining MDP requires knowing the values of various parameters that characterize the blockchain network model. In real blockchain networks, these parameter values are not easy to obtain and may change over time, which hinders the use of the MDP model-based solution. In this work, we employ RL to dynamically learn a mining strategy whose performance approaches that of the optimal mining strategy by observing and interacting with the network. Since the mining MDP problem has a non-linear objective function (rather than the linear objective function of standard MDP problems), we design a new multi-dimensional RL algorithm to solve the problem. Experimental results indicate that, without knowing the parameter values of the mining MDP model, our multi-dimensional RL mining algorithm can still achieve the optimal performance over time-varying blockchain networks.
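To make the claim that deviating from honest mining can pay off more concrete, the following is a minimal sketch of the well-known selfish mining strategy of Eyal and Sirer, simulated as a small state machine. It is not the multi-dimensional RL algorithm proposed in this work. The function name `simulate_selfish_mining`, the parameters `alpha` (the attacker's share of mining power) and `gamma` (the fraction of honest miners that build on the attacker's branch during a tie), and the choice of relative revenue as the objective are all assumptions made for illustration. The returned ratio of attacker blocks to all main-chain blocks is one example of an objective that is non-linear in the per-step rewards.

```python
import random


def simulate_selfish_mining(alpha, gamma, n_blocks, seed=0):
    """Simulate Eyal-Sirer selfish mining; return the attacker's relative revenue.

    alpha: attacker's fraction of total mining power (hypothetical parameter name)
    gamma: fraction of honest power mining on the attacker's branch in a tie
    n_blocks: number of block-discovery events to simulate
    """
    rng = random.Random(seed)
    lead = 0        # length of the withheld private chain minus the public chain
    tie = False     # True while two equal-length branches are competing
    attacker = 0    # attacker blocks that end up on the main chain
    honest = 0      # honest blocks that end up on the main chain

    for _ in range(n_blocks):
        attacker_found = rng.random() < alpha
        if tie:
            if attacker_found:
                attacker += 2          # attacker extends its own branch and wins
            elif rng.random() < gamma:
                attacker += 1          # honest block lands on the attacker's branch
                honest += 1
            else:
                honest += 2            # the honest branch wins the race
            tie = False
        elif attacker_found:
            lead += 1                  # withhold the newly mined block
        elif lead == 0:
            honest += 1                # nothing withheld: honest block is accepted
        elif lead == 1:
            tie = True                 # publish the withheld block and race
            lead = 0
        elif lead == 2:
            attacker += 2              # publish both blocks, overriding the honest one
            lead = 0
        else:
            attacker += 1              # publish one block; the private chain stays ahead
            lead -= 1

    # Relative revenue: a ratio of accumulated rewards, not a sum of them.
    return attacker / (attacker + honest)


if __name__ == "__main__":
    for a in (0.2, 0.4):
        r = simulate_selfish_mining(a, gamma=0.0, n_blocks=200_000)
        print(f"alpha={a}: relative revenue {r:.3f} (honest mining would give {a})")
```

Under these assumptions, an attacker with `alpha=0.4` and `gamma=0.0` ends up with more than its fair share of 0.4, while an attacker with `alpha=0.2` ends up with less than 0.2, in line with the known profitability threshold of 1/3 when `gamma` is 0. The simulation needs `alpha` and `gamma` as inputs, which is exactly the kind of model knowledge the abstract says is hard to obtain in real networks.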
