The construction of approximate replication strategies for pricing and hedging derivative contracts in incomplete markets is a key problem in financial engineering. Recently, Reinforcement Learning algorithms for hedging under realistic market conditions have attracted significant interest. While research in the derivatives area has mostly focused on variations of $Q$-learning, in artificial intelligence Monte Carlo Tree Search is the recognized state-of-the-art method for various planning problems, such as the games of Hex, Chess, and Go. This article introduces Monte Carlo Tree Search as a method for solving the stochastic optimal control problem behind the pricing and hedging tasks. In contrast to $Q$-learning, it combines Reinforcement Learning with tree search techniques. As a consequence, Monte Carlo Tree Search has higher sample efficiency, is less prone to over-fitting to specific market models, and generally learns stronger policies faster. In our experiments, we find that Monte Carlo Tree Search, the method behind world-champion-level play in games like Chess and Go, is easily capable of maximizing the utility of the investor's terminal wealth without setting up an auxiliary mathematical framework.
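To make the idea of treating hedging as a planning problem concrete, the following is a minimal sketch of plain UCT-style Monte Carlo Tree Search on a toy problem. It is not the article's implementation: the binomial price model, the short call option, the exponential utility, the discrete hedge positions, and every parameter value and function name (`mcts_hedge`, `simulate`, `rollout`, and so on) are assumptions chosen for illustration, and random rollouts stand in for any learned value function.

```python
import math
import random

# All market parameters below are hypothetical and chosen only for illustration.
ACTIONS = (0.0, 0.5, 1.0)            # candidate hedge positions (shares held)
S0, UP, DOWN, P_UP = 100.0, 1.1, 0.9, 0.5
STRIKE, STEPS, RISK_AVERSION = 100.0, 3, 0.05


def price(t, ups):
    """Price in a recombining binomial tree after `ups` up-moves in `t` steps."""
    return S0 * UP**ups * DOWN**(t - ups)


def step(state, action, rng):
    """Sample one market move; state = (time, number of up-moves, wealth)."""
    t, ups, wealth = state
    new_ups = ups + (1 if rng.random() < P_UP else 0)
    gain = action * (price(t + 1, new_ups) - price(t, ups))
    return (t + 1, new_ups, round(wealth + gain, 6))


def terminal_utility(state):
    """Exponential utility of hedging gains minus the short call payoff (assumed)."""
    t, ups, wealth = state
    payoff = max(price(t, ups) - STRIKE, 0.0)
    return -math.exp(-RISK_AVERSION * (wealth - payoff))


def rollout(state, rng):
    """Estimate a new node's value by acting randomly until maturity."""
    while state[0] < STEPS:
        state = step(state, rng.choice(ACTIONS), rng)
    return terminal_utility(state)


def simulate(state, tree, rng, c):
    """One simulation: select by UCB1, expand a new node, roll out, back up."""
    if state[0] == STEPS:
        return terminal_utility(state)
    if state not in tree:
        # Expansion: per-action [visit count, mean utility].
        tree[state] = {a: [0, 0.0] for a in ACTIONS}
        return rollout(state, rng)
    stats = tree[state]
    untried = [a for a in ACTIONS if stats[a][0] == 0]
    if untried:
        action = untried[0]
    else:
        total = sum(n for n, _ in stats.values())
        action = max(
            ACTIONS,
            key=lambda a: stats[a][1] + c * math.sqrt(math.log(total) / stats[a][0]),
        )
    value = simulate(step(state, action, rng), tree, rng, c)
    stats[action][0] += 1
    stats[action][1] += (value - stats[action][1]) / stats[action][0]  # running mean
    return value


def mcts_hedge(n_sims=2000, c=2.0, seed=0):
    """Return the most-visited initial hedge position and the root statistics."""
    rng = random.Random(seed)
    root = (0, 0, 0.0)
    tree = {root: {a: [0, 0.0] for a in ACTIONS}}
    for _ in range(n_sims):
        simulate(root, tree, rng, c)
    root_stats = tree[root]
    best = max(ACTIONS, key=lambda a: root_stats[a][0])
    return best, root_stats
```

Calling `mcts_hedge()` returns the initial hedge position that the search visited most often, together with the visit counts and mean utilities for each candidate position. The exploration constant `c` is a tuning assumption; because the utilities here are not scaled to the unit interval, it would need adjusting for other parameter choices.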