The construction of approximate replication strategies for derivative contracts in incomplete markets is a key problem of financial engineering. Recently, Reinforcement Learning algorithms for pricing and hedging under realistic market conditions have attracted significant interest. While financial research has mostly focused on variations of $Q$-learning, in Artificial Intelligence Monte Carlo Tree Search is the recognized state-of-the-art method for various planning problems, such as the games of Hex, Chess, and Go. This article introduces Monte Carlo Tree Search as a method to solve the stochastic optimal control problem underlying the pricing and hedging of financial derivatives. In contrast to $Q$-learning, it combines reinforcement learning with tree search techniques. As a consequence, Monte Carlo Tree Search has higher sample efficiency, is less prone to over-fitting to specific market models, and generally learns stronger policies faster. In our experiments we find that Monte Carlo Tree Search, which has achieved world-champion performance in games like Chess and Go, is easily capable of directly maximizing the utility of the investor's terminal wealth without an intermediate mathematical theory.