Optimal stopping is the problem of deciding the right time at which to take a particular action in a stochastic system in order to maximize an expected reward. It has many applications in areas such as finance, healthcare, and statistics. In this paper, we employ deep Reinforcement Learning (RL) to learn optimal stopping policies in two financial engineering applications: option pricing and optimal option exercise. We present the first comprehensive empirical evaluation of the quality of optimal stopping policies identified by three state-of-the-art deep RL algorithms: double deep Q-learning (DDQN), categorical distributional RL (C51), and Implicit Quantile Networks (IQN). In the case of option pricing, our findings indicate that in a theoretical Black-Scholes environment, IQN successfully identifies nearly optimal prices. On the other hand, it is slightly outperformed by C51 when confronted with real stock data movements in a put option exercise problem involving assets from the S&P 500 index. More importantly, the C51 algorithm is able to identify an optimal stopping policy that achieves 8% higher out-of-sample returns than the best of four natural benchmark policies. We conclude with a discussion of our findings, which should pave the way for relevant future research.