We propose a reinforcement learning (RL) approach to model optimal exercise strategies for option-type products. We pursue the RL avenue in order to learn the optimal action-value function of the underlying stopping problem. In addition to retrieving the optimal Q-function at any time step, one can also price the contract at inception. We first discuss the standard setting with one exercise right, and later extend this framework to the case of multiple stopping opportunities in the presence of constraints. We propose to approximate the Q-function with a deep neural network, which does not require the specification of basis functions as in the least-squares Monte Carlo framework and is scalable to higher dimensions. We derive a lower bound on the option price obtained from the trained neural network and an upper bound from the dual formulation of the stopping problem, which can also be expressed in terms of the Q-function. Our methodology is illustrated with examples covering the pricing of swing options.
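The pipeline sketched in the abstract — approximate the continuation value with a neural network, solve the stopping problem backward in time, then extract a lower bound on the price by running the learned exercise rule on fresh paths — can be illustrated with a minimal numpy sketch for a Bermudan put. Everything below is an illustrative assumption: the one-hidden-layer network, the fitted-iteration training scheme, and all market parameters are toy choices, not the architecture or hyperparameters of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy market setup (all parameters are illustrative assumptions) ---
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
n_steps, n_paths = 10, 5000
dt = T / n_steps
disc = np.exp(-r * dt)  # one-period discount factor

def simulate_paths(n, rng):
    """Geometric Brownian motion paths, shape (n, n_steps + 1)."""
    z = rng.standard_normal((n, n_steps))
    log_inc = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    return S0 * np.exp(np.hstack([np.zeros((n, 1)), np.cumsum(log_inc, axis=1)]))

def payoff(s):
    """Bermudan put payoff."""
    return np.maximum(K - s, 0.0)

class TinyNet:
    """One-hidden-layer tanh network fit by plain gradient descent
    (a stand-in for the deep network of the abstract)."""

    def __init__(self, rng, hidden=16):
        self.W1 = rng.standard_normal((1, hidden)) * 0.5
        self.b1 = np.zeros(hidden)
        self.W2 = rng.standard_normal((hidden, 1)) * 0.5
        self.b2 = np.zeros(1)

    def forward(self, x):
        h = np.tanh(x @ self.W1 + self.b1)
        return (h @ self.W2 + self.b2).ravel(), h

    def fit(self, x, y, lr=0.05, epochs=200):
        for _ in range(epochs):
            pred, h = self.forward(x)
            err = (pred - y)[:, None]            # (n, 1) residuals
            gW2 = h.T @ err / len(y)
            gb2 = err.mean(0)
            dh = err @ self.W2.T * (1 - h**2)    # backprop through tanh
            gW1 = x.T @ dh / len(y)
            gb1 = dh.mean(0)
            self.W2 -= lr * gW2; self.b2 -= lr * gb2
            self.W1 -= lr * gW1; self.b1 -= lr * gb1

# --- Backward induction: learn continuation values at each step ---
paths = simulate_paths(n_paths, rng)
nets = [None] * n_steps
value = payoff(paths[:, -1])                     # value at maturity
for t in range(n_steps - 1, 0, -1):
    x = paths[:, t:t + 1] / K                    # normalised state
    net = TinyNet(rng)
    net.fit(x, disc * value)                     # regress discounted next value
    cont, _ = net.forward(x)
    ex = payoff(paths[:, t])
    value = np.where(ex > cont, ex, disc * value)
    nets[t] = net

# --- Lower bound: run the learned rule on fresh, independent paths ---
test = simulate_paths(n_paths, rng)
cash = payoff(test[:, -1]) * disc**n_steps       # default: exercise at maturity
stopped = np.zeros(n_paths, dtype=bool)
for t in range(1, n_steps):
    cont, _ = nets[t].forward(test[:, t:t + 1] / K)
    ex = payoff(test[:, t])
    stop_now = (~stopped) & (ex > cont) & (ex > 0)
    cash[stop_now] = ex[stop_now] * disc**t
    stopped |= stop_now

price_lb = cash.mean()
print(f"lower-bound price estimate: {price_lb:.2f}")
```

Because any exercise policy yields a valid lower bound on the true price, the estimate above is biased low when the learned rule is suboptimal; the abstract's complementary upper bound comes from the dual formulation, which is not sketched here.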