A new method for stochastic control, based on neural networks and on the randomisation of discrete random variables, is proposed and applied to optimal stopping time problems. The method models the policy directly and requires neither the derivation of a dynamic programming principle nor a backward stochastic differential equation. Unlike in continuous optimization, where automatic differentiation can be used directly, we propose a likelihood ratio method for gradient computation. Numerical tests are carried out on the pricing of American and swing options. The proposed algorithm succeeds in pricing high-dimensional American and swing options in a reasonable computation time, which is not possible with classical algorithms.
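To illustrate the likelihood ratio idea in its simplest form, the following sketch estimates the gradient of an expected payoff with respect to the parameter of a single randomised stop/continue decision. It is not the paper's algorithm: the one-parameter sigmoid policy, the constant payoffs, and the names `lr_gradient` and `exact_gradient` are assumptions chosen for illustration. The point is that the discrete decision cannot be differentiated through directly, so the gradient is obtained by weighting each sampled payoff by the derivative of the log-probability of the sampled decision.

```python
import numpy as np


def sigmoid(x):
    """Logistic function mapping a real parameter to a probability."""
    return 1.0 / (1.0 + np.exp(-x))


def lr_gradient(theta, payoff_stop, payoff_continue, n_samples, rng):
    """Likelihood ratio (score function) estimate of d/dtheta E[payoff].

    The decision is Bernoulli with P(stop) = sigmoid(theta).
    Payoffs are constants here (an illustrative assumption).
    """
    p = sigmoid(theta)
    # Sample the randomised discrete decision: True means "stop".
    stop = rng.random(n_samples) < p
    payoff = np.where(stop, payoff_stop, payoff_continue)
    # Score: d/dtheta log P(decision) = decision - p for a sigmoid policy.
    score = stop.astype(float) - p
    return float(np.mean(payoff * score))


def exact_gradient(theta, payoff_stop, payoff_continue):
    """Closed-form gradient of p * payoff_stop + (1 - p) * payoff_continue."""
    p = sigmoid(theta)
    return (payoff_stop - payoff_continue) * p * (1.0 - p)


rng = np.random.default_rng(0)
estimate = lr_gradient(0.3, 2.0, 0.5, 200_000, rng)
reference = exact_gradient(0.3, 2.0, 0.5)
print(f"likelihood ratio estimate: {estimate:.4f}, exact: {reference:.4f}")
```

In this toy case the exact gradient is available in closed form, which makes it possible to check the Monte Carlo estimate. In a realistic setting the stopping probability would be the output of a neural network and the payoff would depend on a simulated path, but the estimator keeps the same structure.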