We propose a novel numerical method for high-dimensional Hamilton--Jacobi--Bellman (HJB) type elliptic partial differential equations (PDEs). The HJB PDEs, reformulated as stochastic optimal control problems, are tackled within an actor-critic framework inspired by reinforcement learning, based on neural network parametrizations of the value and control functions. Within the actor-critic framework, we employ a policy gradient approach to improve the control, while for the value function we derive a variance-reduced least-squares temporal difference method (VR-LSTD) using stochastic calculus. To discretize the stochastic control problem numerically, we employ an adaptive step-size scheme that improves accuracy near the domain boundary. Numerical examples in up to $20$ spatial dimensions, including linear quadratic regulators, stochastic Van der Pol oscillators, and diffusive Eikonal equations, are presented to validate the effectiveness of the proposed method.
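For context, the reformulation mentioned above follows the standard correspondence between elliptic HJB equations and exit-time stochastic control; the notation below ($b$, $\sigma$, $f$, $g$, $D$, $\tau$) is generic and chosen for illustration rather than taken from the paper. The value function
\begin{equation*}
u(x) \;=\; \inf_{a(\cdot)} \mathbb{E}\!\left[\int_0^{\tau} f\bigl(X_t, a_t\bigr)\,dt \;+\; g\bigl(X_\tau\bigr)\,\middle|\, X_0 = x\right],
\qquad dX_t \;=\; b(X_t, a_t)\,dt \;+\; \sigma(X_t)\,dW_t,
\end{equation*}
with $\tau$ the first exit time of $X_t$ from the domain $D$, formally satisfies the HJB-type elliptic PDE
\begin{equation*}
\min_{a}\Bigl\{\, b(x,a)\cdot\nabla u(x) \;+\; \tfrac12\operatorname{tr}\!\bigl(\sigma(x)\sigma(x)^{\!\top}\nabla^2 u(x)\bigr) \;+\; f(x,a) \Bigr\} \;=\; 0 \ \text{ in } D,
\qquad u \;=\; g \ \text{ on } \partial D.
\end{equation*}
In the actor-critic iteration, the critic fits the value function $u$ (here via VR-LSTD), while the actor improves the feedback control $a(x)$ by policy gradient.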