We propose a novel numerical method for high-dimensional Hamilton--Jacobi--Bellman (HJB) type elliptic partial differential equations (PDEs). The HJB PDEs are reformulated as optimal control problems and tackled within an actor-critic framework inspired by reinforcement learning, with the value and control functions parametrized by neural networks. Within this framework, we employ a policy gradient approach to improve the control, while for the value function we derive a variance-reduced least-squares temporal difference method using stochastic calculus. To discretize the stochastic control problem numerically, we employ an adaptive step size scheme that improves the accuracy near the domain boundary. Numerical examples in up to $20$ spatial dimensions, including linear quadratic regulators, stochastic Van der Pol oscillators, diffusive Eikonal equations, and fully nonlinear elliptic PDEs derived from a regulator problem, are presented to validate the effectiveness of the proposed method.
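For orientation, a minimal actor-critic loop of the kind the abstract describes might look as follows. This is an illustrative sketch only: the controlled dynamics, the quadratic running cost, the discount rate `beta`, the network sizes, and the plain one-step temporal-difference critic update are all assumptions made here for concreteness; the paper's variance-reduced least-squares TD estimator and adaptive boundary step size are not reproduced.

```python
# Hedged sketch of an actor-critic loop for a d-dimensional stochastic
# control problem; all modeling choices below are illustrative assumptions.
import torch
import torch.nn as nn

d = 20                 # spatial dimension, matching the paper's largest examples
dt, beta = 0.01, 1.0   # Euler-Maruyama step and discount rate (assumed)

actor = nn.Sequential(nn.Linear(d, 64), nn.Tanh(), nn.Linear(64, d))   # control u(x)
critic = nn.Sequential(nn.Linear(d, 64), nn.Tanh(), nn.Linear(64, 1))  # value V(x)
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

def step(x, u):
    # Hypothetical controlled diffusion dX = u dt + sqrt(2) dW,
    # discretized by one Euler-Maruyama step.
    return x + u * dt + (2 * dt) ** 0.5 * torch.randn_like(x)

for it in range(2000):
    x = torch.randn(256, d)  # batch of interior sample points

    # Critic: minimize a one-step discounted TD residual (a plain stand-in
    # for the paper's variance-reduced least-squares TD method).
    u = actor(x)
    cost = x.pow(2).sum(-1) + u.pow(2).sum(-1)  # LQR-like running cost
    x_next = step(x, u)
    td_target = (cost * dt + (1 - beta * dt) * critic(x_next).squeeze(-1)).detach()
    loss_c = (critic(x).squeeze(-1) - td_target).pow(2).mean()
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()

    # Actor: lower the running cost plus the discounted value of the next
    # state, differentiating through the known dynamics (a model-based
    # variant of the policy-gradient step the abstract refers to).
    u = actor(x)
    x_next = step(x, u)
    loss_a = ((x.pow(2).sum(-1) + u.pow(2).sum(-1)) * dt
              + (1 - beta * dt) * critic(x_next).squeeze(-1)).mean()
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
```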