Robust optimal well control using an adaptive multi-grid reinforcement learning framework

Reinforcement learning (RL) is a promising tool for solving robust optimal well control problems, in which the model parameters are highly uncertain and the system is, in practice, only partially observable. However, learning robust control policies with RL often requires a large number of simulations, which can easily become computationally intractable when each simulation is computationally intensive. To address this bottleneck, an adaptive multi-grid RL framework is introduced, inspired by the principles of geometric multi-grid methods used in iterative numerical algorithms. RL control policies are first learned with computationally efficient, low-fidelity simulations based on a coarse-grid discretization of the underlying partial differential equations (PDEs). The simulation fidelity is then increased adaptively towards the highest-fidelity simulation, which corresponds to the finest discretization of the model domain. The proposed framework is demonstrated using a state-of-the-art, model-free, policy-based RL algorithm, namely Proximal Policy Optimisation (PPO). Results are shown for two case studies of robust optimal well control problems inspired by the SPE-10 model 2 benchmark. The proposed framework yields prominent gains in computational efficiency, saving around 60-70% of the computational cost of its single fine-grid counterpart.
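The coarse-to-fine idea can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the abstract does not state the adaptive switching rule, so this sketch assumes one (move to the next finer grid once the mean return stops improving by more than a tolerance). The grid sizes are hypothetical 4x and 2x coarsenings of a 60 x 220 areal grid, the cost model (one unit per cell per simulation) is assumed, and `make_toy_step` is a hypothetical stand-in for a PPO update that simply moves a scalar "policy" toward a level-dependent optimum, with coarse grids slightly biased.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class GridLevel:
    """One discretization of the model domain (hypothetical sizes)."""
    name: str
    nx: int
    ny: int

    @property
    def cost(self) -> int:
        # Assumed cost model: one simulation costs one unit per grid cell.
        return self.nx * self.ny


def plateaued(history: List[float], window: int, tol: float) -> bool:
    """Assumed switching rule: the mean return of the last `window`
    iterations improved by less than `tol` over the window before it."""
    if len(history) < 2 * window:
        return False
    recent = sum(history[-window:]) / window
    previous = sum(history[-2 * window:-window]) / window
    return recent - previous < tol


def train_multigrid(
    levels: List[GridLevel],
    train_step: Callable[[GridLevel], float],
    sims_per_iter: int = 8,
    window: int = 3,
    tol: float = 1e-3,
    max_iters: int = 200,
) -> Tuple[int, List[Tuple[str, int]]]:
    """Train on levels ordered coarse to fine. The policy state lives inside
    `train_step`, so what was learned on a coarse level carries over."""
    total_cost = 0
    log = []
    for level in levels:
        history: List[float] = []
        for _ in range(max_iters):
            history.append(train_step(level))
            total_cost += sims_per_iter * level.cost
            if plateaued(history, window, tol):
                break
        log.append((level.name, len(history)))
    return total_cost, log


def make_toy_step(optima: Dict[str, float]) -> Callable[[GridLevel], float]:
    """Hypothetical stand-in for one PPO update: a scalar 'policy' moves 30%
    of the way toward the optimum of the current grid level."""
    state = {"theta": 0.0}

    def step(level: GridLevel) -> float:
        target = optima[level.name]
        state["theta"] += 0.3 * (target - state["theta"])
        return -(target - state["theta"]) ** 2  # proxy for mean return

    return step


levels = [GridLevel("coarse", 15, 55), GridLevel("medium", 30, 110),
          GridLevel("fine", 60, 220)]
optima = {"coarse": 0.9, "medium": 0.97, "fine": 1.0}  # coarse grids biased

mg_cost, mg_log = train_multigrid(levels, make_toy_step(optima))
fine_cost, fine_log = train_multigrid(levels[-1:], make_toy_step(optima))
print(mg_log, fine_log)
print(f"toy cost saving: {1 - mg_cost / fine_cost:.0%}")
```

In this toy run most iterations happen on the cheap coarse grid, and the fine grid only needs a few iterations to polish an already good policy, which is where the saving comes from. The printed saving is a property of the toy setup only and says nothing about the 60-70% reported for the SPE-10 case studies.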

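The abstract names PPO as the RL algorithm trained inside the framework. For readers unfamiliar with it, the sketch below computes PPO's standard clipped surrogate objective for a batch of actions, given log-probabilities under the new and old policies and advantage estimates. It shows only the objective, not the authors' full training setup (policy network, advantage estimation, and reservoir simulator are all omitted), and `eps=0.2` is simply a commonly used clipping value, not a setting taken from the paper.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    eps: float = 0.2,
) -> float:
    """Mean clipped surrogate objective of PPO (to be maximized):
    L = mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A)),
    with probability ratio r = exp(logp_new - logp_old)."""
    if not (len(logp_new) == len(logp_old) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # The minimum is a pessimistic bound: moving the ratio outside
        # [1 - eps, 1 + eps] cannot increase the objective further.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Ratios 1.5, 0.5 and 1.0 with advantages +1, -1 and +2:
value = ppo_clip_objective(
    logp_new=[math.log(1.5), math.log(0.5), 0.0],
    logp_old=[0.0, 0.0, 0.0],
    advantages=[1.0, -1.0, 2.0],
)
print(value)  # (1.2 - 0.8 + 2.0) / 3, approximately 0.8
```

Clipping keeps each policy update close to the policy that generated the simulation data, which matters when every batch of rollouts requires expensive reservoir simulations.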