This research gauges the ability of deep reinforcement learning (DRL) techniques to assist in the optimization and control of fluid mechanical systems. It combines a novel, "degenerate" version of the proximal policy optimization (PPO) algorithm, which trains a neural network to optimize the system through a single action per learning episode, with an in-house stabilized finite element environment implementing the variational multiscale (VMS) method, which computes the numerical reward fed back to the neural network. Three prototypical examples of separated flows in two dimensions serve as a testbed for developing the methodology, each adding a layer of complexity due either to the unsteadiness of the flow solutions, the sharpness of the objective function, or the dimension of the control parameter space. Relevance is carefully assessed through systematic comparison with reference data obtained by canonical direct and adjoint methods. Beyond adding value to the still-limited literature on this subject, these findings establish the potential of single-step PPO for reliable black-box optimization of computational fluid dynamics (CFD) systems and pave the way for future progress in optimal flow control using this new class of methods.
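To make the "single action per learning episode" idea concrete, the following is a minimal, self-contained sketch of degenerate (single-step) PPO. It is not the authors' implementation: the network architecture, hyperparameters, and the toy quadratic reward standing in for the VMS finite-element solver are all illustrative assumptions; only the overall structure (a stateless Gaussian policy proposing control parameters once per episode, batch-normalized rewards used as advantages, and the PPO clipped surrogate update) reflects the method described above.

```python
# Minimal sketch of single-step ("degenerate") PPO: each episode is a single
# action, i.e. one set of control parameters evaluated by a black-box reward.
# All names, sizes, and the toy reward below are assumptions for illustration.
import torch
import torch.nn as nn

N_PARAMS      = 2     # dimension of the control-parameter space (assumed)
BATCH_SIZE    = 8     # episodes (i.e. solver evaluations) per policy update
EPOCHS        = 200   # number of policy updates
UPDATE_EPOCHS = 4     # gradient passes over each batch
CLIP_EPS      = 0.2   # PPO clipping range


def reward(actions: torch.Tensor) -> torch.Tensor:
    """Placeholder black-box objective; in the paper this role is played by a
    CFD simulation (e.g. returning negative drag), not a closed-form formula."""
    target = torch.tensor([0.5, -0.3])
    return -((actions - target) ** 2).sum(dim=-1)


class GaussianPolicy(nn.Module):
    """Maps a constant dummy observation to the mean of a diagonal Gaussian
    over the control parameters (single-step PPO needs no true state)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
                                 nn.Linear(32, N_PARAMS))
        self.log_std = nn.Parameter(torch.zeros(N_PARAMS))

    def dist(self) -> torch.distributions.Normal:
        mean = self.net(torch.ones(1, 1)).squeeze(0)
        return torch.distributions.Normal(mean, self.log_std.exp())


policy = GaussianPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=5e-3)

for epoch in range(EPOCHS):
    # --- collect one batch of single-step episodes --------------------------
    with torch.no_grad():
        d_old = policy.dist()
        actions = d_old.sample((BATCH_SIZE,))          # one action per episode
        logp_old = d_old.log_prob(actions).sum(dim=-1)
        rewards = reward(actions)
    # Advantage = reward normalized over the batch; no value function is
    # needed since every episode terminates after its single step.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    # --- PPO clipped-surrogate updates over the batch -----------------------
    for _ in range(UPDATE_EPOCHS):
        d_new = policy.dist()
        logp = d_new.log_prob(actions).sum(dim=-1)
        ratio = (logp - logp_old).exp()
        surrogate = torch.min(ratio * adv,
                              torch.clamp(ratio, 1 - CLIP_EPS, 1 + CLIP_EPS) * adv)
        loss = -surrogate.mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

print("best control parameters found:", policy.dist().mean.detach())
```

In practice the placeholder `reward` function would be replaced by a call to the CFD environment, which is precisely what makes the approach a black-box optimizer: the solver only needs to return a scalar objective value for each proposed set of parameters.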