Batch processes pose a challenge for process control owing to their complex nonlinear dynamics and batch-to-batch variability. The absence of accurate models, and the resulting plant-model mismatch, makes these problems harder to address with advanced model-based control strategies. Reinforcement Learning (RL), wherein an agent learns the policy by directly interacting with the environment, offers a potential alternative in this context. RL frameworks with an actor-critic architecture have recently become popular for controlling systems with continuous state and action spaces. It has been shown that an ensemble of actor and critic networks further helps the agent learn better policies, owing to the enhanced exploration afforded by simultaneous policy learning. To this end, the current study proposes a stochastic actor-critic RL algorithm, termed Twin Actor Soft Actor-Critic (TASAC), which incorporates an ensemble of actors for learning in a maximum entropy framework, for batch process control.
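To make the twin-actor idea concrete, below is a minimal PyTorch sketch of maximum-entropy action selection with two actors. The squashed-Gaussian policy head follows standard SAC; the candidate-selection rule (sample one action per actor, keep the one the critic scores highest) is one plausible way to exploit an actor ensemble and is an illustrative assumption, not necessarily the paper's exact mechanism. All class and function names (`GaussianActor`, `Critic`, `select_action`) are hypothetical.

```python
import torch
import torch.nn as nn

class GaussianActor(nn.Module):
    """Squashed-Gaussian policy head, as in standard SAC."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, action_dim)
        self.log_std = nn.Linear(hidden, action_dim)

    def forward(self, state):
        h = self.net(state)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-20, 2)
        dist = torch.distributions.Normal(mu, log_std.exp())
        u = dist.rsample()            # reparameterised sample
        a = torch.tanh(u)             # squash action to [-1, 1]
        # Log-probability with the tanh change-of-variables correction,
        # needed for the maximum-entropy (soft) objective.
        logp = (dist.log_prob(u) - torch.log(1 - a.pow(2) + 1e-6)).sum(-1)
        return a, logp

class Critic(nn.Module):
    """State-action value network Q(s, a)."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.q = nn.Sequential(nn.Linear(state_dim + action_dim, hidden),
                               nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, state, action):
        return self.q(torch.cat([state, action], dim=-1))

def select_action(actors, critic, state):
    """Sample one candidate action from each actor in the ensemble and
    return the candidate the critic values highest (assumed rule)."""
    candidates = [actor(state)[0] for actor in actors]
    q_vals = torch.stack([critic(state, a) for a in candidates])
    return candidates[int(q_vals.argmax())]

# Usage: twin actors for a single (unbatched) 3-dimensional state.
actors = [GaussianActor(3, 1), GaussianActor(3, 1)]
critic = Critic(3, 1)
action = select_action(actors, critic, torch.randn(3))
```

Because each actor samples its own stochastic action, the ensemble naturally explores a wider region of the action space than a single policy, which is the intuition behind the improved exploration claimed above.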