Meta-learning is a branch of machine learning that trains neural network models on a wide variety of data so that they can rapidly solve new problems. In process control, many systems have similar and well-understood dynamics, which suggests it is feasible to create a generalizable controller through meta-learning. In this work, we formulate a meta reinforcement learning (meta-RL) control strategy that can be used to tune proportional--integral (PI) controllers. Our meta-RL agent has a recurrent structure that accumulates "context" about a system's dynamics in a hidden state variable during closed-loop operation. This architecture enables the agent to adapt automatically to changes in the process dynamics. In the tests reported here, the meta-RL agent was trained entirely offline on first order plus time delay (FOPTD) systems and produced excellent results on novel systems drawn from the same distribution of process dynamics used for training. A key design element is the ability to leverage model-based information offline during training in simulated environments, while maintaining a model-free policy structure for interacting with novel processes whose true dynamics are uncertain. Meta-learning is a promising approach for constructing sample-efficient intelligent controllers.
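To make the recurrent architecture concrete, the sketch below shows one way such an agent could be structured: a GRU hidden state plays the role of the accumulated context, and a small output head decodes it into PI gains, alongside a sampler for the FOPTD training distribution. This is an illustrative assumption, not the implementation used in the paper; the layer sizes, observation layout (`obs_dim`), the `softplus` positivity constraint, and the parameter ranges in `sample_foptd` are all hypothetical choices.

```python
import numpy as np
import torch
import torch.nn as nn

class RecurrentPITuner(nn.Module):
    """Sketch of a recurrent meta-RL policy: a GRU hidden state
    accumulates closed-loop "context" about the process dynamics
    and is decoded into PI gains. Sizes are illustrative."""

    def __init__(self, obs_dim: int = 3, hidden_dim: int = 64):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)  # decodes context -> (Kp, Ki)

    def forward(self, obs_seq, h=None):
        # obs_seq: (batch, time, obs_dim), e.g. [setpoint error, u, y];
        # h carries the accumulated context between calls (None = reset).
        ctx, h = self.gru(obs_seq, h)
        gains = nn.functional.softplus(self.head(ctx))  # keep gains positive
        return gains, h

def sample_foptd(dt: float = 1.0, rng=np.random):
    """Draw one first order plus time delay process for offline training.
    The parameter ranges are assumptions, not the paper's distribution.
    Discrete model: y[k+1] = a*y[k] + b*u[k-d]."""
    K = rng.uniform(0.5, 2.0)      # process gain
    tau = rng.uniform(5.0, 50.0)   # time constant
    d = rng.randint(1, 5)          # delay, in samples
    a = np.exp(-dt / tau)          # zero-order-hold discretization
    b = K * (1.0 - a)
    return a, b, d
```

During offline training, episodes would be rolled out in closed loop on processes sampled this way, with the hidden state reset between processes so the agent must re-identify each system's dynamics from its own accumulated context.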