Typical reinforcement learning (RL) methods have limited applicability to real-world industrial control problems because industrial systems involve various constraints and simultaneously require continuous and discrete control. To overcome these challenges, we devise a novel RL algorithm that enables an agent to handle a highly constrained action space. The algorithm has two main features. First, we devise a distance-based incentive/penalty update technique comprising two Q-value update schemes, an incentive update and a penalty update, which enables the agent to select discrete and continuous actions within the feasible region and to update the values of both types of actions. Second, we propose defining the penalty cost as a shadow-price-weighted penalty; compared with previous methods, this approach offers two advantages in efficiently inducing the agent not to select infeasible actions. We apply our algorithm to an industrial control problem, microgrid system operation, and the experimental results demonstrate its superiority.
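To make the two update schemes concrete, the following is a minimal sketch of what a distance-based incentive/penalty Q-value update with a shadow-price-weighted penalty could look like. All function names, the decay form of the incentive, and the penalty scaling are illustrative assumptions; the abstract does not specify the exact update rules.

```python
import numpy as np

def shadow_price_penalty(violations, shadow_prices):
    """Penalty cost as a shadow-price-weighted sum of constraint violations
    (assumed form: only positive violations contribute)."""
    return float(np.dot(shadow_prices, np.maximum(violations, 0.0)))

def distance_based_update(q, reward, q_next_max, distance, feasible,
                          alpha=0.1, gamma=0.99, kappa=1.0, penalty=0.0):
    """One Q-value update under the assumed scheme.

    Feasible action   -> incentive update: a standard TD target plus an
                         incentive that shrinks as `distance` from the
                         constraint boundary grows (assumed decay form).
    Infeasible action -> penalty update: the target is reduced by the
                         shadow-price-weighted `penalty`, scaled by how
                         deep the violation is (assumed scaling).
    """
    if feasible:
        incentive = kappa / (1.0 + distance)  # assumed distance-based incentive
        target = reward + incentive + gamma * q_next_max
    else:
        target = reward - penalty * distance  # penalize by violation depth
    return q + alpha * (target - q)

# Example: update the Q-value of an infeasible action that violates the
# first of two constraints by 0.5, with hypothetical shadow prices.
pen = shadow_price_penalty(violations=np.array([0.5, 0.0]),
                           shadow_prices=np.array([2.0, 3.0]))
q_new = distance_based_update(q=1.2, reward=-0.1, q_next_max=1.5,
                              distance=0.5, feasible=False, penalty=pen)
```

The intuition captured here is that shadow prices weight each constraint's penalty by its marginal economic impact, so violations of binding constraints are discouraged more strongly than violations of slack ones; the actual weighting and update rules are those defined in the paper.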