Deep reinforcement learning (DRL) has attracted much attention as an approach to solving sequential decision-making problems without mathematical models of the system or environment. In general, constraints may be imposed on the decision making. In this study, we consider optimal decision-making problems with constraints for completing temporal high-level tasks in the continuous state-action domain. We describe the constraints using signal temporal logic (STL), which is useful for time-sensitive control tasks because it can specify properties of continuous signals within bounded time intervals. To handle the STL constraints, we introduce an extended constrained Markov decision process (CMDP), which we call a $\tau$-CMDP. We formulate the STL-constrained optimal decision-making problem as a $\tau$-CMDP and propose a two-phase constrained DRL algorithm based on the Lagrangian relaxation method. Through simulations, we also demonstrate the learning performance of the proposed algorithm.