Reinforcement learning, as a notable machine learning paradigm, has advanced rapidly in recent years. Compared with reinforcement learning methods that assume a known system model, architectures that operate without a model generally offer substantially broader applicability. In this work, a new reinforcement learning architecture based on the iterative linear quadratic regulator (iLQR) is developed that requires no prior knowledge of the system model; it is termed the neural network iterative linear quadratic regulator (NNiLQR). Relying solely on measurement data, the method provides a new non-parametric routine for establishing the optimal policy, without explicit system modeling, through iterative refinement of a neural network model of the system. Importantly, this approach significantly outperforms the classical iLQR method on the given objective function, owing to its use of additional exploration. Results on two illustrative examples demonstrate these merits of the NNiLQR method.
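The loop the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: a linear least-squares fit stands in for the neural network model, a finite-horizon Riccati recursion stands in for the full iLQR backward pass (exact here because the toy plant is linear), and the plant matrices, cost weights, horizon, and exploration noise level are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Unknown" plant (hypothetical double integrator, dt = 0.1): the loop
# below only queries it as a black box to collect measurement data.
A_true = np.array([[1.0, 0.1],
                   [0.0, 1.0]])
B_true = np.array([[0.0],
                   [0.1]])

def plant(x, u):
    return A_true @ x + B_true @ u

Q, R, H = np.eye(2), 0.01 * np.eye(1), 30  # quadratic cost weights, horizon

def riccati_gains(A, B):
    """Finite-horizon LQR backward pass on the *learned* model."""
    P, gains = Q.copy(), []
    for _ in range(H):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]

def rollout(gains, noise_std):
    """Run the planned policy plus exploration noise; log measurements."""
    x, X, U, Xn, cost = np.array([1.0, 0.0]), [], [], [], 0.0
    for K in gains:
        u = -K @ x + noise_std * rng.standard_normal(1)  # exploration term
        xn = plant(x, u)
        X.append(x); U.append(u); Xn.append(xn)
        cost += x @ Q @ x + u @ R @ u
        x = xn
    return X, U, Xn, cost

# Start from an uninformed model and refine it from measurement data only.
A_hat, B_hat = np.eye(2), np.zeros((2, 1))
DX, DU, DXn, costs = [], [], [], []
for _ in range(5):
    X, U, Xn, cost = rollout(riccati_gains(A_hat, B_hat), noise_std=0.1)
    DX += X; DU += U; DXn += Xn; costs.append(cost)
    # Model refit on all data so far (least squares stands in for
    # neural-network training on (x, u) -> x_next pairs).
    Z = np.hstack([np.array(DX), np.array(DU)])
    Theta, *_ = np.linalg.lstsq(Z, np.array(DXn), rcond=None)
    A_hat, B_hat = Theta[:2].T, Theta[2:].T
```

Because the first planner sees an uninformed model, the first rollout is essentially open-loop exploration; subsequent iterations plan on the refined model and drive the accumulated cost down, mirroring the refine-and-replan structure the abstract attributes to NNiLQR.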