As a notable machine learning paradigm, reinforcement learning has progressed rapidly in recent years. Compared with reinforcement learning methods that require a given system model, architectures built on an unknown model generally offer broader applicability. In this work, a new reinforcement learning architecture, termed the neural network iterative linear quadratic regulator (NNiLQR), is developed that requires no prior knowledge of the system model. Relying solely on measurement data, the method provides a non-parametric routine that establishes the optimal policy, without system modeling, through iterative refinement of a neural network. Notably, this approach outperforms the classical iterative linear quadratic regulator (iLQR) in terms of the given objective function, owing to its additional exploration mechanism. Results on two illustrative examples demonstrate these merits of the NNiLQR method.