Reinforcement learning (RL) in an unknown nonlinear dynamical system amounts to searching for an optimal feedback law using simulations (rollouts) of that system. Most RL techniques search over a complex global nonlinear feedback parametrization, which leads to long training times and high variance. Instead, we advocate searching over a local feedback representation consisting of an open-loop control sequence and an associated optimal linear feedback law that is completely determined by the open-loop sequence. We show that this alternative approach trains highly efficiently, that its answers are repeatable and hence reliable, and that the resulting closed-loop performance is superior to that of state-of-the-art global RL techniques. Finally, replanning whenever required, which is feasible because the local solution is fast and reliable, allows us to recover global optimality of the resulting feedback law.
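The local feedback representation described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the black-box simulator `step` is a hypothetical damped pendulum, the linear feedback gains come from a standard time-varying LQR (backward Riccati recursion) with hand-picked quadratic weights `Q`, `R`, `Qf`, and the local linear models are estimated by finite differences on rollouts of the simulator. The open-loop sequence is simply fixed to zero rather than optimized.

```python
import numpy as np

def step(x, u, dt=0.05):
    """Hypothetical black-box simulator (assumption): damped pendulum, x = [angle, rate]."""
    theta, omega = x
    omega_next = omega + dt * (-9.81 * np.sin(theta) - 0.1 * omega + u[0])
    theta_next = theta + dt * omega
    return np.array([theta_next, omega_next])

def rollout(x0, u_bar):
    """Apply the open-loop sequence u_bar from x0 and return the nominal states."""
    xs = [np.asarray(x0, dtype=float)]
    for u in u_bar:
        xs.append(step(xs[-1], u))
    return np.array(xs)

def linearize(x, u, eps=1e-5):
    """Estimate A = df/dx and B = df/du by central differences on the simulator."""
    n, m = x.size, u.size
    A, B = np.zeros((n, n)), np.zeros((n, m))
    for i in range(n):
        d = np.zeros(n)
        d[i] = eps
        A[:, i] = (step(x + d, u) - step(x - d, u)) / (2 * eps)
    for j in range(m):
        d = np.zeros(m)
        d[j] = eps
        B[:, j] = (step(x, u + d) - step(x, u - d)) / (2 * eps)
    return A, B

def lqr_gains(x_bar, u_bar, Q, R, Qf):
    """Backward Riccati recursion along the nominal trajectory (time-varying LQR)."""
    P = Qf
    gains = [None] * len(u_bar)
    for t in reversed(range(len(u_bar))):
        A, B = linearize(x_bar[t], u_bar[t])
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains[t] = K
    return gains

def closed_loop(x0, x_bar, u_bar, gains):
    """Run u_t = u_bar_t - K_t (x_t - x_bar_t) on the nonlinear simulator."""
    xs = [np.asarray(x0, dtype=float)]
    for t, u in enumerate(u_bar):
        u_fb = u - gains[t] @ (xs[-1] - x_bar[t])
        xs.append(step(xs[-1], u_fb))
    return np.array(xs)

# Nominal plan (assumption): hold the pendulum at the unstable upright point.
T = 60
u_bar = np.zeros((T, 1))
x_bar = rollout([np.pi, 0.0], u_bar)
gains = lqr_gains(x_bar, u_bar, Q=np.eye(2), R=0.1 * np.eye(1), Qf=10.0 * np.eye(2))

# Start slightly off the nominal state and compare open-loop vs. local feedback.
x0 = np.array([np.pi + 0.1, 0.0])
err_open = np.linalg.norm(rollout(x0, u_bar)[-1] - x_bar[-1])
err_closed = np.linalg.norm(closed_loop(x0, x_bar, u_bar, gains)[-1] - x_bar[-1])
print(f"open-loop final error:   {err_open:.4f}")
print(f"closed-loop final error: {err_closed:.4f}")
```

In this sketch the gains `K_t` depend only on the nominal trajectory, mirroring the statement that the linear feedback law is determined by the open-loop sequence. A replanning step would recompute `u_bar`, `x_bar`, and `gains` from the current state whenever the deviation from the nominal grows too large.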