A key limitation in applying modern machine learning methods to feedback control design is the lack of appropriate methodologies for analyzing the long-term dynamics of the resulting policies and for making guarantees, even statistical ones, about their robustness. The central reasons are the so-called curse of dimensionality and the black-box nature of the resulting control policies. This paper addresses the first of these issues. Although the full state space of a system may be high-dimensional, a common feature of most model-based control methods is that the resulting closed-loop systems exhibit dominant dynamics that are rapidly driven to some lower-dimensional subspace. In this work we argue that the dimensionality of this subspace is captured by tools from fractal geometry, namely various notions of a fractional dimension. We then show that the dimensionality of trajectories induced by model-free reinforcement learning agents can be influenced by adding a post-processing function to the agent's reward signal. We verify that the dimensionality reduction is robust to noise added to the system, and show that, for the systems we examined, the modified agents are in fact more robust to noise and push disturbances in general.
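To make the idea of a fractional dimension concrete, the following is a minimal sketch of one such notion, the box-counting dimension, estimated from sampled points (for example, states collected along closed-loop rollouts). The abstract does not say which notion of dimension or which estimator the authors use, so the choice of box counting, the function names `box_count` and `box_counting_dimension`, and the grid scales are all assumptions made for illustration.

```python
import numpy as np


def box_count(points, eps):
    """Count the grid boxes of side length eps that contain at least one point.

    points: array of shape (n_samples, n_dims), e.g. sampled trajectory states.
    """
    # Map every point to the integer index of the box it falls in.
    idx = np.floor(np.asarray(points) / eps).astype(np.int64)
    # The number of distinct index rows is the number of occupied boxes.
    return len(np.unique(idx, axis=0))


def box_counting_dimension(points, epsilons):
    """Estimate the box-counting dimension as the slope of
    log N(eps) versus log(1 / eps), fitted by least squares.

    This is an illustrative estimator (an assumption of this sketch),
    not necessarily the one used in the paper.
    """
    epsilons = np.asarray(epsilons, dtype=float)
    counts = np.array([box_count(points, e) for e in epsilons])
    slope, _intercept = np.polyfit(np.log(1.0 / epsilons), np.log(counts), 1)
    return slope


# Example: a straight line embedded in a 3-dimensional state space.
# The ambient dimension is 3, but the estimate should be close to 1.
t = np.linspace(0.0, 1.0, 20000)[:, None]
line = t * np.array([[1.0, 0.37, 0.61]])
scales = [0.1, 0.05, 0.025, 0.0125]
line_dim = box_counting_dimension(line, scales)
```

The estimate depends on the range of scales and on how densely the set is sampled, so in practice the scales would need to be chosen with the data in mind. For a trajectory that settles onto a low-dimensional set, the estimate can fall well below the dimension of the full state space.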