The difficulty of optimal control problems has classically been characterized in terms of system properties such as the minimum eigenvalues of controllability and observability Gramians. We revisit these characterizations in light of the growing popularity of data-driven techniques such as reinforcement learning (RL), and in control settings where input observations are high-dimensional images and transition dynamics are unknown. Specifically, we ask: to what extent do quantifiable metrics of a task's control and perceptual difficulty predict the performance and sample complexity of data-driven controllers? We modulate two different types of partial observability in a cartpole "stick-balancing" problem: (i) the height of one visible fixation point on the cartpole, which can be used to tune the fundamental limits of performance achievable by any controller, and (ii) the level of perception noise in the fixation point position inferred from depth or RGB images of the cartpole. In these settings, we empirically study two popular families of controllers, RL and system identification-based $H_\infty$ control, both using visually estimated system state. Our results show that the fundamental limits of robust control have corresponding implications for the sample efficiency and performance of learned perception-based controllers. Visit our project website https://jxu.ai/rl-vs-control-web for more information.
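For readers unfamiliar with the classical characterization mentioned above, the following is a minimal sketch of how the minimum eigenvalue of a finite-horizon controllability or observability Gramian might be computed. It is not the authors' code and does not model the cartpole: the discretized double integrator, the horizon of 50 steps, and the names `finite_horizon_gramian` and `min_gramian_eigenvalue` are all illustrative assumptions.

```python
import numpy as np


def finite_horizon_gramian(A, M, horizon):
    """Return sum_{k=0}^{horizon-1} A^k M M^T (A^T)^k for a discrete-time system.

    With M = B this is the controllability Gramian of (A, B).
    Calling it with (A.T, C.T) gives the observability Gramian of (A, C).
    """
    n = A.shape[0]
    gram = np.zeros((n, n))
    Ak_M = np.array(M, dtype=float)  # holds A^k M, starting at k = 0
    for _ in range(horizon):
        gram += Ak_M @ Ak_M.T
        Ak_M = A @ Ak_M
    return gram


def min_gramian_eigenvalue(A, M, horizon):
    """Smallest eigenvalue of the (symmetric) finite-horizon Gramian."""
    gram = finite_horizon_gramian(A, M, horizon)
    return float(np.linalg.eigvalsh(gram).min())


# Hypothetical stand-in system: a double integrator discretized with step dt.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
C_pos = np.array([[1.0, 0.0]])  # sensor measures position only
C_vel = np.array([[0.0, 1.0]])  # sensor measures velocity only

ctrl_min = min_gramian_eigenvalue(A, B, horizon=50)
obs_pos_min = min_gramian_eigenvalue(A.T, C_pos.T, horizon=50)
obs_vel_min = min_gramian_eigenvalue(A.T, C_vel.T, horizon=50)

print("min eig, controllability Gramian:", ctrl_min)
print("min eig, observability Gramian (position sensor):", obs_pos_min)
print("min eig, observability Gramian (velocity sensor):", obs_vel_min)
```

In this toy system, a velocity-only sensor cannot recover position, so the smallest eigenvalue of the corresponding observability Gramian is zero (up to rounding), while the position sensor yields a strictly positive value. This is the sense in which a small minimum eigenvalue signals a harder estimation or control problem.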