We establish regret lower bounds for adaptive control of an unknown linear Gaussian system with quadratic costs. Combining ideas from experiment design, estimation theory, and a perturbation bound for certain information matrices, we derive regret lower bounds that scale on the order of $\sqrt{T}$ in the time horizon $T$. Our bounds accurately capture the role of control-theoretic parameters, and we show that systems that are hard to control are also hard to learn to control. When instantiated to state-feedback systems, our bounds recover the dimensional dependence of earlier work, but with improved scaling in system-theoretic constants such as system costs and Gramians. Furthermore, we extend our results to a class of partially observed systems and demonstrate that systems with poor observability structure are also hard to learn to control.
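For concreteness, the state-feedback setting can be sketched as follows. The notation here ($A$, $B$, $Q$, $R$, $\Sigma_w$, $J^\star$, $\mathrm{Regret}_T$, $c$) is assumed for illustration and is not fixed by the abstract; it follows one common formulation of the linear quadratic problem rather than the paper's exact definitions.

$$
x_{t+1} = A x_t + B u_t + w_t, \qquad w_t \sim \mathcal{N}(0, \Sigma_w),
$$

$$
\mathrm{Regret}_T = \sum_{t=0}^{T-1} \left( x_t^\top Q x_t + u_t^\top R u_t \right) - T J^\star,
$$

where $(A, B)$ are unknown to the controller and $J^\star$ denotes the optimal average cost attainable when $(A, B)$ are known. Under these assumptions, a lower bound of the type described above takes the schematic form

$$
\mathbb{E}\,\mathrm{Regret}_T \;\ge\; c \,\sqrt{T},
$$

with $c > 0$ depending on the dimensions and on system-theoretic quantities such as system costs and Gramians. The precise sense in which such a bound holds (for example, over which class of systems and policies) is not specified in this sketch.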