Regret Lower Bounds for Learning Linear Quadratic Gaussian Systems

This paper presents local minimax regret lower bounds for adaptively controlling linear-quadratic-Gaussian (LQG) systems. We consider smoothly parametrized instances and characterize when logarithmic regret is impossible, in a way that is both instance-specific and flexible enough to take problem structure into account. This characterization relies on two key notions: local uninformativeness, when the optimal policy does not provide sufficient excitation for identification of the optimal policy and yields a degenerate Fisher information matrix; and information-regret-boundedness, when the small eigenvalues of a policy-dependent information matrix can be bounded in terms of the regret of that policy. Combined with a reduction to Bayesian estimation and an application of Van Trees' inequality, these two conditions are sufficient for proving regret lower bounds of order $\sqrt{T}$ in the time horizon $T$. This method yields lower bounds that exhibit tight dimensional dependencies and scale naturally with control-theoretic problem constants. For instance, we are able to prove that systems operating near marginal stability are fundamentally hard to learn to control. We further show that large classes of systems satisfy these conditions, among them any state-feedback system with both the $A$- and $B$-matrices unknown. Most importantly, we also establish that a nontrivial class of partially observable systems, essentially those that are over-actuated, satisfies these conditions, thus providing a $\sqrt{T}$ lower bound that is also valid for partially observable systems. Finally, we turn to two simple examples which demonstrate that our lower bound captures classical control-theoretic intuition: the bound diverges for systems operating near marginal stability or with large filter gain, so these systems can be arbitrarily hard to (learn to) control.
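The idea of local uninformativeness can be illustrated with a small numerical sketch. It is not taken from the paper: it assumes a scalar state-feedback system $x_{t+1} = a x_t + b u_t + w_t$ with unit-variance Gaussian noise, and the values of `a`, `b`, and the gain `k` are hypothetical (`k` is merely stabilizing, not claimed to be optimal). Under any static linear feedback $u_t = k x_t$, the optimal one included, the regressor $(x_t, u_t)$ always points along $(1, k)$, so the information matrix for $(a, b)$ has rank one. Adding exploration noise to the input makes it full rank.

```python
import numpy as np

def information_matrix(a, b, k, T, explore_std=0.0, seed=0):
    """Empirical information matrix for (a, b) in
    x[t+1] = a*x[t] + b*u[t] + w[t], with w[t] ~ N(0, 1),
    under inputs u[t] = k*x[t] + explore_std * e[t], e[t] ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = 0.0
    info = np.zeros((2, 2))
    for _ in range(T):
        u = k * x + explore_std * rng.standard_normal()
        z = np.array([x, u])      # regressor multiplying the parameters (a, b)
        info += np.outer(z, z)    # unit noise variance: each step adds z z^T
        x = a * x + b * u + rng.standard_normal()
    return info

# Hypothetical instance and stabilizing gain (closed loop a + b*k = 0.4).
a, b, k = 0.9, 1.0, -0.5
no_explore = information_matrix(a, b, k, T=1000)
with_explore = information_matrix(a, b, k, T=1000, explore_std=0.5)

print(np.linalg.eigvalsh(no_explore))    # smallest eigenvalue is numerically zero
print(np.linalg.eigvalsh(with_explore))  # both eigenvalues are strictly positive
```

In this sketch the only way to gain information in the degenerate direction is to deviate from the feedback law, which costs regret. That is the tension the information-regret-boundedness condition quantifies.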
