We study the problem of learning a decentralized linear quadratic regulator under a partially nested information constraint when the system model is unknown a priori. We propose an online learning algorithm that adaptively designs a control policy as new data samples from a single system trajectory become available. Our algorithm design combines a disturbance-feedback representation of state-feedback controllers with online convex optimization with memory and delayed feedback. We show that our online algorithm yields a controller that satisfies the desired information constraint and achieves an expected regret that scales as $\sqrt{T}$ with the time horizon $T$.
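To illustrate what a disturbance-feedback representation of a state-feedback controller can look like, the following is a minimal sketch under assumptions that are not taken from the paper: a centralized, fully known system $x_{t+1} = A x_t + B u_t + w_t$ with $x_0 = 0$, a fixed gain $K$, and hypothetical helper names `disturbance_feedback_gains` and `simulate`. In this setting, $u_t = K x_t$ can be rewritten as $u_t = \sum_{k=1}^{t} M_k w_{t-k}$ with $M_k = K (A + BK)^{k-1}$, and truncating the sum to $h$ terms gives an approximation. The sketch does not model the information constraint, the unknown dynamics, or the online optimization step.

```python
import numpy as np


def disturbance_feedback_gains(A, B, K, h):
    """Return [M_1, ..., M_h] with M_k = K (A + B K)^(k-1)."""
    A_cl = A + B @ K                      # closed-loop matrix under u = K x
    gains, power = [], np.eye(A.shape[0])
    for _ in range(h):
        gains.append(K @ power)
        power = A_cl @ power
    return gains


def simulate(A, B, K, w, h):
    """Compare state feedback with its h-term disturbance-feedback form.

    w has shape (T, n); row t is the disturbance w_t. The state starts at 0.
    Returns (u_sf, u_df): inputs from u_t = K x_t and from
    u_t = sum_{k=1}^{min(h, t)} M_k w_{t-k}, respectively.
    """
    T, n = w.shape
    M = disturbance_feedback_gains(A, B, K, h)
    x = np.zeros(n)
    u_sf, u_df = [], []
    for t in range(T):
        u_sf.append(K @ x)
        u = np.zeros(B.shape[1])
        for k in range(1, min(h, t) + 1):
            u += M[k - 1] @ w[t - k]      # depends on past disturbances only
        u_df.append(u)
        x = A @ x + B @ u_sf[-1] + w[t]   # roll the state-feedback loop forward
    return np.array(u_sf), np.array(u_df)


if __name__ == "__main__":
    # Illustrative values only (assumed, not from the paper).
    A = np.array([[0.9, 0.2], [0.0, 0.8]])
    B = np.eye(2)
    K = -0.5 * np.eye(2)
    w = np.random.default_rng(0).standard_normal((50, 2))
    u_sf, u_df = simulate(A, B, K, w, h=50)
    print("full-memory forms agree:", np.allclose(u_sf, u_df))
```

With $h$ at least as large as the horizon, the two input sequences coincide; for a stable closed loop, a shorter memory $h$ leaves only a small truncation error, and the controller becomes linear in the parameters $M_k$.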