Identification and Adaptive Control of Markov Jump Systems: Sample Complexity and Regret Bounds

Learning how to effectively control unknown dynamical systems is crucial for intelligent autonomous systems. This task becomes significantly harder when the underlying dynamics change over time. Motivated by this challenge, this paper considers the problem of controlling an unknown Markov jump linear system (MJS) to optimize a quadratic objective. Taking a model-based perspective, we study identification-based adaptive control for MJSs. We first provide a system identification algorithm for MJSs that learns the dynamics of each mode, as well as the Markov transition matrix governing the mode switches, from a single trajectory of the system states, inputs, and modes. Through mixing-time arguments, the estimation error of this algorithm is shown to decay as $\mathcal{O}(1/\sqrt{T})$ in the trajectory length $T$. We then propose an adaptive control scheme that combines system identification with certainty-equivalent control to adapt the controllers in an episodic fashion. Combining our sample complexity results with recent perturbation results for certainty-equivalent control, we prove that when the episode lengths are appropriately chosen, the proposed adaptive control scheme achieves $\mathcal{O}(\sqrt{T})$ regret, which can be improved to $\mathcal{O}(\mathrm{polylog}(T))$ with partial knowledge of the system. Our proof strategy introduces innovations to handle Markovian jumps and the weaker notion of stability (mean-square stability) common in MJSs. Our analysis provides insights into the system-theoretic quantities that affect learning accuracy and control performance. Numerical simulations further support these insights.
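The two ingredients of the identification-based scheme described above, estimating the MJS from one trajectory and computing certainty-equivalent controllers from the estimates, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact algorithm. It assumes the model $x_{t+1} = A_{\omega_t} x_t + B_{\omega_t} u_t + \text{noise}$ with the mode $\omega_t$ observed, fits each mode by ordinary least squares over the time steps spent in that mode, estimates the transition matrix from empirical switch counts, and computes mode-dependent gains by value iteration on the coupled Riccati equations of the MJS-LQR problem. The function names `estimate_mjs` and `solve_coupled_riccati` are hypothetical.

```python
import numpy as np


def estimate_mjs(xs, us, modes, num_modes):
    """Estimate (A_i, B_i) for every mode i and the Markov transition matrix.

    Assumed model: x_{t+1} = A_{w_t} x_t + B_{w_t} u_t + noise, where w_t is the
    observed mode. Shapes: xs (T+1, n), us (T, p), modes (T+1,) integer array.
    Every mode is assumed to be visited often enough for least squares.
    """
    n = xs.shape[1]
    A_hat, B_hat = [], []
    for i in range(num_modes):
        idx = np.flatnonzero(modes[:-1] == i)            # time steps spent in mode i
        Z = np.hstack([xs[idx], us[idx]])                # regressors [x_t, u_t]
        theta, *_ = np.linalg.lstsq(Z, xs[idx + 1], rcond=None)
        A_hat.append(theta[:n].T)                        # theta stacks [A_i^T; B_i^T]
        B_hat.append(theta[n:].T)
    counts = np.zeros((num_modes, num_modes))
    np.add.at(counts, (modes[:-1], modes[1:]), 1)        # observed switches i -> j
    row_sums = np.maximum(counts.sum(axis=1, keepdims=True), 1)
    return A_hat, B_hat, counts / row_sums


def solve_coupled_riccati(A, B, trans, Q, R, iters=500):
    """Value iteration on the coupled Riccati equations of MJS-LQR.

    Returns cost-to-go matrices P_i and gains K_i for the policy u_t = K_{w_t} x_t.
    """
    s, n = len(A), A[0].shape[0]
    P = [np.zeros((n, n)) for _ in range(s)]

    def expected_next(P, i):
        # E_i(P) = sum_j trans[i, j] P_j: expected cost-to-go after leaving mode i
        return sum(trans[i, j] * P[j] for j in range(s))

    for _ in range(iters):
        new_P = []
        for i in range(s):
            E = expected_next(P, i)
            G = R + B[i].T @ E @ B[i]
            F = B[i].T @ E @ A[i]
            new_P.append(Q + A[i].T @ E @ A[i] - F.T @ np.linalg.solve(G, F))
        P = new_P
    K = []
    for i in range(s):
        E = expected_next(P, i)
        K.append(-np.linalg.solve(R + B[i].T @ E @ B[i], B[i].T @ E @ A[i]))
    return P, K
```

In an episodic certainty-equivalent loop, one would run an episode with $u_t = K_{\omega_t} x_t$ plus exploratory input noise, call `estimate_mjs` on the collected data, recompute the gains with `solve_coupled_riccati` on the estimates, and continue with a longer episode. The fixed iteration count is a simplification; the value iteration converges only when the estimated system is stabilizable in the mean-square sense. The exploration level and episode schedule that yield the regret bounds above are not reproduced here.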