We study a canonical problem in adaptive control: the design and analysis of policies for minimizing quadratic costs in unknown continuous-time linear dynamical systems. We address important challenges, including the accuracy of learning the unknown parameters of the underlying stochastic differential equation and a full analysis of the performance degradation due to sub-optimal actions (i.e., regret). We then propose an easy-to-implement algorithm for balancing exploration versus exploitation, followed by theoretical guarantees showing a regret bound that grows as the square root of time. Further, we present tight results for assuring system stability and for specifying fundamental limits for regret. To establish these results, we develop multiple novel technical frameworks, which may be of independent interest.
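For orientation, the setting summarized above is commonly formalized along the following lines. This is a sketch in standard linear-quadratic notation and is not taken from the text: the symbols $A$, $B$, $Q$, $R$, $W_t$, the policies $\pi$ and $\pi^\star$, and the exact form of the regret are all assumptions made for illustration.

```latex
% Assumed notation: X_t is the state, U_t the control action, W_t a Wiener process,
% A and B the unknown drift matrices, Q and R positive definite cost matrices,
% \pi the adaptive policy and \pi^\star the optimal policy that knows A and B.
\begin{align}
  dX_t &= \left( A X_t + B U_t \right) dt + dW_t, \\
  c_t(\pi) &= X_t^\top Q X_t + U_t^\top R U_t, \\
  \mathrm{Reg}_T(\pi) &= \int_0^T \bigl( c_t(\pi) - c_t(\pi^\star) \bigr)\, dt .
\end{align}
```

In this notation, a square-root-of-time regret bound would mean that $\mathrm{Reg}_T(\pi)$ grows on the order of $\sqrt{T}$, possibly up to logarithmic factors that the summary above does not specify.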