We consider the problem of controlling an unknown linear dynamical system under adversarially changing convex costs and full feedback of both the state and the cost function. We present the first computationally efficient algorithm that attains an optimal $\sqrt{T}$-regret rate compared to the best stabilizing linear controller in hindsight, while avoiding stringent assumptions on the costs such as strong convexity. Our approach is based on a careful design of non-convex lower confidence bounds for the online costs, and uses a novel technique for computationally efficient regret minimization of these bounds that leverages their particular non-convex structure.