We consider the problem of controlling an unknown linear dynamical system under a stochastic convex cost and full feedback of both the state and the cost function. We present a computationally efficient algorithm that attains the optimal $\sqrt{T}$ regret rate compared to the best stabilizing linear controller in hindsight. In contrast to previous work, our algorithm is based on the Optimism in the Face of Uncertainty paradigm. This results in substantially improved computational complexity and a simpler analysis.