This paper considers online optimal control under linear dynamics with bounded random disturbances, subject to affine constraints on the states and actions. The system dynamics and constraints are assumed to be known and time-invariant, but the convex stage cost functions change adversarially. To solve this problem, we propose Online Gradient Descent with Buffer Zones (OGD-BZ). Theoretically, we show that OGD-BZ with proper parameters guarantees that the system satisfies all the constraints under any admissible disturbances. Further, we investigate the policy regret of OGD-BZ, which compares OGD-BZ's performance with that of the optimal linear policy in hindsight. We show that, under proper algorithm parameters, OGD-BZ achieves a policy regret upper bound of order the square root of the horizon length multiplied by logarithmic terms in the horizon length.
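The setting and the regret notion described above can be written out as a short formal sketch. The abstract fixes no notation, so every symbol here is an assumption introduced for illustration: $x_t$ and $u_t$ for state and action, $w_t$ for the disturbance, $(A, B)$ for the dynamics, $(D_x, d_x, D_u, d_u)$ for the affine constraints, $c_t$ for the stage cost, $T$ for the horizon length, $\bar{w}$ for the disturbance bound, and $\mathcal{K}$ for the comparison class of linear policies. The choice of norm, the exact comparison class, and the power of the logarithm are likewise not specified by the text.

```latex
% Illustrative notation only; all symbols are assumptions, not taken from the source.
\begin{align*}
  &\text{Dynamics:}    && x_{t+1} = A x_t + B u_t + w_t, \qquad \|w_t\| \le \bar{w}, \\
  &\text{Constraints:} && D_x x_t \le d_x, \qquad D_u u_t \le d_u \quad \text{for all } t, \\
  &\text{Total cost:}  && J_T(\pi) = \mathbb{E}\Big[\sum_{t=0}^{T} c_t(x_t, u_t)\Big]
                          \quad \text{under policy } \pi, \\
  &\text{Policy regret:} && \mathrm{Regret}_T
                          = J_T(\text{OGD-BZ}) - \min_{K \in \mathcal{K}} J_T(K), \\
  &\text{Claimed rate:} && \mathrm{Regret}_T
                          \le O\big(\sqrt{T}\,\operatorname{polylog}(T)\big).
\end{align*}
```

Read this way, the comparison is against the best fixed linear policy chosen with full knowledge of the cost sequence $c_0, \dots, c_T$, while OGD-BZ must choose its actions online.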