A constrained version of the online convex optimization (OCO) problem is considered. Time is slotted; in each slot, an action is chosen first, and subsequently the loss function and the constraint violation penalty evaluated at the chosen action are revealed. In each slot, both the loss function and the function defining the constraint set are assumed to be smooth and strongly convex. In addition, once an action is chosen, local information about the feasible set within a small neighborhood of the current action is also revealed. Given this feedback, an algorithm is allowed to compute at most one gradient at a point of its choice in order to select the next action. The goal of an algorithm is to simultaneously minimize the dynamic regret (loss incurred relative to the oracle's loss) and the constraint violation penalty (penalty accrued relative to the oracle's penalty). We propose an algorithm that performs projected gradient descent over a suitably chosen set around the current action. We show that both the dynamic regret and the constraint violation are order-wise bounded by the {\it path-length}, the sum of the distances between consecutive optimal actions. Moreover, we show that the derived bounds are the best possible.
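To make the update rule concrete, the following is a minimal sketch of a projected-gradient step confined to a neighborhood of the current action. The Euclidean ball used as the local set, and the names `radius` and `eta`, are illustrative assumptions; the paper's actual set and step size are chosen differently.

```python
import numpy as np

def project(x, center, radius):
    # Euclidean projection onto a ball around `center` -- a stand-in
    # (assumption) for the locally revealed feasible set.
    d = x - center
    n = np.linalg.norm(d)
    return x if n <= radius else center + d * (radius / n)

def ogd_step(x_t, grad, eta, radius):
    # One gradient step from the current action x_t, projected back
    # onto the small neighborhood around x_t.
    return project(x_t - eta * grad, x_t, radius)

# Example: one step on the strongly convex loss f_t(x) = ||x - theta_t||^2.
x_t = np.zeros(2)
theta_t = np.array([1.0, 0.0])
grad = 2.0 * (x_t - theta_t)
x_next = ogd_step(x_t, grad, eta=0.5, radius=0.1)
```

Here the unconstrained step would land at `theta_t`, but the local projection caps the per-slot movement at `radius`, which is what ties the algorithm's cumulative movement to the path-length of the optimal actions.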