In online convex optimization (OCO), a single loss function sequence is revealed over a time horizon of $T$, and an online algorithm has to choose its action at time $t$, before the loss function at time $t$ is revealed. The goal of the online algorithm is to incur minimal penalty (called $\textit{regret}$ compared to a static optimal action made by an optimal offline algorithm knowing all functions of the sequence in advance. In this paper, we broaden the horizon of OCO, and consider multi-objective OCO, where there are $K$ distinct loss function sequences, and an algorithm has to choose its action at time $t$, before the $K$ loss functions at time $t$ are revealed. To capture the tradeoff between tracking the $K$ different sequences, we consider the $\textit{min-max}$ regret, where the benchmark (optimal offline algorithm) takes a static action across all time slots that minimizes the maximum of the total loss (summed across time slots) incurred by each of the $K$ sequences. An online algorithm is allowed to change its action across time slots, and its {\it min-max} regret is defined as the difference between its $\textit{min-max}$ cost and that of the benchmark. The $\textit{min-max}$ regret is a stringent performance measure and an algorithm with small regret needs to `track' all loss function sequences closely at all times. We consider this $\textit{min-max}$ regret in the i.i.d. input setting where all loss functions are i.i.d. generated from an unknown distribution. For the i.i.d. model we propose a simple algorithm that combines the well-known $\textit{Hedge}$ and online gradient descent (OGD) and show via a remarkably simple proof that its expected $\textit{min-max}$ regret is $O(\sqrt{T \log K})$.
翻译:暂无翻译