This paper considers the distributed online convex optimization problem with time-varying constraints over a network of agents. This is a sequential decision-making problem with two sequences of arbitrarily varying convex functions: the loss functions and the constraint functions. At each round, each agent selects a decision from the decision set, and then only a portion of the loss function and a coordinate block of the constraint function at this round are privately revealed to that agent. The goal of the network is to minimize network regret and constraint violation. Two distributed online algorithms are proposed, with full-information and bandit feedback, respectively. Both dynamic and static network regret bounds are analyzed for the proposed algorithms. Constraint violation is measured by the network cumulative constraint violation, a metric that rules out the situation in which strictly feasible constraints compensate for the effects of violated constraints. In particular, we show that the proposed algorithms achieve $\mathcal{O}(T^{\max\{\kappa,1-\kappa\}})$ static network regret and $\mathcal{O}(T^{1-\kappa/2})$ network cumulative constraint violation, where $T$ is the total number of rounds and $\kappa\in(0,1)$ is a user-defined trade-off parameter. Moreover, if the loss functions are strongly convex, then the static network regret bound can be reduced to $\mathcal{O}(T^{\kappa})$. Finally, numerical simulations are provided to illustrate the theoretical results.
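To make the trade-off governed by $\kappa$ and the role of the cumulative violation metric concrete, the following is a minimal Python sketch. It is illustrative only and is not the paper's algorithm: the function names are hypothetical, and the violation example assumes a single agent and a scalar constraint value per round, which simplifies the network-level, vector-valued setting of the abstract.

```python
def regret_exponent(kappa: float, strongly_convex: bool = False) -> float:
    """Exponent of T in the static network regret bound stated in the abstract."""
    if not 0.0 < kappa < 1.0:
        raise ValueError("kappa must lie in the open interval (0, 1)")
    # O(T^kappa) for strongly convex losses, O(T^max{kappa, 1-kappa}) otherwise.
    return kappa if strongly_convex else max(kappa, 1.0 - kappa)


def violation_exponent(kappa: float) -> float:
    """Exponent of T in the network cumulative constraint violation bound."""
    if not 0.0 < kappa < 1.0:
        raise ValueError("kappa must lie in the open interval (0, 1)")
    return 1.0 - kappa / 2.0


def cumulative_violation(constraint_values: list[float]) -> float:
    """Sum of per-round positive parts (assumed scalar form of the metric).

    Rounds with strictly feasible values (negative) contribute zero,
    so they cannot offset rounds where the constraint is violated.
    """
    return sum(max(g, 0.0) for g in constraint_values)


def net_violation(constraint_values: list[float]) -> float:
    """Positive part of the plain sum, where feasible rounds can offset violations."""
    return max(sum(constraint_values), 0.0)


# One violated round followed by one strictly feasible round.
values = [1.0, -1.0]
print(regret_exponent(0.5), violation_exponent(0.5))
print(cumulative_violation(values), net_violation(values))
```

In this toy sequence the plain sum reports no violation because the feasible round cancels the violated one, while the cumulative metric still records the violation. Under the stated bounds, raising $\kappa$ lowers the violation exponent $1-\kappa/2$, whereas the general convex regret exponent $\max\{\kappa,1-\kappa\}$ is smallest at $\kappa = 1/2$.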