This paper considers the distributed online convex optimization problem with time-varying constraints over a network of agents. This is a sequential decision-making problem with two sequences of arbitrarily varying convex loss and constraint functions. At each round, each agent selects a decision from the decision set, after which only a portion of the loss function and a coordinate block of the constraint function for that round are privately revealed to the agent. The goal of the network is to minimize the network-wide loss accumulated over time. Two distributed online algorithms are proposed, one for full-information feedback and one for bandit feedback. Both dynamic and static network regret bounds are analyzed for the proposed algorithms. Constraint violation is measured by the network cumulative constraint violation, a metric under which strictly feasible constraints cannot compensate for the effects of violated constraints. In particular, we show that the proposed algorithms achieve $\mathcal{O}(T^{\max\{\kappa,1-\kappa\}})$ static network regret and $\mathcal{O}(T^{1-\kappa/2})$ network cumulative constraint violation, where $T$ is the time horizon and $\kappa\in(0,1)$ is a user-defined trade-off parameter. Moreover, if the loss functions are strongly convex, then the static network regret bound can be reduced to $\mathcal{O}(T^{\kappa})$. Finally, numerical simulations are provided to illustrate the theoretical results.
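To make the role of $\kappa$ concrete, the following minimal Python sketch evaluates the exponents of $T$ in the bounds stated above for a given $\kappa$. The function names `bound_exponents` and `cumulative_violation` are hypothetical and introduced only for illustration. The second function also assumes that the cumulative violation clips each constraint value at zero before summing; this is one common way to formalize the "no compensation" property, and the abstract itself does not give the formula.

```python
def bound_exponents(kappa: float, strongly_convex: bool = False) -> tuple[float, float]:
    """Return (regret exponent, violation exponent) of T for a trade-off kappa.

    Uses only the rates quoted in the abstract:
      static regret:        T^max(kappa, 1 - kappa), or T^kappa if strongly convex
      constraint violation: T^(1 - kappa / 2)
    """
    if not 0.0 < kappa < 1.0:
        raise ValueError("kappa must lie in the open interval (0, 1)")
    regret = kappa if strongly_convex else max(kappa, 1.0 - kappa)
    violation = 1.0 - kappa / 2.0
    return regret, violation


def cumulative_violation(values, clipped: bool = True) -> float:
    """Sum constraint values over rounds (assumed form, not taken from the paper).

    With clipped=True, negative (strictly feasible) values are set to zero,
    so they cannot cancel positive (violated) ones.
    """
    return sum(max(v, 0.0) if clipped else v for v in values)


if __name__ == "__main__":
    for kappa in (0.25, 0.5, 0.75):
        print(kappa, bound_exponents(kappa), bound_exponents(kappa, strongly_convex=True))
    # One violated round followed by one strictly feasible round.
    print(cumulative_violation([1.0, -1.0], clipped=False))
    print(cumulative_violation([1.0, -1.0], clipped=True))
```

In the convex case the regret exponent is smallest at $\kappa = 1/2$, where the violation exponent is $3/4$. Under strong convexity, a smaller $\kappa$ lowers the regret exponent while raising the violation exponent, which is the trade-off the parameter controls.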