We consider a fair resource allocation problem in the no-regret setting against an unrestricted adversary. The objective is to allocate resources equitably among several agents in an online fashion so that the difference between the aggregate $\alpha$-fair utility of the agents under an optimal static clairvoyant allocation and that under the online policy grows sublinearly with time. The problem is challenging due to the non-additive nature of the $\alpha$-fairness function. Previously, it was shown that no online policy for this problem can achieve sublinear standard regret. In this paper, we propose an efficient online resource allocation policy, called Online Proportional Fair (OPF), that achieves $c_\alpha$-approximate sublinear regret with the approximation factor $c_\alpha=(1-\alpha)^{-(1-\alpha)}\leq 1.445$ for $0\leq \alpha < 1$. The upper bound on the $c_\alpha$-regret for this problem exhibits a surprising phase transition phenomenon: the regret bound changes from a power law to a constant at the critical exponent $\alpha=\frac{1}{2}$. As a corollary, our result also resolves an open problem raised by Even-Dar et al. [2009] on designing an efficient no-regret policy for the online job scheduling problem in certain parameter regimes. The proof of our results introduces new algorithmic and analytical techniques, including greedy estimation of the future gradients for non-additive global reward functions and bootstrapping adaptive regret bounds, which may be of independent interest.
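The stated bound on the approximation factor can be checked numerically. The sketch below is only an illustration and not part of the paper: the helper name `c_alpha` and the grid resolution are assumptions made here. It evaluates $c_\alpha=(1-\alpha)^{-(1-\alpha)}$ on a grid over $[0,1)$ and compares the largest value with the closed-form maximum. Writing $x=1-\alpha$, the quantity $-x\ln x$ is maximized at $x=1/e$, which gives $c_\alpha\leq e^{1/e}\approx 1.4447$.

```python
import math


def c_alpha(alpha: float) -> float:
    """Approximation factor (1 - alpha) ** -(1 - alpha) for 0 <= alpha < 1.

    The function name is hypothetical; only the formula comes from the abstract.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    return (1.0 - alpha) ** -(1.0 - alpha)


# Evaluate the factor on a uniform grid over [0, 1); the resolution is arbitrary.
grid = [i / 10_000 for i in range(10_000)]
worst_alpha = max(grid, key=c_alpha)
worst_value = c_alpha(worst_alpha)

# Closed-form maximizer: alpha = 1 - 1/e, with maximum value e ** (1/e).
analytic_alpha = 1.0 - 1.0 / math.e
analytic_value = math.exp(1.0 / math.e)

print(f"grid maximum:     alpha = {worst_alpha:.4f}, c_alpha = {worst_value:.5f}")
print(f"analytic maximum: alpha = {analytic_alpha:.4f}, c_alpha = {analytic_value:.5f}")
print(f"at the critical exponent alpha = 1/2: c_alpha = {c_alpha(0.5):.5f}")
```

At $\alpha=0$ the factor equals 1, so in that case the approximate regret coincides with the standard regret. The factor is largest near $\alpha\approx 0.632$.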