The classic cake-cutting problem provides a model for the fair and efficient allocation of a divisible, heterogeneous resource among agents with distinct preferences. In contrast to a standard formulation of cake cutting, in which each agent must receive a contiguous piece of the cake in an offline setting, this work focuses on the online allocation of non-contiguous pieces of cake among agents and establishes algorithmic results for fairness measures. To this end, we employ classic adversarial multi-armed bandits to achieve sub-linear fairness regret and revenue regret simultaneously. Adversarial bandits are powerful tools for modeling adversarial reinforcement-learning environments: they provide strong upper bounds on the regret of learning while observing only one action's reward per step, by striking a careful trade-off between exploration and exploitation. This work studies the well-known EXP3 algorithm, which maintains a probability distribution over actions via exponential weighting of importance-weighted reward estimates across the time horizon.
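For reference, below is a minimal sketch of the standard EXP3 update loop described above (exponential weights over a time horizon, with importance-weighted reward estimates). The function name `exp3` and the parameters `gamma` and `reward_fn` are illustrative; this is not the paper's cake-allocation instantiation, only the generic bandit routine it builds on, assuming rewards lie in [0, 1].

```python
import math
import random

def exp3(num_arms, horizon, gamma, reward_fn):
    """Generic EXP3 sketch: exponential-weight updates with
    importance-weighted reward estimates; rewards assumed in [0, 1]."""
    weights = [1.0] * num_arms
    for t in range(horizon):
        total = sum(weights)
        # Mix the exponential-weight distribution with uniform
        # exploration to guarantee every arm keeps positive probability.
        probs = [(1 - gamma) * w / total + gamma / num_arms
                 for w in weights]
        arm = random.choices(range(num_arms), weights=probs)[0]
        # Bandit feedback: only the chosen arm's (possibly adversarial)
        # reward is observed at step t.
        reward = reward_fn(t, arm)
        # Dividing by the selection probability makes the estimate
        # unbiased for the unobserved full reward vector.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)
    return weights
```

With the exploration rate `gamma` tuned on the order of sqrt(num_arms * log(num_arms) / horizon), this scheme attains the familiar sub-linear (order-sqrt(T)) regret guarantee against an adversary, which is the property the abstract invokes for bounding fairness and revenue regret simultaneously.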