In this paper we prove the efficacy of a simple greedy algorithm for a finite horizon online resource allocation/matching problem, when the corresponding static planning linear program (SPP) exhibits a non-degeneracy condition called the general position gap (GPG). The key intuition that we formalize is that the solution of the reward maximizing SPP is the same as a feasibility LP restricted to the optimal basic activities, and under GPG this solution can be tracked with bounded regret by a greedy algorithm, i.e., without the commonly used technique of periodically resolving the SPP. The goal of the decision maker is to combine resources (from a finite set of resource types) into configurations (from a finite set of feasible configurations) where each configuration is specified by the number of resources consumed of each type and a reward. The resources are further subdivided into three types - offline (whose quantity is known and available at time 0), online-queueable (which arrive online and can be stored in a buffer), and online-nonqueueable (which arrive online and must be matched on arrival or lost). Under GRG we prove that, (i) our greedy algorithm gets bounded any-time regret of $\mathcal{O}(1/\epsilon_0)$ for matching reward ($\epsilon_0$ is a measure of the GPG) when no configuration contains both an online-queueable and an online-nonqueueable resource, and (ii) $\mathcal{O}(\log t)$ expected any-time regret otherwise (we also prove a matching lower bound). By considering the three types of resources, our matching framework encompasses several well-studied problems such as dynamic multi-sided matching, network revenue management, online stochastic packing, and multiclass queueing systems.
翻译:在本文中,当相应的静态规划线性程序(SPP)显示一种非降解性条件,称为一般位置差距(GPG)时,我们证明简单贪婪的算法对有限视野在线资源分配/匹配问题的功效。我们正式确定的关键直觉是,奖励最大化SPP的解决方案与限于最佳基本活动的可行性LP相同,在GPG下,这一解决方案可以通过贪婪算法(即没有定期解决SPP的常用技术)来跟踪。决策者的目标是将资源(来自有限的资源类型)合并成一个非退化性配置(来自有限的可行配置组合的固定组合),其中每种配置都由每种类型所消耗的资源和奖赏来指定。资源进一步细分为三种类型 - 离线(已知数量和时间) 、 在线可储存的(可上网存储的) 和不可连接的(任何在线的,必须匹配或丢失的)。