Dynamic resource allocation problems are ubiquitous, arising in inventory management, order fulfillment, online advertising, and other applications. We initially focus on one of the simplest models of online resource allocation: the multisecretary problem. In the multisecretary problem, a decision maker sequentially hires up to $B$ out of $T$ candidates, and candidate ability values are drawn i.i.d. from a distribution $F$ on $[0,1]$. First, we investigate fundamental limits on performance as a function of the value distribution under consideration. We quantify performance in terms of regret, defined as the additive loss relative to the best performance achievable in hindsight. We present a novel fundamental regret lower bound scaling of $\Omega(T^{1/2 - 1/(2(1 + \beta))})$ for distributions with gaps in their support, with $\beta$ quantifying the mass accumulation of types (values) around these gaps. This lower bound contrasts with the constant and logarithmic regret guarantees shown to be achievable in prior work, under specific assumptions on the value distribution. Second, we introduce a novel algorithmic principle, Conservativeness with respect to Gaps (CwG), which yields near-optimal performance with regret scaling of $\tilde{O}(T^{1/2 - 1/(2(1 + \beta))})$ for any distribution in a class parameterized by the mass accumulation parameter $\beta$. We then turn to operationalizing the CwG principle across dynamic resource allocation problems. We study a general and practical algorithm, Repeatedly Act using Multiple Simulations (RAMS), which simulates possible futures to estimate a hindsight-based approximation of the value-to-go function. We establish that this algorithm inherits theoretical performance guarantees of algorithms tailored to the distribution of resource requests, including our CwG-based algorithm, and find that it outperforms them in numerical experiments.
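To make the setting and the regret metric concrete, the following is a minimal Python sketch of the multisecretary problem. It is not the CwG or RAMS algorithm described above. It uses a simple certainty-equivalent threshold rule as a stand-in baseline, and it assumes values drawn from the uniform distribution on $[0,1]$ (which has no gaps in its support). All function names and parameters are hypothetical and chosen only for illustration.

```python
import random


def hindsight_value(values, budget):
    """Best total achievable in hindsight: the sum of the `budget` largest values."""
    return sum(sorted(values, reverse=True)[:budget])


def certainty_equivalent_policy(values, budget, inverse_cdf):
    """Illustrative baseline (an assumption, not the paper's CwG or RAMS).

    Hires a candidate when the value clears the quantile that would, in
    expectation, exhaust the remaining budget over the remaining candidates.
    `inverse_cdf` is a hypothetical callable returning F^{-1}(q).
    """
    remaining = budget
    total = 0.0
    horizon = len(values)
    for t, value in enumerate(values):
        if remaining == 0:
            break
        left = horizon - t  # candidates not yet decided, including this one
        accept_fraction = min(1.0, remaining / left)
        threshold = inverse_cdf(1.0 - accept_fraction)
        if value >= threshold:
            total += value
            remaining -= 1
    return total


def average_regret(horizon, budget, trials, seed=0):
    """Monte Carlo estimate of regret: hindsight value minus online value."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        # Assumption: F is uniform on [0, 1], so F^{-1}(q) = q.
        values = [rng.random() for _ in range(horizon)]
        online = certainty_equivalent_policy(values, budget, lambda q: q)
        gaps.append(hindsight_value(values, budget) - online)
    return sum(gaps) / trials


if __name__ == "__main__":
    for horizon in (100, 1000, 10000):
        print(horizon, average_regret(horizon, horizon // 4, trials=50))
```

Running the script prints the estimated regret for growing $T$ with $B = T/4$. Swapping in a distribution with a gap in its support (and the matching inverse CDF) would be one way to probe the regime the lower bound addresses, though this baseline makes no claim to match the stated rates.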