We study an online contextual decision-making problem with resource constraints. At each time period, the decision-maker first predicts a reward vector and resource consumption matrix based on a given context vector and then solves a downstream optimization problem to make a decision. The final goal of the decision-maker is to maximize the sum of the reward and the utility from resource consumption, while satisfying the resource constraints. We propose an algorithm that combines a prediction step based on the "Smart Predict-then-Optimize (SPO)" method with a dual update step based on mirror descent. We prove regret bounds and demonstrate that the overall convergence rate of our method depends on the $\mathcal{O}(T^{-1/2})$ convergence of online mirror descent as well as risk bounds of the surrogate loss function used to learn the prediction model. Our algorithm and regret bounds apply to a general convex feasible region for the resource constraints, including both hard and soft resource constraint cases, and they apply to a wide class of prediction models, in contrast to the traditional settings of linear contextual models or finite policy spaces. We also conduct numerical experiments to empirically demonstrate the strength of our proposed SPO-type methods, as compared to traditional prediction-error-only methods, on multi-dimensional knapsack and longest path instances.
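To make the structure of the algorithm concrete, the following is a minimal, self-contained sketch of one plausible instantiation in Python. It is an illustration under simplifying assumptions, not the paper's exact method: a perturbed linear predictor stands in for a model trained with the SPO+ surrogate loss, the downstream problem is a simple selection over a finite set of candidate decisions, and the dual update uses online mirror descent with the Euclidean mirror map (i.e., projected subgradient). The dimensions, the budget vector `rho`, the step size `eta`, and the predictors `W_hat` and `C_hat` are all hypothetical choices made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes: d-dim context, n candidate decisions, m resources.
d, n, m, T = 5, 10, 3, 200
rho = 0.5 * np.ones(m)          # per-period resource budget (assumed)
eta = 1.0 / np.sqrt(T)          # mirror-descent step size (assumed)

# Hypothetical ground truth: rewards and consumption depend linearly on the context.
W_true = rng.normal(size=(n, d))
C_true = rng.uniform(size=(m, n))

# Placeholder prediction model; in the paper this would be learned with an
# SPO-type surrogate loss, here we simply assume access to a fitted predictor.
W_hat = W_true + 0.1 * rng.normal(size=(n, d))
C_hat = C_true + 0.1 * rng.uniform(size=(m, n))

lam = np.zeros(m)               # dual variables for the resource constraints
total_reward, total_use = 0.0, np.zeros(m)

for t in range(T):
    x = rng.normal(size=d)

    # Prediction step: estimate the reward vector and consumption matrix.
    r_hat = W_hat @ x           # predicted reward of each candidate decision
    A_hat = C_hat               # predicted per-decision resource consumption

    # Downstream optimization: maximize the Lagrangian-adjusted reward over the
    # (here, finite) feasible set of decisions.
    scores = r_hat - A_hat.T @ lam
    j = int(np.argmax(scores))

    # Observe the realized reward and resource consumption of the chosen decision.
    r_t = W_true[j] @ x
    a_t = C_true[:, j]
    total_reward += r_t
    total_use += a_t

    # Dual update: one step of online mirror descent with the Euclidean mirror
    # map (projected subgradient) on the constraint slack a_t - rho.
    lam = np.maximum(lam + eta * (a_t - rho), 0.0)

print(f"average reward {total_reward / T:.3f}, average use {total_use / T}")
```

The key design point the sketch tries to convey is the separation of concerns in each period: the prediction model supplies $\hat{r}_t$ and $\hat{A}_t$, the primal decision is obtained by optimizing the dual-adjusted objective, and the dual variables are then updated by mirror descent using the realized resource consumption.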