We study an online joint assortment-inventory optimization problem in which each customer's choice behavior follows the multinomial logit (MNL) choice model and the attraction parameters are unknown a priori. The retailer makes periodic assortment and inventory decisions, learning the attraction parameters from realized demands while maximizing the expected total profit over time. In this paper, we propose a novel algorithm that effectively balances exploration and exploitation in online assortment and inventory decisions. Our algorithm builds on three components: a new estimator for the MNL attraction parameters, a novel approach to incentivizing exploration by adaptively tuning certain known and unknown parameters, and an optimization oracle for static single-cycle assortment-inventory planning problems with given parameters. We establish a regret upper bound for our algorithm and a lower bound for the online joint assortment-inventory optimization problem, which together indicate that our algorithm achieves a nearly optimal regret rate, provided that the static optimization oracle is exact. We then incorporate more practical, approximate static optimization oracles into our algorithm and upper-bound the impact of static optimization errors on its regret. Finally, we perform numerical studies to demonstrate the effectiveness of the proposed algorithm.
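To make the MNL choice model concrete, the following is a minimal Python sketch, not the paper's algorithm or estimator. It assumes the standard MNL convention in which the no-purchase option has attraction 1, so a customer offered set S buys product i with probability v_i / (1 + sum_{j in S} v_j). As an additional labeled assumption, customers within a cycle arrive one at a time and choose only among products still in stock. This captures how inventory limits shape realized demand. The function names `mnl_probabilities`, `expected_revenue`, and `simulate_cycle` are hypothetical.

```python
import random
from typing import Dict, Iterable, List, Tuple


def mnl_probabilities(v: Dict[str, float], offered: Iterable[str]) -> Tuple[Dict[str, float], float]:
    """MNL purchase probabilities for the offered products, plus the no-purchase probability.

    Assumes the no-purchase attraction is normalized to 1.
    """
    offered = list(offered)
    denom = 1.0 + sum(v[i] for i in offered)
    probs = {i: v[i] / denom for i in offered}
    return probs, 1.0 / denom


def expected_revenue(v: Dict[str, float], prices: Dict[str, float], offered: Iterable[str]) -> float:
    """Expected revenue from a single customer facing the offered set (no stockouts)."""
    probs, _ = mnl_probabilities(v, offered)
    return sum(prices[i] * p for i, p in probs.items())


def simulate_cycle(v: Dict[str, float], inventory: Dict[str, int],
                   num_customers: int, rng: random.Random) -> Dict[str, int]:
    """Simulate realized sales in one cycle under an ASSUMED sequential-arrival model.

    Each customer sees only the products that still have positive stock and
    chooses among them (or no purchase) according to the MNL probabilities.
    """
    stock = dict(inventory)
    sales = {i: 0 for i in stock}
    for _ in range(num_customers):
        in_stock: List[str] = [i for i, q in stock.items() if q > 0]
        if not in_stock:
            break  # everything sold out; remaining customers cannot buy
        probs, p_none = mnl_probabilities(v, in_stock)
        options = in_stock + [None]  # None stands for the no-purchase option
        weights = [probs[i] for i in in_stock] + [p_none]
        choice = rng.choices(options, weights=weights, k=1)[0]
        if choice is not None:
            stock[choice] -= 1
            sales[choice] += 1
    return sales
```

For example, with attractions `{"a": 1.0, "b": 0.5}` and both products offered, the purchase probabilities are 0.4 and 0.2, and the no-purchase probability is 0.4. A retailer who knew `v` could compare candidate assortment-inventory plans by averaging `simulate_cycle` outcomes. The learning problem described above arises because `v` is unknown and must be inferred from the sales that stock levels allow.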