We consider the dynamic assortment optimization problem under the multinomial logit (MNL) model with unknown utility parameters. The main question investigated in this paper is model mis-specification under the $\varepsilon$-contamination model, a fundamental model in robust statistics and machine learning. Specifically, over a selling horizon of length $T$, we assume that customers make purchases according to a well-specified underlying MNL choice model in a $(1-\varepsilon)$-fraction of the time periods and make arbitrary purchasing decisions in the remaining $\varepsilon$-fraction. Under this model, we develop a new robust online assortment optimization policy based on an active elimination strategy. We establish both upper and lower bounds on the regret and show that our policy is optimal up to a logarithmic factor in $T$ when the assortment capacity is constant. We further develop a fully adaptive policy that does not require any prior knowledge of the contamination parameter $\varepsilon$. When a sub-optimality gap exists between optimal and sub-optimal products, we also establish gap-dependent logarithmic regret upper and lower bounds in both the known-$\varepsilon$ and unknown-$\varepsilon$ cases. Our simulation study shows that our policy outperforms existing policies based on upper confidence bounds (UCB) and Thompson sampling.
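To make the $\varepsilon$-contamination setting concrete, the following is a minimal Python sketch of how purchase data might be generated under it. It is an illustration under stated assumptions, not the paper's policy or its experimental setup: the function names `mnl_choice_probs` and `simulate_purchases` are hypothetical, the no-purchase option is assumed to have preference weight 1 (a common MNL normalization), the corrupted periods are picked at random, and the "arbitrary" purchases are stood in for by a uniformly random choice, whereas the model itself permits any behavior in those periods.

```python
import random


def mnl_choice_probs(assortment, v):
    """MNL choice probabilities for an offered assortment.

    `v` maps each product to a positive preference weight. The key None
    stands for the no-purchase option, assumed here to have weight 1.
    """
    denom = 1.0 + sum(v[i] for i in assortment)
    probs = {None: 1.0 / denom}
    for i in assortment:
        probs[i] = v[i] / denom
    return probs


def simulate_purchases(assortment, v, T, eps, seed=0):
    """Simulate T periods in which an eps-fraction is contaminated.

    Clean periods follow the MNL model. Contaminated periods use a
    uniformly random choice, which is only a placeholder assumption for
    the arbitrary decisions allowed by the contamination model.
    """
    rng = random.Random(seed)
    n_corrupted = int(eps * T)
    corrupted = set(rng.sample(range(T), n_corrupted))

    probs = mnl_choice_probs(assortment, v)
    options = list(probs)
    weights = [probs[o] for o in options]

    choices = []
    for t in range(T):
        if t in corrupted:
            # Placeholder for an arbitrary (possibly adversarial) decision.
            choices.append(rng.choice(options))
        else:
            # Well-specified MNL purchase.
            choices.append(rng.choices(options, weights=weights)[0])
    return choices, corrupted
```

A learner observing `choices` cannot tell which periods belong to `corrupted`, which is why estimates of the preference weights need to be robust to a bounded number of misleading observations.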