We consider a dynamic assortment selection problem in which a seller has a fixed inventory of $N$ substitutable products and faces unknown demand that arrives sequentially over $T$ periods. In each period, the seller must choose an assortment of at most $K$ products to offer to customers. The customer's response follows a multinomial logit (MNL) model with unknown parameters $v$. The seller's goal is to maximize the total expected revenue given the fixed initial inventory of the $N$ products. We give a policy that achieves a regret of $\tilde O\left(K \sqrt{K N T}\left(1 + \frac{\sqrt{v_{\max}}}{q_{\min}}\text{OPT}\right) \right)$ under a mild assumption on the model parameters. In particular, our policy achieves a near-optimal $\tilde O(\sqrt{T})$ regret in the large-inventory setting. Our policy builds on the UCB-based approach of [1] for the MNL-bandit without inventory constraints, and handles the inventory constraints through an exponentially sized LP, for which we present a tractable approximation that preserves the $\tilde O(\sqrt{T})$ regret bound.
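To make the choice model concrete, the following is a minimal sketch of MNL purchase probabilities and the expected one-period revenue of an assortment. It is illustrative only and is not the policy described above: it ignores inventory constraints and parameter learning, it assumes the no-purchase option is normalized to weight 1 (a common convention that the abstract does not state), and the per-product revenues `r` and all function names are hypothetical.

```python
from itertools import combinations


def choice_probabilities(assortment, v):
    """Return the MNL purchase probability of each offered product.

    Assumption: the no-purchase option has weight 1, so the
    probabilities of the offered products sum to less than 1.
    """
    denom = 1.0 + sum(v[i] for i in assortment)
    return {i: v[i] / denom for i in assortment}


def expected_revenue(assortment, v, r):
    """Expected one-period revenue of an assortment.

    r holds hypothetical per-product revenues (not given in the abstract).
    """
    probs = choice_probabilities(assortment, v)
    return sum(r[i] * p for i, p in probs.items())


def best_assortment(v, r, K):
    """Brute-force search over all assortments of size at most K.

    Only practical for very small N; shown to illustrate the
    cardinality constraint, not as an efficient method.
    """
    products = range(len(v))
    candidates = (
        S for k in range(1, K + 1) for S in combinations(products, k)
    )
    return max(candidates, key=lambda S: expected_revenue(S, v, r))
```

With known parameters, for example `v = [1.0, 0.5, 2.0]`, `r = [1.0, 4.0, 0.5]`, and `K = 2`, calling `best_assortment(v, r, K)` enumerates all singletons and pairs. In the setting above the parameters $v$ are unknown and must be learned while inventory is depleted, which is what makes the problem hard.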