We study MNL bandits, a variant of the classical multi-armed bandit problem, under risk criteria. Unlike the ordinary expected revenue, risk criteria are more general objectives widely used in industry and business. We design algorithms for a broad class of risk criteria, including but not limited to the well-known conditional value-at-risk, Sharpe ratio, and entropy risk, and prove that they achieve near-optimal regret. As a complement, we also conduct experiments on both synthetic and real data to demonstrate the empirical performance of our proposed algorithms.