In several applications, such as clinical trials and financial portfolio optimization, the expected value (or average reward) does not satisfactorily capture the merits of a drug or a portfolio. In such applications, risk plays a crucial role, and a risk-aware performance measure, one that captures losses in the event of adverse outcomes, is preferable. This survey aims to consolidate and summarize existing research on risk measures, specifically in the context of multi-armed bandits. We review various risk measures of interest and comment on their properties. Next, we review existing concentration inequalities for these risk measures. We then define risk-aware bandit problems and, for risk-sensitive measures, consider algorithms in two settings: regret minimization, where the exploration-exploitation trade-off manifests, and best-arm identification, which is a pure exploration problem. We conclude by commenting on persisting challenges and fertile areas for future research.
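To make the opening motivation concrete, the following minimal sketch compares two hypothetical arms that have the same expected reward but very different lower tails. It assumes conditional value-at-risk (CVaR) as the example risk measure. The function names, the sample rewards, and the simple plug-in estimator (the average of the worst ⌈αn⌉ samples, which ignores fractional tail mass) are illustrative choices, not definitions taken from the survey.

```python
import math
from typing import Sequence


def empirical_mean(samples: Sequence[float]) -> float:
    """Sample average of the observed rewards."""
    return sum(samples) / len(samples)


def empirical_lower_cvar(samples: Sequence[float], alpha: float) -> float:
    """Simplified plug-in estimate of lower-tail CVaR at level alpha.

    Averages the worst ceil(alpha * n) rewards. This is an assumed,
    simplified estimator: it does not split the boundary sample when
    alpha * n is not an integer.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    k = math.ceil(alpha * len(samples))
    worst = sorted(samples)[:k]
    return sum(worst) / k


# Hypothetical reward samples: a "safe" arm and a "risky" arm with one large loss.
safe_arm = [1.0] * 10
risky_arm = [2.0] * 9 + [-8.0]

if __name__ == "__main__":
    # Both arms look identical to a mean-based criterion ...
    print(empirical_mean(safe_arm), empirical_mean(risky_arm))  # 1.0 1.0
    # ... but the risky arm's worst 10% of outcomes is far worse.
    print(empirical_lower_cvar(safe_arm, 0.1), empirical_lower_cvar(risky_arm, 0.1))  # 1.0 -8.0
```

A mean-based bandit algorithm would treat these two arms as equally good, whereas a CVaR-based criterion at α = 0.1 clearly prefers the safe arm; this is the kind of distinction that motivates risk-aware bandit formulations.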