Motivated by practical considerations in machine learning for financial decision-making, such as risk aversion and large action spaces, we initiate the study of risk-aware linear bandits. Specifically, we consider regret minimization under the mean-variance measure when facing a set of actions whose rewards can be expressed as linear functions of (initially) unknown parameters. Driven by the variance-minimizing G-optimal design, we propose the Risk-Aware Explore-then-Commit (RISE) algorithm and the Risk-Aware Successive Elimination (RISE++) algorithm. We then rigorously analyze their regret upper bounds to show that, by leveraging the linear structure, the algorithms can dramatically reduce the regret compared to existing methods. Finally, we demonstrate the performance of the algorithms through extensive numerical experiments in a synthetic smart order routing setup. Our results show that both RISE and RISE++ can outperform the competing methods, especially in complex decision-making scenarios.
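To make the mean-variance measure concrete, here is a minimal sketch of how a risk-averse learner might score actions from observed rewards. The sign convention (`rho * mean - variance`, higher is better), the function name `mean_variance`, the risk-tolerance parameter `rho`, and the sample rewards are all assumptions chosen for illustration; they are not taken from the paper, and this is not the RISE or RISE++ algorithm.

```python
import numpy as np


def mean_variance(rewards, rho=1.0):
    """Empirical mean-variance score of one action (higher is better).

    Assumed convention: rho * mean - variance. Other conventions exist,
    e.g. minimizing variance - rho * mean.
    """
    rewards = np.asarray(rewards, dtype=float)
    # np.var uses the population variance (ddof=0) by default.
    return rho * rewards.mean() - rewards.var()


# Hypothetical reward samples for two actions.
safe = [1.0, 1.0, 1.0, 1.0]   # lower mean, no variability
risky = [0.0, 3.0, 0.0, 3.0]  # higher mean, high variability

scores = {"safe": mean_variance(safe), "risky": mean_variance(risky)}
best = max(scores, key=scores.get)
print(scores, best)
```

With `rho = 1.0`, the variance penalty outweighs the higher mean of the risky action, so the safe action is preferred; a larger `rho` shifts the preference back toward the higher-mean action.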