Conditional value-at-risk (CVaR) and value-at-risk (VaR) are popular tail-risk measures in the finance and insurance industries, as well as in highly reliable, safety-critical uncertain environments, where the underlying probability distributions are often heavy-tailed. We use the multi-armed bandit best-arm identification framework and consider the problem of identifying, from among finitely many arms, the one with the smallest CVaR, VaR, or weighted sum of CVaR and mean. The last of these objectives captures the risk-return trade-off common in finance. Our main contribution is an optimal $\delta$-correct algorithm that acts on general arms, including heavy-tailed distributions, and asymptotically (as $\delta$ approaches $0$) matches the lower bound on the expected number of samples needed. The algorithm requires solving a non-convex optimization problem in the space of probability measures, which calls for delicate analysis. En route, we develop new non-asymptotic, empirical-likelihood-based concentration inequalities for tail-risk measures that are tighter than those for popular truncation-based empirical estimators.
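To make the three objectives concrete, the following minimal sketch computes plain empirical estimates of VaR, CVaR, and a weighted sum of CVaR and mean from samples of each arm, then picks the arm with the smallest score. It is not the $\delta$-correct algorithm described above and does not use the empirical-likelihood machinery; it is only a naive plug-in illustration. The function names, the loss convention (larger values are worse), the quantile level `alpha`, and the weights `w_cvar` and `w_mean` are all assumptions introduced here.

```python
import math
from statistics import fmean


def empirical_var(losses, alpha):
    """Empirical VaR at level alpha: the ceil(alpha * n)-th smallest loss."""
    xs = sorted(losses)
    k = max(math.ceil(alpha * len(xs)) - 1, 0)
    return xs[k]


def empirical_cvar(losses, alpha):
    """Empirical CVaR via the Rockafellar-Uryasev form:
    VaR + E[(X - VaR)^+] / (1 - alpha), with X treated as a loss."""
    xs = list(losses)
    v = empirical_var(xs, alpha)
    excess = fmean(max(x - v, 0.0) for x in xs)
    return v + excess / (1.0 - alpha)


def risk_score(losses, alpha, w_cvar=1.0, w_mean=0.0):
    """Weighted sum of empirical CVaR and empirical mean (weights are hypothetical)."""
    xs = list(losses)
    return w_cvar * empirical_cvar(xs, alpha) + w_mean * fmean(xs)


def naive_best_arm(samples_per_arm, alpha, w_cvar=1.0, w_mean=0.0):
    """Return (index of the arm with the smallest score, list of all scores).
    A plug-in rule with no stopping criterion or correctness guarantee."""
    scores = [risk_score(s, alpha, w_cvar, w_mean) for s in samples_per_arm]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Two toy arms with the same mean (2.5) but different tails.
arms = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 10.0]]
best, scores = naive_best_arm(arms, alpha=0.5)
```

In this toy example both arms have the same mean, yet the second arm has the heavier upper tail, so a CVaR-based rule prefers the first arm. With heavy-tailed arms, such plug-in estimates can concentrate slowly, which is the setting the abstract's concentration inequalities address.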