Many modern machine learning tasks require models with high tail performance, i.e. high performance over the worst-off samples in the dataset. This problem has been widely studied in fields such as algorithmic fairness, class imbalance, and risk-sensitive decision making. A popular approach to maximize the model's tail performance is to minimize the CVaR (Conditional Value at Risk) loss, which computes the average risk over the tails of the loss. However, for classification tasks where models are evaluated by the zero-one loss, we show that if the classifiers are deterministic, then the minimizer of the average zero-one loss also minimizes the CVaR zero-one loss, suggesting that CVaR loss minimization is not helpful without additional assumptions. We circumvent this negative result by minimizing the CVaR loss over randomized classifiers, for which the minimizers of the average zero-one loss and the CVaR zero-one loss are no longer the same, so minimizing the latter can lead to better tail performance. To learn such randomized classifiers, we propose the Boosted CVaR Classification framework which is motivated by a direct relationship between CVaR and a classical boosting algorithm called LPBoost. Based on this framework, we design an algorithm called $\alpha$-AdaLPBoost. We empirically evaluate our proposed algorithm on four benchmark datasets and show that it achieves higher tail performance than deterministic model training methods.
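For concreteness, the CVaR loss referenced above is commonly written in the Rockafellar and Uryasev form; this is a standard formulation stated here for reference, with the tail level $\alpha$ and loss variable $\ell$ introduced as illustrative notation rather than taken from the abstract:
$$\mathrm{CVaR}_{\alpha}(\ell) \;=\; \min_{\lambda \in \mathbb{R}} \left\{ \lambda + \frac{1}{\alpha}\,\mathbb{E}\big[(\ell - \lambda)_{+}\big] \right\},$$
which equals the average loss over the worst-off $\alpha$-fraction of samples, so minimizing it directly targets tail performance.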