" 机器学习方面的最后损失:理论和应用 " (On Tilted Losses in Machine Learning: Theory and Applications)

Exponential tilting is a technique commonly used in fields such as statistics, probability, information theory, and optimization to create parametric distribution shifts. Despite its prevalence in related fields, tilting has not seen widespread use in machine learning. In this work, we aim to bridge this gap by exploring the use of tilting in risk minimization. We study a simple extension to empirical risk minimization (ERM), called tilted empirical risk minimization (TERM), which uses exponential tilting to flexibly tune the impact of individual losses. The resulting framework has several useful properties: we show that TERM can increase or decrease the influence of outliers, to enable fairness or robustness, respectively; has variance-reduction properties that can benefit generalization; and can be viewed as a smooth approximation to a superquantile method. Our work makes rigorous connections between TERM and related objectives, such as Value-at-Risk, Conditional Value-at-Risk, and distributionally robust optimization (DRO). We develop batch and stochastic first-order optimization methods for solving TERM, provide convergence guarantees for the solvers, and show that the framework can be solved efficiently relative to common alternatives. Finally, we demonstrate that TERM can be used for a multitude of applications in machine learning, such as enforcing fairness between subgroups, mitigating the effect of outliers, and handling class imbalance. Despite the straightforward modification TERM makes to traditional ERM objectives, we find that the framework can consistently outperform ERM and deliver competitive performance with state-of-the-art, problem-specific approaches.
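To make the idea of tuning the impact of individual losses concrete, here is a minimal sketch in plain Python. It assumes the usual form of the tilted objective, (1/t) log((1/N) sum_i exp(t * loss_i)), which the abstract does not spell out. The function names `tilted_loss` and `tilted_weights` and the sample loss values are hypothetical, chosen only for illustration.

```python
import math


def tilted_loss(losses, t):
    """Tilted aggregate of per-sample losses.

    Assumed form: (1/t) * log(mean(exp(t * loss_i))).
    At t == 0 this is taken to be the plain average (the ERM objective).
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if t == 0:
        return sum(losses) / len(losses)
    scaled = [t * loss for loss in losses]
    # Subtract the max before exponentiating (log-sum-exp trick) to avoid overflow.
    m = max(scaled)
    mean_exp = sum(math.exp(s - m) for s in scaled) / len(scaled)
    return (m + math.log(mean_exp)) / t


def tilted_weights(losses, t):
    """Per-sample weights exp(t * loss_i) / sum_j exp(t * loss_j).

    Under the assumed objective, the gradient is the weighted sum of
    per-sample gradients with these weights.
    """
    scaled = [t * loss for loss in losses]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical per-sample losses; the last one plays the role of an outlier.
    sample_losses = [0.1, 0.2, 0.15, 5.0]
    for t in (-2.0, 0.0, 2.0):
        print(t, tilted_loss(sample_losses, t), tilted_weights(sample_losses, t))
```

With positive t the weights concentrate on the largest loss, so the aggregate moves toward the worst case (the fairness-oriented regime). With negative t the largest loss is down-weighted and the aggregate moves toward the smallest loss (the robustness-oriented regime). At t = 0 the sketch falls back to the ordinary average.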
