Properness for supervised losses stipulates that the loss function shapes the learning algorithm towards the true posterior of the data generating distribution. Unfortunately, data in modern machine learning can be corrupted or twisted in many ways. Hence, optimizing a proper loss function on twisted data could perilously lead the learning algorithm towards the twisted posterior, rather than to the desired clean posterior. Many papers cope with specific twists (e.g., label/feature/adversarial noise), but there is a growing need for a unified and actionable understanding atop properness. Our chief theoretical contribution is a generalization of the properness framework with a notion called twist-properness, which delineates loss functions with the ability to "untwist" the twisted posterior into the clean posterior. Notably, we show that a nontrivial extension of a loss function called $\alpha$-loss, which was first introduced in information theory, is twist-proper. We study the twist-proper $\alpha$-loss under a novel boosting algorithm, called PILBoost, and provide formal and experimental results for this algorithm. Our overarching practical conclusion is that the twist-proper $\alpha$-loss outperforms the proper $\log$-loss on several variants of twisted data.
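For context, a minimal recall of the base $\alpha$-loss as it appears in the information-theory literature may help; note this is the standard definition, not the nontrivial twist-proper extension developed in this work, and $P_{\hat Y\mid X}(y\mid x)$ is assumed to denote the probability the model assigns to the true label:
$$
\ell_\alpha\bigl(y, P_{\hat Y\mid X}\bigr) \;=\; \frac{\alpha}{\alpha-1}\Bigl(1 - P_{\hat Y\mid X}(y\mid x)^{\,1-1/\alpha}\Bigr), \qquad \alpha \in (0,\infty],
$$
which recovers the $\log$-loss in the limit $\alpha \to 1$ and the soft $0$-$1$ loss $1 - P_{\hat Y\mid X}(y\mid x)$ at $\alpha = \infty$.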