Loss functions serve as the foundation of supervised learning and are often chosen prior to model development. To avoid potentially ad hoc choices of losses, statistical decision theory describes a desirable property for losses known as \emph{properness}, which asserts that Bayes' rule is optimal. Recent works have sought to \emph{learn losses} and models jointly. Existing methods do this by fitting an inverse canonical link function which monotonically maps $\mathbb{R}$ to $[0,1]$ to estimate probabilities for binary problems. In this paper, we extend monotonicity to maps between $\mathbb{R}^{C-1}$ and the projected probability simplex $\tilde{\Delta}^{C-1}$ by using monotonicity of gradients of convex functions. We present {\sc LegendreTron} as a novel and practical method that jointly learns \emph{proper canonical losses} and probabilities for multiclass problems. Tested on a benchmark of domains with up to 1,000 classes, our experimental results show that our method consistently outperforms the natural multiclass baseline under a $t$-test at 99\% significance on all datasets with greater than 10 classes.
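As an illustrative special case (our own example, not the learned link of the paper), take the convex potential $F(v) = \log\bigl(1 + \sum_{i=1}^{C-1} e^{v_i}\bigr)$ on $\mathbb{R}^{C-1}$. Its gradient
\[
\bigl(\nabla F(v)\bigr)_i \;=\; \frac{e^{v_i}}{1 + \sum_{j=1}^{C-1} e^{v_j}}, \qquad i = 1, \dots, C-1,
\]
is a monotone map from $\mathbb{R}^{C-1}$ into the relative interior of $\tilde{\Delta}^{C-1}$ and recovers the familiar multinomial logistic (softmax) link; the approach sketched above replaces this fixed choice by learning the monotone gradient map of a convex potential jointly with the model.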