We formalize and study the natural approach of designing convex surrogate loss functions via embeddings, for problems such as classification, ranking, or structured prediction. In this approach, one embeds each of the finitely many predictions (e.g. rankings) as a point in $R^d$, assigns the original loss values to these points, and "convexifies" the loss in some way to obtain a surrogate. We establish a strong connection between this approach and polyhedral (piecewise-linear convex) surrogate losses: every discrete loss is embedded by some polyhedral loss, and every polyhedral loss embeds some discrete loss. Moreover, an embedding gives rise to a consistent link function as well as linear surrogate regret bounds. Our results are constructive, as we illustrate with several examples. In particular, our framework gives succinct proofs of consistency or inconsistency for various polyhedral surrogates in the literature, and for inconsistent surrogates, it further reveals the discrete losses for which these surrogates are consistent. We go on to show additional structure of embeddings, such as the equivalence of embedding and matching Bayes risks, and the equivalence of various notions of non-redudancy. Using these results, we establish that indirect elicitation, a necessary condition for consistency, is also sufficient when working with polyhedral surrogates.
翻译:我们正式确定并研究通过嵌入来设计 convex 代谢损失功能的自然方法, 包括分类、 排名或结构化预测等问题。 在这种方法中, 将有限的许多预测( 如排名)中的每一项( 如排名) 都嵌入为美元元值的点, 将原始损失值指定为美元元元值, 并将损失以某种方式“ 解密” 来获取代孕。 我们建立了这一方法与多元( 单线性) 代谢损失之间的紧密联系: 每一种离散损失都由某种多重损失所嵌入, 每一种多元性损失都嵌入某种离散损失。 此外, 嵌入可以产生一种一致的联系功能, 以及线性代谢的遗憾界限。 我们用几个例子来说明我们的结果是建设性的。 特别是, 我们的框架可以简明地证明各种多元性代谢在文献中的一致性或不一致性代孕, 它进一步揭示了这些代孕国所必须坚持的离散损失。 我们还在使用多种代谢式的代谢性结构时, 将各种代谢的代谢和代谢性结果进行充分的相互对比。