Multiclass multilabel classification is the task of attributing multiple labels to examples via predictions. Current models formulate a reduction of the multilabel setting into either multiple binary classifications or multiclass classification, allowing for the use of existing loss functions (sigmoid, cross-entropy, logistic, etc.). Multilabel classification reductions do not accommodate the prediction of varying numbers of labels per example, and the underlying losses are distant estimates of the performance metrics. We propose a loss function, sigmoidF1, which is an approximation of the F1 score that (1) is smooth and tractable for stochastic gradient descent, (2) naturally approximates a multilabel metric, and (3) estimates label propensities and label counts. We show that any confusion matrix metric can be formulated with a smooth surrogate. We evaluate the proposed loss function on text and image datasets, and with a variety of metrics, to account for the complexity of multilabel classification evaluation. sigmoidF1 outperforms other loss functions on one text and two image datasets, and across several metrics. These results show the effectiveness of using inference-time metrics as loss functions for non-trivial classification problems like multilabel classification.
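To make the idea concrete, the following is a minimal sketch of a smooth F1 surrogate loss in the spirit of sigmoidF1, assuming a PyTorch setting with raw per-label logits and multi-hot targets, batch-level aggregation of the soft confusion-matrix counts, and a tunable sigmoid parameterized by a slope `beta` and offset `eta`. The function name `sigmoid_f1_loss`, the exact parameterization of the sigmoid, and the small epsilon for numerical stability are our own choices for illustration, not the paper's reference implementation.

```python
import torch

def sigmoid_f1_loss(logits, targets, beta=1.0, eta=0.0, eps=1e-16):
    """Sketch of a smooth F1 ("sigmoidF1"-style) surrogate loss.

    logits:  (batch, n_labels) raw scores, one per candidate label
    targets: (batch, n_labels) multi-hot ground truth in {0, 1}
    beta, eta: slope and offset of the tunable sigmoid (assumed form)
    """
    # A tunable sigmoid replaces hard thresholding, keeping the
    # confusion-matrix entries differentiable w.r.t. the logits.
    s = torch.sigmoid(beta * (logits + eta))

    # Soft confusion-matrix counts, aggregated over the batch.
    tp = (s * targets).sum()
    fp = (s * (1.0 - targets)).sum()
    fn = ((1.0 - s) * targets).sum()

    # Smooth F1; its complement is minimized during training.
    soft_f1 = 2.0 * tp / (2.0 * tp + fn + fp + eps)
    return 1.0 - soft_f1
```

The same soft counts (tp, fp, fn, and analogously tn) can be recombined into other confusion-matrix metrics, which is what makes a smooth surrogate available for any of them.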