The top-k classification accuracy is one of the core metrics in machine learning. Here, k is conventionally a positive integer, such as 1 or 5, leading to top-1 or top-5 training objectives. In this work, we relax this assumption and optimize the model for multiple k simultaneously instead of using a single k. Leveraging recent advances in differentiable sorting and ranking, we propose a differentiable top-k cross-entropy classification loss. This allows training the network to consider not only the top-1 prediction but also, e.g., the top-2 and top-5 predictions. We evaluate the proposed loss function for fine-tuning on state-of-the-art architectures, as well as for training from scratch. We find that relaxing k not only produces better top-5 accuracies but also leads to top-1 accuracy improvements. When fine-tuning publicly available ImageNet models, we achieve a new state-of-the-art for these models.
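To make the idea of a loss that targets multiple k values concrete, below is a minimal PyTorch sketch. It is not the paper's construction (which relies on differentiable sorting and ranking); instead it uses a simple successive-softmax relaxation of top-k membership. The helper names `soft_topk_membership` and `multi_topk_cross_entropy`, the temperature `tau`, and the per-k `weights` are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def soft_topk_membership(logits, k, tau=1.0):
    """Differentiable relaxation of 'probability that each class is among
    the top-k': apply a softmax k times, softly masking out mass that was
    already assigned in earlier rounds. This is a generic relaxation for
    illustration, not the paper's differentiable-sorting construction."""
    membership = torch.zeros_like(logits)
    masked_logits = logits.clone()
    for _ in range(k):
        p = F.softmax(masked_logits / tau, dim=-1)
        membership = membership + p
        # Down-weight classes already (softly) selected so that later
        # rounds concentrate on the next-largest logits.
        masked_logits = masked_logits + torch.log1p(-p.clamp(max=1 - 1e-6))
    return membership.clamp(max=1.0)


def multi_topk_cross_entropy(logits, targets, ks=(1, 5), weights=(0.5, 0.5), tau=1.0):
    """Cross-entropy-style loss that rewards the true class for appearing
    in the top-k for several k simultaneously (hypothetical weighting)."""
    loss = 0.0
    for k, w in zip(ks, weights):
        member = soft_topk_membership(logits, k, tau)
        p_true = member.gather(1, targets.unsqueeze(1)).squeeze(1)
        loss = loss + w * (-torch.log(p_true + 1e-12)).mean()
    return loss


# Usage: logits of shape (batch, num_classes), integer class targets.
logits = torch.randn(8, 100, requires_grad=True)
targets = torch.randint(0, 100, (8,))
loss = multi_topk_cross_entropy(logits, targets)
loss.backward()
```

The weights over the different k values control how much the objective cares about the top-1 prediction versus the broader top-5 set; the sketch simply averages them, whereas any particular weighting scheme would need to be tuned.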