Maximizing the separation between classes constitutes a well-known inductive bias in machine learning and a pillar of many traditional algorithms. By default, deep networks are not equipped with this inductive bias, and many alternative solutions have therefore been proposed through differentiable optimization. Current approaches tend to optimize classification and separation jointly: aligning inputs with class vectors and separating class vectors angularly. This paper proposes a simple alternative: encoding maximum separation as an inductive bias in the network by adding one fixed matrix multiplication before computing the softmax activations. The main observation behind our approach is that separation does not require optimization; it can be solved in closed form prior to training and plugged into a network. We outline a recursive approach to obtain the matrix of maximally separated vectors for any number of classes, which can be added with negligible engineering effort and computational overhead. Despite its simple nature, this one matrix multiplication has real impact. We show that our proposal directly boosts classification, long-tailed recognition, out-of-distribution detection, and open-set recognition, from CIFAR to ImageNet. We find empirically that maximum separation works best as a fixed bias; making the matrix learnable does not improve performance. The closed-form implementation and code to reproduce the experiments are available on GitHub.
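To make the construction concrete, here is a minimal NumPy sketch of one standard closed-form recursion consistent with what the abstract describes: for C classes it returns a (C-1) x C matrix whose columns are unit vectors with maximal pairwise angular separation (a regular simplex, pairwise cosine -1/(C-1)). The function name `simplex_prototypes` and the exact formulation are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def simplex_prototypes(num_classes: int) -> np.ndarray:
    """Columns of the returned (num_classes - 1, num_classes) matrix are
    unit vectors with maximal pairwise separation (a regular simplex)."""
    if num_classes == 2:
        # Base case: two classes separate maximally on a line.
        return np.array([[1.0, -1.0]])
    k = num_classes - 1
    lower = simplex_prototypes(num_classes - 1)          # shape (k-1, k)
    top = np.concatenate(([1.0], np.full(k, -1.0 / k)))  # first row
    rest = np.hstack([np.zeros((k - 1, 1)),
                      np.sqrt(1.0 - 1.0 / k**2) * lower])
    return np.vstack([top, rest])                        # shape (k, k+1)

# The fixed multiplication before softmax: the backbone emits a
# (C-1)-dimensional embedding z; logits are P^T z, then softmax as usual.
P = simplex_prototypes(10)   # 10-class example: P has shape (9, 10)
z = np.random.randn(9)       # stand-in for a network embedding
logits = P.T @ z             # shape (10,); pairwise prototype cosines are -1/9
```

Because the matrix is fixed, it costs one extra matrix multiplication and can be stored as a non-trainable buffer in any framework, which matches the claim of negligible engineering effort and computational overhead.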