Multi-class classification problems often have many semantically similar classes. For example, 90 of ImageNet's 1000 classes are for different breeds of dog. We should expect that these semantically similar classes will have similar parameter vectors, but the standard cross entropy loss does not enforce this constraint. We introduce the tree loss as a drop-in replacement for the cross entropy loss. The tree loss re-parameterizes the parameter matrix in order to guarantee that semantically similar classes will have similar parameter vectors. Using simple properties of stochastic gradient descent, we show that the tree loss's generalization error is asymptotically better than the cross entropy loss's. We then validate these theoretical results on synthetic data, image data (CIFAR100, ImageNet), and text data (Twitter).
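The re-parameterization at the heart of the tree loss can be sketched in a few lines. The following is a minimal, illustrative sketch only, assuming each class's parameter vector is built as the sum of learnable node vectors along its root-to-leaf path in a semantic class hierarchy; the tree structure, variable names, and this particular parameterization are assumptions for exposition, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TreeLoss(nn.Module):
    """Cross entropy over a tree-structured re-parameterization of the class matrix."""
    def __init__(self, dim, paths):
        # dim:   feature dimension of the final layer
        # paths: one list per class, giving the node indices on that class's
        #        root-to-leaf path in the semantic tree (illustrative format)
        super().__init__()
        num_nodes = max(max(p) for p in paths) + 1
        # one learnable vector per tree node instead of one per class
        self.node_vectors = nn.Parameter(torch.randn(num_nodes, dim) * 0.01)
        # binary membership matrix: M[c, n] = 1 if node n lies on class c's path
        M = torch.zeros(len(paths), num_nodes)
        for c, path in enumerate(paths):
            M[c, path] = 1.0
        self.register_buffer("membership", M)

    def class_matrix(self):
        # W[c] = sum of node vectors on class c's path, so classes that share
        # ancestors automatically share parameters and stay close to each other
        return self.membership @ self.node_vectors

    def forward(self, features, labels):
        logits = features @ self.class_matrix().t()
        return F.cross_entropy(logits, labels)

# toy usage: 4 classes under 2 superclasses, all sharing a root node (node 0)
paths = [[0, 1, 3], [0, 1, 4], [0, 2, 5], [0, 2, 6]]
loss_fn = TreeLoss(dim=16, paths=paths)
features = torch.randn(8, 16)
labels = torch.randint(0, 4, (8,))
loss = loss_fn(features, labels)
loss.backward()
```

Because sibling classes differ only in their leaf-level vectors, updating one class's parameters also moves its semantically similar siblings, which is the intuition behind the improved generalization claimed above.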