We propose a kernelized classification layer for deep networks. Although conventional deep networks introduce an abundance of nonlinearity for representation (feature) learning, they almost universally use a linear classifier on the learned feature vectors. We advocate a nonlinear classification layer by applying the kernel trick to the softmax cross-entropy loss function during training and to the scorer function during testing. However, the choice of kernel remains a challenge. To tackle this, we theoretically show the possibility of optimizing over all positive definite kernels applicable to our problem setting. This theory is then used to devise a new kernelized classification layer that automatically learns the optimal kernel function for a given problem within the deep network itself. We show the usefulness of the proposed nonlinear classification layer on several datasets and tasks.
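To make the core idea concrete, the sketch below illustrates what "kernelizing" the classification layer means in the simplest case: each class logit is computed as a kernel evaluation between the learned feature vector and a per-class weight (prototype) vector, rather than as a linear dot product, and the usual softmax cross-entropy is applied on top. This is a minimal illustration assuming a fixed RBF kernel; the paper's actual contribution is learning the kernel itself, and the function names here (`rbf_kernel`, `kernelized_logits`) are ours, not the authors'.

```python
import numpy as np

def rbf_kernel(x, w, gamma=1.0):
    # K(x, w) = exp(-gamma * ||x - w||^2), a positive definite kernel
    return np.exp(-gamma * np.sum((x - w) ** 2, axis=-1))

def kernelized_logits(x, W, gamma=1.0):
    # One prototype weight vector per class; logit_c = K(x, w_c)
    # replaces the linear logit_c = <x, w_c> of a standard final layer.
    return np.array([rbf_kernel(x, w, gamma) for w in W])

def softmax_cross_entropy(logits, label):
    # Numerically stable softmax followed by negative log-likelihood.
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])

# Toy example: 3 classes, 4-dimensional features.
rng = np.random.default_rng(0)
x = rng.normal(size=4)        # stands in for the network's feature vector
W = rng.normal(size=(3, 4))   # per-class prototypes (learned in practice)
logits = kernelized_logits(x, W)
loss = softmax_cross_entropy(logits, label=1)
```

At test time the same kernelized scorer is used: the predicted class is simply `argmax` over the kernel logits. In the paper's full method the kernel is not fixed a priori but optimized over the space of positive definite kernels during training.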