While neural networks are used for classification tasks across domains, a long-standing open problem in machine learning is determining whether neural networks trained using standard procedures are optimal for classification, i.e., whether such models minimize the probability of misclassification for arbitrary data distributions. In this work, we identify and construct an explicit set of neural network classifiers that achieve optimality. Since effective neural networks in practice are typically both wide and deep, we analyze infinitely wide networks that are also infinitely deep. In particular, using the recent connection between infinitely wide neural networks and Neural Tangent Kernels, we provide explicit activation functions that can be used to construct networks that achieve optimality. Interestingly, these activation functions are simple and easy to implement, yet differ from commonly used activations such as ReLU or sigmoid. More generally, we create a taxonomy of infinitely wide and deep networks and show that these models implement one of three well-known classifiers depending on the activation function used: (1) 1-nearest neighbor (model predictions are given by the label of the nearest training example); (2) majority vote (model predictions are given by the label of the class with greatest representation in the training set); or (3) singular kernel classifiers (a set of classifiers containing those that achieve optimality). Our results highlight the benefit of using deep networks for classification tasks, in contrast to regression tasks, where excessive depth is harmful.
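To make the taxonomy above concrete, the sketch below writes each of the three limiting classifiers as a simple prediction rule on binary-labeled data. This is an illustrative assumption, not the paper's construction: the power-law kernel ||x − z||^(−alpha) is only a stand-in example of a singular kernel, and the function names and parameters are hypothetical.

```python
import numpy as np

# Illustrative sketch (assumed, not taken from the paper): the abstract's taxonomy says an
# infinitely wide and deep network reduces to one of three classifiers depending on the
# activation function. Each case is written here as a simple rule on binary labels in {0, 1}.

def one_nearest_neighbor(X_train, y_train, x):
    """Case 1: predict the label of the closest training example."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]

def majority_vote(y_train):
    """Case 2: predict the most frequent label in the training set, ignoring x."""
    labels, counts = np.unique(y_train, return_counts=True)
    return labels[np.argmax(counts)]

def singular_kernel_classifier(X_train, y_train, x, alpha=1.0):
    """Case 3: weighted vote with a kernel that diverges as x approaches a training point.
    K(x, z) = ||x - z||**(-alpha) is a hypothetical stand-in for a singular kernel."""
    dists = np.linalg.norm(X_train - x, axis=1)
    if np.any(dists == 0):                      # exact hit on a training point
        return y_train[np.argmin(dists)]
    weights = dists ** (-alpha)                 # weight blows up near training points
    score = np.sum(weights * np.where(y_train == 1, 1.0, -1.0))
    return 1 if score >= 0 else 0

# Tiny usage example.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = (X[:, 0] > 0).astype(int)
x_test = np.array([0.3, -0.1])
print(one_nearest_neighbor(X, y, x_test),
      majority_vote(y),
      singular_kernel_classifier(X, y, x_test))
```

The design choice worth noting is that only the third rule both adapts to the query point and aggregates over many training examples, which is consistent with the abstract's claim that the singular-kernel family contains the classifiers achieving optimality.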