In this paper, we first extend the result of FL93 and prove universal consistency for a classification rule based on wide and deep ReLU neural networks trained on the logistic loss. Unlike the approach in FL93 that decomposes the estimation and empirical error, we directly analyze the classification risk based on the observation that a realization of a neural network that is wide enough is capable of interpolating an arbitrary number of points. Secondly, we give sufficient conditions for a class of probability measures under which classifiers based on neural networks achieve minimax optimal rates of convergence. Our result is motivated from the practitioner's observation that neural networks are often trained to achieve 0 training error, which is the case for our proposed neural network classifiers. Our proofs hinge on recent developments in empirical risk minimization and on approximation rates of deep ReLU neural networks for various function classes of interest. Applications to classical function spaces of smoothness illustrate the usefulness of our result.
翻译:暂无翻译