One of the reasons why neural networks can replicate complicated tasks or functions is their universal approximation property. Although the past few decades have seen tremendous advances in the theory of neural networks, a single constructive framework for neural network universality remains unavailable. This paper is the first effort to provide a unified and constructive framework for the universality of a large class of activation functions, including most of the existing ones. At the heart of the framework is the concept of neural network approximate identity (nAI). The main result is: {\em any nAI activation function is universal}. It turns out that most of the existing activation functions are nAI, and hence universal in the space of continuous functions on compacta. The framework offers {\bf several advantages} over its contemporary counterparts. First, it is constructive, using only elementary means from functional analysis, probability theory, and numerical analysis. Second, it is the first unified attempt that is valid for most of the existing activation functions. Third, as a by-product, the framework provides the first universality proof for some existing activation functions, including Mish, SiLU, ELU, and GELU. Fourth, it provides new proofs for most of the other activation functions. Fifth, it discovers new activation functions with a guaranteed universality property. Sixth, for a given activation function and error tolerance, the framework provides precisely the architecture of the corresponding one-hidden-layer neural network with a predetermined number of neurons, together with the values of the weights and biases. Seventh, the framework allows us to abstractly present the first universal approximation with a favorable non-asymptotic rate.
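To make the sixth point concrete in spirit, the following is a minimal illustrative sketch, not the paper's nAI construction: it uses the classical piecewise-linear argument to write down, in closed form, the weights and biases of a one-hidden-layer ReLU network with a predetermined number of neurons that approximates a given continuous function on a compact interval. All function and variable names here are ours, introduced only for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def build_one_layer_net(f, a, b, n):
    """Return a one-hidden-layer ReLU network with n hidden neurons whose
    weights/biases are set in closed form so that the network reproduces the
    piecewise-linear interpolant of f at n + 1 equally spaced knots in [a, b].
    (Illustrative sketch only; the paper's nAI framework is more general.)"""
    xs = np.linspace(a, b, n + 1)          # knots x_0, ..., x_n
    fs = f(xs)                             # target values at the knots
    slopes = np.diff(fs) / np.diff(xs)     # slope on each subinterval
    # Outer-layer weights: the first slope, then the slope change at each
    # interior knot, so the ReLU kinks reproduce the interpolant exactly.
    c = np.empty(n)
    c[0] = slopes[0]
    c[1:] = np.diff(slopes)
    biases = -xs[:-1]                      # neuron i activates for x > x_i
    offset = fs[0]                         # network value at the left endpoint

    def net(x):
        x = np.asarray(x, dtype=float)
        # Hidden layer: relu(x - x_i); output layer: weighted sum plus offset.
        return offset + relu(np.add.outer(x, biases)) @ c

    return net

# Usage: approximate sin on [0, pi] with 64 hidden neurons.
net = build_one_layer_net(np.sin, 0.0, np.pi, n=64)
grid = np.linspace(0.0, np.pi, 1000)
err = np.max(np.abs(net(grid) - np.sin(grid)))
```

For a twice-differentiable target, standard interpolation theory bounds the error by $h^2 \max|f''|/8$ with knot spacing $h$, so the maximum error above is on the order of $10^{-4}$; this is the kind of non-asymptotic, architecture-explicit guarantee the abstract alludes to, here in its simplest ReLU instance.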