Piecewise-Linear Activations or Analytic Activation Functions: Which Produce More Expressive Neural Networks?

Many currently available universal approximation theorems affirm that deep feedforward networks defined using any suitable activation function can approximate any integrable function locally in $L^1$-norm. Though different approximation rates are available for deep neural networks defined using different classes of activation functions, there is little explanation for the empirically confirmed advantage that ReLU networks exhibit over their classical (e.g. sigmoidal) counterparts. Our main result demonstrates that deep networks with piecewise linear activation functions (e.g. ReLU or PReLU) are fundamentally more expressive than deep feedforward networks with analytic activation functions (e.g. sigmoid, Swish, GeLU, or Softplus). More specifically, we construct a strict refinement of the topology on the space $L^1_{\operatorname{loc}}(\mathbb{R}^d,\mathbb{R}^D)$ of locally Lebesgue-integrable functions, in which the set of deep ReLU networks with (bilinear) pooling $\operatorname{NN}^{\operatorname{ReLU} + \operatorname{Pool}}$ is dense (i.e. universal) but the set of deep feedforward networks defined using any combination of analytic activation functions with (or without) pooling layers $\operatorname{NN}^{\omega+\operatorname{Pool}}$ is not dense (i.e. not universal). Our main result is further explained by \textit{quantitatively} demonstrating this "separation phenomenon" between the networks in $\operatorname{NN}^{\operatorname{ReLU}+\operatorname{Pool}}$ and those in $\operatorname{NN}^{\omega+\operatorname{Pool}}$: we show that the networks in $\operatorname{NN}^{\operatorname{ReLU}}$ are capable of approximating any compactly supported Lipschitz function while \textit{simultaneously} approximating its essential support, whereas the networks in $\operatorname{NN}^{\omega+\operatorname{Pool}}$ cannot.
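The support-approximation claim can be illustrated with a toy one-dimensional example. The sketch below is not the paper's construction; the functions `relu_hat` and `sigmoid_bump` and the `sharpness` parameter are hypothetical names chosen for illustration. It relies on a standard fact: a real-analytic function that vanishes on an open interval is identically zero, so a network built from analytic activations cannot be exactly zero outside a bounded set unless it is the zero function, while a small ReLU network can.

```python
import math


def relu(x: float) -> float:
    return max(0.0, x)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def relu_hat(x: float) -> float:
    """One-hidden-layer ReLU network: a 'hat' that is exactly zero outside [0, 2]."""
    return relu(x) - 2.0 * relu(x - 1.0) + relu(x - 2.0)


def sigmoid_bump(x: float, sharpness: float = 4.0) -> float:
    """One-hidden-layer sigmoid network: a bump over roughly [0, 2].

    It decays quickly away from the bump but, being analytic and not
    identically zero, it never vanishes on an interval.
    """
    return sigmoid(sharpness * x) - sigmoid(sharpness * (x - 2.0))


# Sample points that lie outside the intended support [0, 2].
outside = [-3.0, -1.0, 2.5, 5.0]
relu_values = [relu_hat(x) for x in outside]
sigmoid_values = [sigmoid_bump(x) for x in outside]

print("ReLU hat outside [0, 2]:    ", relu_values)
print("Sigmoid bump outside [0, 2]:", sigmoid_values)
```

The ReLU network reproduces the support of the hat function exactly, whereas the sigmoid network only makes the values outside $[0, 2]$ small. Increasing `sharpness` shrinks those values but cannot make them vanish.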


Related content

Deep feedforward networks

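A deep feedforward network, also called a feedforward neural network or a multilayer perceptron (MLP), is the quintessential deep learning model. The goal of a feedforward network is to approximate some function f^∗. For example, for a classifier, y = f^∗(x) maps an input x to a category y. A feedforward network defines a mapping y = f(x; θ) and learns the value of the parameters θ that gives the best function approximation.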
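The mapping y = f(x; θ) can be sketched as a plain forward pass. This is a minimal illustration under stated assumptions: the helper names (`affine`, `feedforward`, `classify`), the choice of ReLU for the hidden layers, and the hand-picked parameters `theta` are all hypothetical, and the learning step that would fit θ to data is deliberately left out.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weights W, bias b)


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def affine(W: Matrix, b: Vector, x: Vector) -> Vector:
    """Compute W x + b, one output entry per row of W."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def feedforward(x: Vector, theta: List[Layer]) -> Vector:
    """f(x; theta): ReLU hidden layers followed by a linear output layer."""
    h = x
    for W, b in theta[:-1]:
        h = relu(affine(W, b, h))
    W_out, b_out = theta[-1]
    return affine(W_out, b_out, h)


def classify(x: Vector, theta: List[Layer]) -> int:
    """Map an input x to a category y by taking the highest output score."""
    scores = feedforward(x, theta)
    return max(range(len(scores)), key=lambda i: scores[i])


# Hand-picked (not learned) parameters: one hidden layer, two output classes.
theta = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),  # hidden layer
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),    # output layer
]

print(classify([2.0, 1.0], theta))  # class 0: first coordinate is larger
print(classify([0.0, 3.0], theta))  # class 1: second coordinate is larger
```

In practice θ would be fitted by minimizing a loss over training data rather than chosen by hand; the sketch only shows how a fixed θ determines the function f.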
