Many currently available universal approximation theorems affirm that deep feedforward networks defined using any suitable activation function can approximate any integrable function locally in $L^1$-norm. Though different approximation rates are available for deep neural networks defined using other classes of activation functions, there is little explanation for the empirically confirmed advantage that ReLU networks exhibit over their classical (e.g. sigmoidal) counterparts. Our main result demonstrates that deep networks with piecewise-linear activation functions (e.g. ReLU or PReLU) are fundamentally more expressive than deep feedforward networks with analytic activation functions (e.g. sigmoid, Swish, GeLU, or Softplus). More specifically, we construct a strict refinement of the topology on the space $L^1_{\operatorname{loc}}(\mathbb{R}^d,\mathbb{R}^D)$ of locally Lebesgue-integrable functions, in which the set of deep ReLU networks with (bilinear) pooling $\operatorname{NN}^{\operatorname{ReLU} + \operatorname{Pool}}$ is dense (i.e. universal) but the set of deep feedforward networks defined using any combination of analytic activation functions with (or without) pooling layers $\operatorname{NN}^{\omega+\operatorname{Pool}}$ is not dense (i.e. not universal). We further explain this ``separation phenomenon'' between the networks in $\operatorname{NN}^{\operatorname{ReLU}+\operatorname{Pool}}$ and those in $\operatorname{NN}^{\omega+\operatorname{Pool}}$ \textit{quantitatively}, by showing that the networks in $\operatorname{NN}^{\operatorname{ReLU}}$ are capable of approximating any compactly supported Lipschitz function while \textit{simultaneously} approximating its essential support, whereas the networks in $\operatorname{NN}^{\omega+\operatorname{Pool}}$ cannot.
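As a heuristic, one-dimensional illustration of this separation (a sketch added here for intuition only; the hat function below is a toy example and not part of the formal statement), consider
\[
 f(x) \;=\; \max\{0,\,1-|x|\} \;=\; \operatorname{ReLU}\bigl(1-\operatorname{ReLU}(x)-\operatorname{ReLU}(-x)\bigr),
\]
which is exactly representable by a two-layer ReLU network, so its essential support $[-1,1]$ is reproduced exactly. By contrast, any network built from real-analytic activation functions (and polynomial pooling) is itself a real-analytic function; if it is not identically zero, its zero set has Lebesgue measure zero, so it cannot vanish outside a compact set and therefore cannot match the essential support of any non-trivial compactly supported target.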