We investigate the sample complexity of bounded two-layer neural networks with different activation functions. In particular, we consider the class $$ \mathcal{H} = \left\{\textbf{x}\mapsto \langle \textbf{v}, \sigma \circ W\textbf{x} + \textbf{b} \rangle : \textbf{b}\in\mathbb{R}^{\mathcal{T}}, W \in \mathbb{R}^{\mathcal{T}\times d}, \textbf{v} \in \mathbb{R}^{\mathcal{T}}\right\} $$ where the spectral norms of $W$ and $\textbf{v}$ are bounded by $O(1)$, the Frobenius distance of $W$ from its initialization is bounded by $R > 0$, and $\sigma$ is a Lipschitz activation function. We prove that if $\sigma$ is element-wise, then the sample complexity of $\mathcal{H}$ depends only logarithmically on the width, and that this bound is tight up to logarithmic factors. We further show that the element-wise property of $\sigma$ is essential for a logarithmic dependence on the width: there exist non-element-wise activation functions whose sample complexity is linear in the width, for widths that can be up to exponential in the input dimension. For the upper bound, we use the recent Approximate Description Length (ADL) approach to norm-based bounds of arXiv:1910.05697. We further develop new techniques and tools for this approach that we hope will inspire future work.
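To make the class $\mathcal{H}$ concrete, the following is a minimal NumPy sketch of one member of the class and of the norm constraints that define it. The constant bound `c`, the helper names (`in_class`, `h`), and all numerical values are illustrative assumptions and not part of the paper; the constraints mirror the abstract: spectral norms of $W$ and $\textbf{v}$ bounded by $O(1)$, and $\|W - W_0\|_F \le R$ where $W_0$ is the initialization.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, R = 10, 512, 1.0                 # input dimension, width (the calligraphic T), Frobenius radius

W0 = rng.standard_normal((T, d)) / np.sqrt(d)   # initialization W_0
W = W0 + 0.01 * rng.standard_normal((T, d))     # weights after training, close to W_0
b = 0.01 * rng.standard_normal(T)               # bias vector
v = rng.standard_normal(T) / np.sqrt(T)         # output weights

def sigma(z):
    # element-wise Lipschitz activation, e.g. ReLU
    return np.maximum(z, 0.0)

def in_class(W, v, W0, R, c=10.0):
    """Check the norm constraints defining H, with c standing in for the O(1) bound."""
    spec_W = np.linalg.norm(W, ord=2)            # spectral norm of W
    norm_v = np.linalg.norm(v)                   # Euclidean norm of v
    frob_dist = np.linalg.norm(W - W0, ord="fro")
    return spec_W <= c and norm_v <= c and frob_dist <= R

def h(x):
    """The predictor x -> <v, sigma(W x) + b>."""
    return np.dot(v, sigma(W @ x) + b)

x = rng.standard_normal(d)
print(in_class(W, v, W0, R), h(x))
```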