We introduce a class of fully-connected neural networks whose activation functions, rather than being pointwise, rescale feature vectors by a function depending only on their norm. We call such networks radial neural networks, extending previous work on rotation-equivariant networks, which considers norm-based rescaling activations in less generality. We prove universal approximation theorems for radial neural networks, including in the more difficult cases of bounded widths and unbounded domains. Our proof techniques are novel and distinct from those used in the pointwise case. Additionally, radial neural networks exhibit a rich group of orthogonal change-of-basis symmetries on the vector space of trainable parameters. Factoring out these symmetries leads to a practical lossless model compression algorithm. Optimization of the compressed model by gradient descent is equivalent to projected gradient descent for the full model.
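To make the two central notions concrete, the following is a minimal NumPy sketch (not taken from the paper): a radial rescaling activation, a radial network built from it, and a numerical check that conjugating the hidden layers by orthogonal matrices leaves the network function unchanged. The particular radial function `rho`, the layer dimensions, and the random parameters are hypothetical choices for illustration only.

```python
import numpy as np

def radial_rescale(x, rho, eps=1e-12):
    """Radial activation: rescale the feature vector x by a factor that
    depends only on its Euclidean norm (not applied pointwise)."""
    r = np.linalg.norm(x)
    return (rho(r) / (r + eps)) * x

def radial_network(x, weights, biases, rho):
    """Fully-connected network whose activation at every layer is radial rescaling."""
    for W, b in zip(weights, biases):
        x = radial_rescale(W @ x + b, rho)
    return x

# Hypothetical radial function: a shifted ReLU applied to the norm.
rho = lambda r: max(r - 1.0, 0.0)

rng = np.random.default_rng(0)
dims = [3, 5, 4, 2]
weights = [rng.normal(size=(m, n)) for n, m in zip(dims[:-1], dims[1:])]
biases = [rng.normal(size=m) for m in dims[1:]]
x = rng.normal(size=dims[0])

# Orthogonal change-of-basis symmetry: conjugating hidden layers by orthogonal
# matrices Q_i (keeping the input and output bases fixed) preserves the network
# function, since |Q v| = |v| so radial rescaling commutes with Q.
Qs = [np.linalg.qr(rng.normal(size=(m, m)))[0] for m in dims[1:-1]]
Qs = [np.eye(dims[0])] + Qs + [np.eye(dims[-1])]
weights_t = [Qs[i + 1] @ W @ Qs[i].T for i, W in enumerate(weights)]
biases_t = [Qs[i + 1] @ b for i, b in enumerate(biases)]

out_original = radial_network(x, weights, biases, rho)
out_transformed = radial_network(x, weights_t, biases_t, rho)
print(np.allclose(out_original, out_transformed))  # True: same function, different parameters
```

The check illustrates only the symmetry itself; the paper's compression algorithm, which factors out these symmetries to reduce widths losslessly, is not reproduced here.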