Threshold activation functions are highly preferable in neural networks due to their efficiency in hardware implementations. Moreover, their mode of operation is more interpretable and resembles that of biological neurons. However, traditional gradient-based algorithms such as gradient descent cannot be used to train the parameters of neural networks with threshold activations, since the activation function has zero gradient everywhere except at a single non-differentiable point. To this end, we study weight-decay regularized training problems for deep neural networks with threshold activations. We first show that the regularized deep threshold network training problem can be equivalently formulated as a standard convex optimization problem, which parallels the LASSO method, provided that the width of the last hidden layer exceeds a certain threshold. We also derive a simplified convex optimization formulation for the case where the dataset can be shattered at a certain layer of the network. We corroborate our theoretical results with various numerical experiments.
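To make the zero-gradient obstruction concrete, the threshold (unit step) activation can be written as
\[
\sigma(x) = \mathbb{1}\{x \ge 0\} = \begin{cases} 1, & x \ge 0,\\ 0, & x < 0, \end{cases}
\qquad \frac{d\sigma}{dx}(x) = 0 \quad \text{for all } x \neq 0,
\]
so every backpropagated gradient vanishes and gradient descent receives no learning signal. For comparison, the convex program alluded to above parallels the standard LASSO problem; as an illustrative sketch only (the dictionary matrix $A$, labels $y$, and regularization weight $\beta$ are generic placeholders here, not the paper's exact construction),
\[
\min_{z} \; \frac{1}{2} \left\| A z - y \right\|_2^2 + \beta \left\| z \right\|_1,
\]
where the $\ell_1$ penalty mirrors the sparsity-inducing effect of weight decay on the network parameters.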