The training of neural networks typically optimizes the weight and bias parameters of linear transformations, while the nonlinear activation functions are pre-specified and fixed. This work develops a systematic approach to constructing matrix activation functions whose entries are generalized from ReLU. The activation is applied via matrix-vector multiplication and requires only scalar multiplications and comparisons. The proposed activation functions depend on parameters that are trained jointly with the weights and bias vectors. Neural networks built on this approach are simple and efficient, and numerical experiments show them to be robust.
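To make the idea concrete, below is a minimal PyTorch sketch of one such activation in the simplest diagonal case, where each entry of the matrix is a two-piece slope selected by comparing the input entry with zero. The class name `TrainableMatrixActivation` and the two-branch parameterization are illustrative assumptions, not the paper's exact construction; they are meant only to show how "scalar multiplications and comparisons" plus trainable parameters can realize an entrywise generalization of ReLU.

```python
import torch
import torch.nn as nn

class TrainableMatrixActivation(nn.Module):
    """Sketch of a diagonal matrix activation sigma(x) = D(x) x.

    Each diagonal entry of D(x) is a slope chosen by a sign comparison
    on the corresponding input entry; the slopes are trainable. With
    alpha = 1 and beta = 0 held fixed, this reduces to ordinary ReLU.
    """

    def __init__(self, dim: int):
        super().__init__()
        # Trainable slopes for the positive and negative branches,
        # initialized so the activation starts out as a standard ReLU.
        self.alpha = nn.Parameter(torch.ones(dim))   # slope where x_i >= 0
        self.beta = nn.Parameter(torch.zeros(dim))   # slope where x_i < 0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # One comparison and one scalar multiplication per entry;
        # the diagonal matrix D(x) is never formed explicitly.
        slope = torch.where(x >= 0, self.alpha, self.beta)
        return slope * x

# Usage: the activation's parameters are optimized along with the
# weights and biases of the linear layers, e.g. by any torch optimizer.
model = nn.Sequential(nn.Linear(16, 32), TrainableMatrixActivation(32))
y = model(torch.randn(8, 16))
```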