We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU activation function is $x\Phi(x)$, where $\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as in ReLUs ($x\mathbf{1}_{x>0}$). We perform an empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations and find performance improvements across all considered computer vision, natural language processing, and speech tasks.
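For concreteness, the definition above can be written out explicitly using the standard identity that expresses the Gaussian CDF in terms of the error function (a restatement of the formula $x\Phi(x)$, not an additional claim):
$$\mathrm{GELU}(x) = x\,\Phi(x) = x \cdot \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right].$$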