We study the influence of different activation functions in the output layer of deep neural network models for soft- and hard-label prediction in the learning-with-disagreement task. In this task, the goal is to quantify disagreement by predicting soft labels. To predict the soft labels, we use BERT-based preprocessors and encoders and vary the activation function in the output layer while keeping all other parameters constant. The soft labels are then used for hard-label prediction. The activation functions considered are the sigmoid, a step function added to the model post-training, and a sinusoidal activation function, which is introduced for the first time in this paper.
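The three output activations described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the exact parameterization of the sinusoidal activation and the step threshold are not given in the abstract, so the forms below (a sine rescaled into [0, 1], and a 0.5 threshold) are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    # Standard logistic activation: maps logits to (0, 1) soft labels.
    return 1.0 / (1.0 + np.exp(-z))

def sinusoidal(z):
    # Illustrative sinusoidal activation rescaled into [0, 1];
    # the paper's exact form is not specified here (assumption).
    return 0.5 * (np.sin(z) + 1.0)

def step(p, threshold=0.5):
    # Step function applied post-training: converts soft labels
    # into hard labels by thresholding (threshold is an assumption).
    return (p >= threshold).astype(int)

# Hypothetical logits from a BERT-based encoder's output layer.
logits = np.array([-2.0, 0.0, 1.5])
soft = sigmoid(logits)   # soft labels quantifying disagreement
hard = step(soft)        # hard labels derived from the soft labels
```

In this setup only the activation function changes between model variants, matching the abstract's controlled comparison; the step function is not trained through, since it is attached after training.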