Despite achieving state-of-the-art results in nearly all Natural Language Processing applications, fine-tuning Transformer-based language models still requires a significant amount of labeled data. A well-known technique for reducing the human effort of acquiring a labeled dataset is \textit{Active Learning} (AL): an iterative process in which only a minimal number of samples are labeled. AL strategies require access to a quantified confidence measure of the model's predictions; a common choice is the softmax activation function of the final layer. Since the softmax function provides misleading probabilities, this paper compares eight alternatives on seven datasets. Our almost paradoxical finding is that most of the methods are too good at identifying the truly most uncertain samples (outliers), and that exclusively labeling outliers therefore results in worse performance. As a heuristic, we propose to systematically ignore such samples, which yields improvements for various methods compared to the softmax function.
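The selection loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes least-confidence uncertainty sampling over softmax outputs, and the function name, parameters, and the fixed skip count `n_skip` are illustrative choices for the "systematically ignore samples" heuristic.

```python
import numpy as np

def select_batch(probs, batch_size, n_skip):
    """Pick samples to label from an unlabeled pool.

    probs: (n_samples, n_classes) array of softmax outputs.
    batch_size: number of samples to send to the annotator.
    n_skip: heuristic - skip the n_skip most uncertain samples,
            since the top of the uncertainty ranking tends to be
            dominated by outliers.
    """
    # Least-confidence uncertainty: 1 minus the top predicted probability.
    uncertainty = 1.0 - probs.max(axis=1)
    # Rank pool indices from most to least uncertain.
    ranked = np.argsort(-uncertainty)
    # Ignore the presumed outliers, then take the next batch.
    return ranked[n_skip : n_skip + batch_size]
```

In a full AL loop, the selected indices would be labeled, added to the training set, and the model retrained before the next round of selection.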