Active learning can significantly reduce the annotation cost of data-driven techniques. However, previous active learning approaches for natural language processing rely mainly on an entropy-based uncertainty criterion and ignore the characteristics of natural language. In this paper, we propose a pre-trained language model based active learning approach for sentence matching. Unlike previous active learning methods, it provides linguistic criteria to measure instances and helps select more informative instances for annotation. Experiments demonstrate that our approach achieves higher accuracy with fewer labeled training instances.
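For context, the entropy-based uncertainty criterion that the abstract contrasts against can be sketched as follows. This is a minimal illustration, not the paper's method: the function names, the binary match probabilities, and the top-k selection policy are all assumptions for the sake of the example.

```python
import math

def entropy(probs):
    # Shannon entropy of a predicted class distribution;
    # higher entropy means the model is more uncertain.
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(pred_probs, k):
    # Rank unlabeled instances by predictive entropy (highest first)
    # and return the indices of the top-k candidates for annotation.
    ranked = sorted(range(len(pred_probs)),
                    key=lambda i: entropy(pred_probs[i]),
                    reverse=True)
    return ranked[:k]

# Hypothetical example: three unlabeled sentence pairs with
# predicted probabilities for the (match, no-match) classes.
preds = [[0.95, 0.05], [0.55, 0.45], [0.70, 0.30]]
print(select_most_uncertain(preds, 1))  # → [1], the near-uniform prediction
```

A purely entropy-based criterion like this looks only at the model's output distribution; the abstract's point is that it ignores linguistic properties of the sentences themselves.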