While deep learning is a powerful tool for natural language processing (NLP) problems, successful solutions to these problems rely heavily on large amounts of annotated samples. However, manually annotating data is expensive and time-consuming. Active Learning (AL) strategies reduce the need for huge volumes of labeled data by iteratively selecting a small number of examples for manual annotation based on their estimated utility in training the given model. In this paper, we argue that since AL strategies choose examples independently, they may select similar examples, not all of which contribute significantly to the learning process. Our proposed approach, Active$\mathbf{^2}$ Learning (A$\mathbf{^2}$L), actively adapts to the deep learning model being trained to further eliminate such redundant examples chosen by an AL strategy. We show that A$\mathbf{^2}$L is widely applicable by using it in conjunction with several different AL strategies and NLP tasks. We empirically demonstrate that the proposed approach further reduces the data requirements of state-of-the-art AL strategies by an absolute percentage reduction of $\approx\mathbf{3-25\%}$ on multiple NLP tasks while achieving the same performance with no additional computation overhead.
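As a concrete illustration of the two-stage idea the abstract describes, the following minimal Python sketch runs a generic uncertainty-based AL selection step and then prunes near-duplicate candidates before they are sent for annotation. All names here (`uncertainty_scores`, `redundancy_filter`, the toy data, and the cosine-similarity criterion) are hypothetical stand-ins for illustration; this is not the paper's A$\mathbf{^2}$L method, which adapts its redundancy elimination to the model being trained.

```python
# Illustrative sketch only: generic AL selection + redundancy pruning.
# The specific scoring and filtering choices below are assumptions,
# not the A^2L algorithm from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

def uncertainty_scores(model, X):
    # Least-confidence uncertainty: 1 - max predicted class probability.
    probs = model.predict_proba(X)
    return 1.0 - probs.max(axis=1)

def redundancy_filter(X_candidates, budget, sim_threshold=0.95):
    # Greedily keep candidates whose cosine similarity to every
    # already-kept candidate stays below the threshold.
    kept = []
    for i in range(len(X_candidates)):
        if len(kept) == budget:
            break
        if all(cosine_similarity(X_candidates[i:i + 1],
                                 X_candidates[j:j + 1])[0, 0] < sim_threshold
               for j in kept):
            kept.append(i)
    return kept

# Toy setup: 200 unlabeled points, 20 labeled seeds, 2 classes.
rng = np.random.default_rng(0)
X_pool = rng.normal(size=(200, 16))
X_seed = rng.normal(size=(20, 16))
y_seed = rng.integers(0, 2, size=20)

model = LogisticRegression().fit(X_seed, y_seed)

# Stage 1 (AL strategy): rank the unlabeled pool by uncertainty.
scores = uncertainty_scores(model, X_pool)
top_k = np.argsort(-scores)[:30]

# Stage 2 (redundancy elimination): prune similar examples so the
# annotation budget is spent on a diverse batch.
batch = [top_k[i] for i in redundancy_filter(X_pool[top_k], budget=10)]
print("indices selected for annotation:", batch)
```

Because independently scored examples often cluster in the same uncertain region of the input space, the second stage is what keeps the annotation budget from being wasted on near-duplicates, which is the redundancy the abstract targets.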