Developed as a solution to a practical need, active learning (AL) methods aim to reduce label complexity and annotation costs in supervised learning. While recent work has demonstrated the benefit of using AL in combination with large pre-trained language models (PLMs), it has often overlooked the practical challenges that hinder the feasibility of AL in realistic settings. We address these challenges by leveraging representation smoothness analysis to improve the effectiveness of AL. We develop an early stopping technique that does not require a validation set (often unavailable in realistic AL settings) and observe significant improvements across multiple datasets and AL methods. Additionally, we find that task adaptation improves AL, whereas standard short fine-tuning in AL yields no improvement over random sampling. Our work establishes the usefulness of representation smoothness analysis in AL and presents an AL stopping criterion that reduces label complexity.