We introduce small-text, an easy-to-use active learning library, which offers pool-based active learning for single- and multi-label text classification in Python. It features numerous pre-implemented state-of-the-art query strategies, including some that leverage the GPU. Standardized interfaces allow the combination of a variety of classifiers, query strategies, and stopping criteria, facilitating a quick mix and match, and enabling a rapid and convenient development of both active learning experiments and applications. With the objective of making various classifiers and query strategies accessible for active learning, small-text integrates several well-known machine learning libraries, namely scikit-learn, PyTorch, and Hugging Face transformers. The latter integrations are optionally installable extensions, so GPUs can be used but are not required. Using this new library, we investigate the performance of the recently published SetFit training paradigm, which we compare to vanilla transformer fine-tuning, finding that it matches the latter in classification accuracy while outperforming it in area under the curve. The library is available under the MIT License at https://github.com/webis-de/small-text, in version 1.3.0 at the time of writing.
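To illustrate the "mix and match" workflow described above, the following is a minimal sketch of a pool-based active learning loop using the scikit-learn integration of small-text 1.x (no GPU required). It assumes the top-level classes `PoolBasedActiveLearner`, `SklearnClassifierFactory`, `SklearnDataset`, and `LeastConfidence` as exposed in version 1.x; exact names and signatures may differ across releases, and the toy corpus and label source are placeholders for illustration only.

```python
# Minimal sketch of pool-based active learning with small-text 1.x
# (scikit-learn integration). Names/signatures assumed from the 1.x API.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

from small_text import (
    PoolBasedActiveLearner,
    SklearnClassifierFactory,
    SklearnDataset,
    LeastConfidence,
)

# Toy binary corpus (placeholder data for illustration).
texts = [f'great plot and acting {i}' for i in range(50)] \
      + [f'boring and far too long {i}' for i in range(50)]
labels = np.array([1] * 50 + [0] * 50)
num_classes = 2

# Vectorize the texts and wrap them in a small-text dataset.
vectorizer = TfidfVectorizer()
x = vectorizer.fit_transform(texts)
dataset = SklearnDataset(x, labels)

# Combine a classifier factory, a query strategy, and the pool:
# this is the interchangeable triple the standardized interfaces enable.
clf_factory = SklearnClassifierFactory(LogisticRegression(), num_classes)
query_strategy = LeastConfidence()
active_learner = PoolBasedActiveLearner(clf_factory, query_strategy, dataset)

# Provide an initial labeled set, then iterate: query, label, update.
indices_initial = np.random.choice(len(labels), size=10, replace=False)
active_learner.initialize_data(indices_initial, labels[indices_initial])

for _ in range(3):
    indices_queried = active_learner.query(num_samples=10)
    y_new = labels[indices_queried]  # in practice: labels come from a human oracle
    active_learner.update(y_new)
```

Swapping in a GPU-based classifier (via the PyTorch or transformers extensions) or a different query strategy only changes the factory and strategy objects; the surrounding loop stays the same.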