We present small-text, an easy-to-use active learning library that offers pool-based active learning for single- and multi-label text classification in Python. It features many pre-implemented state-of-the-art query strategies, including some that leverage the GPU. Standardized interfaces allow a variety of classifiers, query strategies, and stopping criteria to be combined, which facilitates quick mixing and matching and enables rapid development of both active learning experiments and applications. To make various classifiers and query strategies accessible for active learning, small-text integrates several well-known machine learning libraries, namely scikit-learn, PyTorch, and Hugging Face transformers. The latter two integrations are optionally installable extensions, so GPUs can be used but are not required. The library is publicly available under the MIT License at https://github.com/webis-de/small-text, in version 1.1.1 at the time of writing.
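To illustrate what pool-based active learning with interchangeable components looks like, the following is a minimal, self-contained sketch in plain Python. It does not use the small-text API: every name in it (`CentroidClassifier`, `least_confidence`, `random_sampling`, `label_budget`, `active_learning_loop`) is hypothetical, the toy two-dimensional points stand in for text features, and the list `oracle` stands in for a human annotator. The only point is that the classifier, the query strategy, and the stopping criterion are passed in as separate arguments and can be swapped independently.

```python
import math
import random


class CentroidClassifier:
    """Toy stand-in for a text classifier (hypothetical, not small-text API)."""

    def fit(self, xs, ys):
        # One centroid per class, computed from the labeled points.
        self.centroids = {}
        for label in sorted(set(ys)):
            points = [x for x, y in zip(xs, ys) if y == label]
            self.centroids[label] = tuple(sum(dim) / len(points) for dim in zip(*points))
        return self

    def predict_proba(self, x):
        # Softmax over negative squared distances to the class centroids.
        dists = {label: sum((a - b) ** 2 for a, b in zip(x, c))
                 for label, c in self.centroids.items()}
        nearest = min(dists.values())
        scores = {label: math.exp(nearest - d) for label, d in dists.items()}
        total = sum(scores.values())
        return {label: s / total for label, s in scores.items()}

    def predict(self, x):
        probs = self.predict_proba(x)
        return max(probs, key=probs.get)


def least_confidence(clf, xs, unlabeled, n):
    """Query strategy: pick the n pool items with the lowest top-class probability."""
    return sorted(unlabeled, key=lambda i: max(clf.predict_proba(xs[i]).values()))[:n]


_sampler = random.Random(0)


def random_sampling(clf, xs, unlabeled, n):
    """Baseline query strategy: pick n pool items uniformly at random."""
    return _sampler.sample(unlabeled, min(n, len(unlabeled)))


def label_budget(max_labels):
    """Stopping criterion: stop once max_labels items have been labeled."""
    return lambda labeled: len(labeled) >= max_labels


def active_learning_loop(xs, oracle, initial, make_clf, query, should_stop, batch_size=5):
    """Retrain, check the stopping criterion, query a batch, obtain labels, repeat."""
    labeled = list(initial)
    while True:
        clf = make_clf().fit([xs[i] for i in labeled], [oracle[i] for i in labeled])
        labeled_set = set(labeled)
        unlabeled = [i for i in range(len(xs)) if i not in labeled_set]
        if not unlabeled or should_stop(labeled):
            return clf, labeled
        # In a real setting a human annotator would label the queried items here.
        labeled.extend(query(clf, xs, unlabeled, batch_size))


# Toy pool: two Gaussian clusters standing in for two text classes.
rng = random.Random(42)
xs = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
      + [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(100)])
oracle = [0] * 100 + [1] * 100

clf, labeled = active_learning_loop(
    xs, oracle, initial=[0, 1, 100, 101],
    make_clf=CentroidClassifier, query=least_confidence, should_stop=label_budget(24),
)
accuracy = sum(clf.predict(x) == y for x, y in zip(xs, oracle)) / len(xs)
print(f"labeled {len(labeled)} of {len(xs)} items, pool accuracy {accuracy:.2f}")
```

Replacing `query=least_confidence` with `query=random_sampling`, or `label_budget(24)` with another callable that takes the list of labeled indices, changes the experiment without touching the loop. This is the kind of mix and match that standardized interfaces are meant to enable.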