We present small-text, a simple and modular active learning library that offers pool-based active learning for single- and multi-label text classification in Python. It comes with various pre-implemented state-of-the-art query strategies, including some that can leverage the GPU. Clearly defined interfaces allow classifiers, query strategies, and stopping criteria to be combined freely, which makes it quick to mix and match components and enables rapid development of both active learning experiments and applications. To make various classifiers accessible in a consistent way, it integrates several well-known machine learning libraries, namely scikit-learn, PyTorch, and Hugging Face Transformers. The latter two integrations are available as optionally installable extensions, making the availability of a GPU completely optional. The library is available under the MIT License at https://github.com/webis-de/small-text.
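To make the pool-based setting concrete, the following is a minimal, standard-library-only sketch of a loop that combines an interchangeable classifier, query strategy, and stopping criterion. It does not use the small-text API: every name in it (`NaiveBayes`, `least_confidence`, `active_learning_loop`, the toy pool and labels) is a hypothetical stand-in chosen for illustration, and the least-confidence strategy and fixed labeling budget are assumptions, not necessarily what the library ships.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny multinomial Naive Bayes over whitespace tokens (illustrative only)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.priors = Counter(labels)  # unnormalized class priors
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(text.lower().split())
        self.vocab = set().union(*self.counts.values())
        return self

    def predict_proba(self, texts):
        result = []
        for text in texts:
            log_scores = []
            for c in self.classes:
                # Laplace-smoothed token likelihoods; unseen tokens are skipped.
                total = sum(self.counts[c].values()) + len(self.vocab)
                score = math.log(self.priors[c])
                for tok in text.lower().split():
                    if tok in self.vocab:
                        score += math.log((self.counts[c][tok] + 1) / total)
                log_scores.append(score)
            # Normalize the log scores into probabilities.
            top = max(log_scores)
            exps = [math.exp(s - top) for s in log_scores]
            norm = sum(exps)
            result.append([e / norm for e in exps])
        return result


def least_confidence(clf, pool, unlabeled, n):
    """Query strategy: pick the n pool items whose top class probability is lowest."""
    probs = clf.predict_proba([pool[i] for i in unlabeled])
    ranked = sorted(zip(unlabeled, probs), key=lambda pair: max(pair[1]))
    return [i for i, _ in ranked[:n]]


def active_learning_loop(pool, oracle, initial, query, should_stop, batch_size=2):
    """Train, check the stopping criterion, query new labels, repeat."""
    labeled = {i: oracle(i) for i in initial}
    clf = NaiveBayes()
    while True:
        clf.fit([pool[i] for i in labeled], list(labeled.values()))
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled or should_stop(len(labeled)):
            return clf, labeled
        for i in query(clf, pool, unlabeled, batch_size):
            labeled[i] = oracle(i)  # the oracle stands in for a human annotator


# Toy pool with hidden ground-truth labels (1 = positive, 0 = negative).
pool = [
    "great movie loved it",
    "terrible film hated it",
    "loved the acting great fun",
    "hated the plot terrible pacing",
    "great fun",
    "terrible boring",
]
true_labels = [1, 0, 1, 0, 1, 0]

clf, labeled = active_learning_loop(
    pool,
    oracle=lambda i: true_labels[i],
    initial=[0, 1],
    query=least_confidence,
    should_stop=lambda n_labeled: n_labeled >= 4,  # fixed labeling budget
)
print(sorted(labeled))
```

Because the loop only relies on `fit`, `predict_proba`, a `query` callable, and a `should_stop` callable, any of the three components can be swapped without touching the others, which is the kind of mix and match that clearly defined interfaces are meant to enable.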