Supervised machine learning has become the cornerstone of today's data-driven society, increasing the need for labeled data. However, the process of acquiring labels is often expensive and tedious. One possible remedy is to use active learning (AL), a special family of machine learning algorithms designed to reduce labeling costs. Although AL has been successful in practice, a number of practical challenges hinder its effectiveness and are often overlooked in existing AL annotation tools. To address these challenges, we developed ALANNO, an open-source annotation system for NLP tasks equipped with features that make AL effective in real-world annotation projects. ALANNO facilitates annotation management in a multi-annotator setup and supports a variety of AL methods and underlying models, which are easily configurable and extensible.