Crowdsourcing Learning as Domain Adaptation: A Case Study on Named Entity Recognition

Crowdsourcing is regarded as a promising route to effective supervised learning: it aims to build large-scale annotated training data with crowd workers. Previous studies focus on reducing the influence of noise in crowdsourced annotations on supervised models. We take a different view in this work, regarding all crowdsourced annotations as gold-standard with respect to the individual annotators. Under this view, crowdsourcing is highly similar to domain adaptation, so recent advances in cross-domain methods can be applied to crowdsourcing almost directly. We take named entity recognition (NER) as a case study and propose an annotator-aware representation learning model inspired by domain adaptation methods that attempt to capture effective domain-aware features. We investigate both unsupervised and supervised crowdsourcing learning, assuming that no expert annotations or only small-scale expert annotations are available, respectively. Experimental results on a benchmark crowdsourced NER dataset show that our method is highly effective, achieving new state-of-the-art performance. In addition, under the supervised setting, we achieve impressive performance gains with only a very small amount of expert annotation.
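
The reformulation can be pictured with a minimal data-preparation sketch. This is an illustration under stated assumptions, not the implementation used in this work: each annotator's labels are kept unchanged and tagged with an annotator id, which plays the role that a domain id plays in domain adaptation. The function names, the input layout, and the reserved `EXPERT` id are all hypothetical.

```python
# Hypothetical reserved id for the expert "domain" (an assumption for this sketch).
EXPERT = "<expert>"


def to_annotator_instances(sentences):
    """Flatten crowd annotations into one instance per (sentence, annotator).

    Every annotator's label sequence is kept as-is, i.e. treated as
    gold-standard with respect to that annotator, instead of being merged
    or denoised across annotators.
    """
    instances = []
    for sent in sentences:
        tokens = sent["tokens"]
        for annotator, labels in sent["annotations"].items():
            if len(labels) != len(tokens):
                raise ValueError("label sequence must align with tokens")
            instances.append(
                {"tokens": tokens, "labels": labels, "annotator": annotator}
            )
    return instances


def build_annotator_vocab(instances):
    """Map each annotator id to an integer, analogous to a domain index."""
    vocab = {EXPERT: 0}
    for inst in instances:
        vocab.setdefault(inst["annotator"], len(vocab))
    return vocab


# Toy data: two sentences labeled by two hypothetical crowd workers.
sentences = [
    {
        "tokens": ["John", "lives", "in", "Paris"],
        "annotations": {
            "w1": ["B-PER", "O", "O", "B-LOC"],
            "w2": ["B-PER", "O", "O", "O"],  # w2 misses the location
        },
    },
    {
        "tokens": ["Acme", "hired", "Mary"],
        "annotations": {"w2": ["B-ORG", "O", "B-PER"]},
    },
]

instances = to_annotator_instances(sentences)
annotator_vocab = build_annotator_vocab(instances)
```

An annotator-aware model would then consume the annotator index alongside the tokens, much as a domain-aware model consumes a domain index. In the supervised setting, the small set of expert annotations could be added as further instances under the `EXPERT` id.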
