While the predictive performance of modern statistical dependency parsers relies heavily on the availability of expensive expert-annotated treebank data, not all annotations contribute equally to the training of the parsers. In this paper, we attempt to reduce the number of labeled examples needed to train a strong dependency parser using batch active learning (AL). In particular, we investigate whether enforcing diversity in the sampled batches, using determinantal point processes (DPPs), can improve over their diversity-agnostic counterparts. Simulation experiments on an English newswire corpus show that selecting diverse batches with DPPs is superior to strong selection strategies that do not enforce batch diversity, especially during the initial stages of the learning process. Additionally, our diversity-aware strategy is robust under a corpus duplication setting, where diversity-agnostic sampling strategies exhibit significant degradation.
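To make the batch-selection idea concrete, here is a minimal sketch of diversity-aware selection with a DPP-style kernel, under assumptions not specified in the abstract: each candidate sentence is represented by an (assumed) informativeness score `q_i` and a feature vector, and the kernel is the standard quality-times-similarity construction `L_ij = q_i * q_j * sim(i, j)`. The greedy determinant-maximization step is a common approximation to DPP MAP inference, not necessarily the exact procedure used in the paper.

```python
import numpy as np

def greedy_dpp_select(L, k):
    """Greedily pick k items approximately maximizing det(L[S, S]).

    A standard greedy approximation to DPP MAP inference: at each step,
    add the item that most increases the determinant of the selected
    submatrix, which trades off item quality against redundancy.
    """
    n = L.shape[0]
    selected, remaining = [], list(range(n))
    for _ in range(k):
        best, best_det = None, -np.inf
        for i in remaining:
            idx = selected + [i]
            det = np.linalg.det(L[np.ix_(idx, idx)])
            if det > best_det:
                best, best_det = i, det
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy pool: items 0 and 1 are near-duplicates with high quality,
# item 2 is distinct but slightly lower quality (illustrative values).
feats = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]])
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
quality = np.array([1.0, 0.99, 0.8])

sim = feats @ feats.T                      # cosine similarity
L = np.outer(quality, quality) * sim       # quality-modulated DPP kernel

selected = greedy_dpp_select(L, k=2)
print(selected)  # picks the diverse pair [0, 2], not the duplicates [0, 1]
```

A purely quality-based (diversity-agnostic) strategy would rank items 0 and 1 highest and select the near-duplicate pair; the determinant term drives their joint score toward zero, so the DPP instead pairs item 0 with the distinct item 2. This is the effect the abstract highlights under corpus duplication.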