Real-world applications often call for improving models by leveraging a range of cheap incidental supervision signals. These can include partial labels, noisy labels, knowledge-based constraints, and cross-domain or cross-task annotations, all of which are statistically associated with gold annotations but not identical to them. However, we currently lack a principled way to measure the benefits of these signals for a given target task, and the common practice is to evaluate these benefits through exhaustive experiments with various models and hyperparameters. This paper studies whether we can, in a single framework, quantify the benefits of various types of incidental signals for a given target task without going through combinatorial experiments. We propose a unified PAC-Bayesian-motivated informativeness measure, PABI, that characterizes the uncertainty reduction provided by incidental supervision signals. We demonstrate PABI's effectiveness by quantifying the value added by various types of incidental signals to sequence tagging tasks. Experiments on named entity recognition (NER) and question answering (QA) show that PABI's predictions correlate well with learning performance, providing a promising way to determine, ahead of learning, which supervision signals would be beneficial.