Weak Supervision (WS) techniques allow users to efficiently create large training datasets by programmatically labeling data with heuristic sources of supervision. While the success of WS relies heavily on the provided labeling heuristics, how these heuristics are created in practice has remained under-explored. In this work, we formalize the development of labeling heuristics as an interactive procedure, built around the existing workflow in which users draw ideas from a selected set of development data to design the heuristic sources. With this formalism, we study two core problems: how to strategically select the development data to guide users in efficiently creating informative heuristics, and how to exploit information from the development process to contextualize, and better learn from, the resulting heuristics. Building on two novel methodologies that address these respective problems, we present Nemo, an end-to-end interactive system that improves the overall productivity of the WS learning pipeline by 20% on average (and by up to 47% on one task) compared to the prevailing WS approach.
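The generic workflow described above can be illustrated in a few lines of plain Python. In this workflow a user inspects a selected development example, writes a labeling heuristic for it, and the heuristics' votes are aggregated into training labels. This is a minimal sketch under stated assumptions and is not Nemo's implementation. The keyword heuristics (`keyword_lf`), the simulated user, majority-vote aggregation, and the selection rule `select_dev_example` (pick the unseen example covered by the fewest heuristics) are all hypothetical stand-ins. The abstract does not specify the paper's own selection strategy or how it learns from contextualized heuristics.

```python
from collections import Counter
from typing import Callable, List, Optional, Set

ABSTAIN = -1  # a labeling function returns this when it has no opinion
LabelingFunction = Callable[[str], int]


def keyword_lf(keyword: str, label: int) -> LabelingFunction:
    """Hypothetical heuristic: vote `label` if `keyword` appears in the text."""
    def lf(text: str) -> int:
        return label if keyword in text.lower() else ABSTAIN
    return lf


def apply_lfs(lfs: List[LabelingFunction], texts: List[str]) -> List[List[int]]:
    """Label matrix: one row per example, one column per labeling function."""
    return [[lf(t) for lf in lfs] for t in texts]


def majority_vote(row: List[int]) -> int:
    """Aggregate one row of votes; abstain when there are no votes or a tie."""
    counts = Counter(v for v in row if v != ABSTAIN).most_common()
    if not counts or (len(counts) > 1 and counts[0][1] == counts[1][1]):
        return ABSTAIN
    return counts[0][0]


def select_dev_example(texts: List[str], lfs: List[LabelingFunction],
                       seen: Set[int]) -> Optional[int]:
    """Hypothetical selection rule: the unseen example with the fewest non-abstain votes."""
    candidates = [i for i in range(len(texts)) if i not in seen]
    if not candidates:
        return None
    rows = apply_lfs(lfs, [texts[i] for i in candidates])
    coverage = [sum(v != ABSTAIN for v in row) for row in rows]
    return candidates[coverage.index(min(coverage))]


def develop_lfs(texts: List[str],
                make_lf: Callable[[str], Optional[LabelingFunction]],
                rounds: int) -> List[LabelingFunction]:
    """Interactive loop: select an example, then let the 'user' write a heuristic for it."""
    lfs: List[LabelingFunction] = []
    seen: Set[int] = set()
    for _ in range(rounds):
        i = select_dev_example(texts, lfs, seen)
        if i is None:
            break
        seen.add(i)
        lf = make_lf(texts[i])  # stands in for a human inspecting texts[i]
        if lf is not None:
            lfs.append(lf)
    return lfs


# Simulated user (hypothetical): writes a keyword heuristic for the first lexicon word it spots.
LEXICON = {"great": 1, "terrible": 0, "boring": 0}


def simulated_user(text: str) -> Optional[LabelingFunction]:
    for word, label in LEXICON.items():
        if word in text.lower():
            return keyword_lf(word, label)
    return None


texts = ["great movie", "terrible plot", "great acting, terrible ending", "boring film"]
lfs = develop_lfs(texts, simulated_user, rounds=3)
labels = [majority_vote(row) for row in apply_lfs(lfs, texts)]
print(labels)  # prints [1, 0, -1, 0]  (1 = positive, 0 = negative, -1 = abstain)
```

Ties abstain rather than guess, so the conflicting third example stays unlabeled. Practical WS pipelines typically replace majority vote with a learned label model that weights heuristics by estimated accuracy. The coverage-based selection rule is only a naive placeholder for a strategic selection method.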