Labeling training data has become one of the major roadblocks to using machine learning. Among the various weak supervision paradigms, programmatic weak supervision (PWS) has achieved remarkable success in easing the manual labeling bottleneck by programmatically synthesizing training labels from multiple, potentially noisy supervision sources. This paper presents a comprehensive survey of recent advances in PWS. In particular, we briefly introduce the PWS learning paradigm and review representative approaches for each component of the PWS learning workflow. In addition, we discuss complementary learning paradigms for tackling scenarios with limited labeled data, and how these related approaches can be used in conjunction with PWS. Finally, we identify several critical challenges that remain under-explored, in the hope of inspiring future research in the field.