Programmatic Weak Supervision (PWS) has emerged as a widespread paradigm for synthesizing training labels efficiently. The core component of PWS is the label model, which infers true labels by aggregating the outputs of multiple noisy supervision sources abstracted as labeling functions (LFs). Existing statistical label models typically rely only on the outputs of the LFs and ignore the instance features when modeling the underlying generative process. In this paper, we attempt to incorporate the instance features into a statistical label model via the proposed FABLE. In particular, FABLE is built on a mixture of Bayesian label models, each corresponding to a global pattern of correlation, and the coefficients of the mixture components are predicted by a Gaussian Process classifier based on instance features. We adopt an auxiliary variable-based variational inference algorithm to handle the non-conjugacy between the Gaussian Process and the Bayesian label models. An extensive empirical comparison on eleven benchmark datasets shows that FABLE achieves the highest average performance when compared with nine baselines.
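To illustrate what instance-dependent mixture coefficients could look like, the following is a minimal sketch and not the FABLE algorithm itself. It makes several simplifying assumptions that do not come from the paper: each mixture component is a naive-Bayes-style label model with fixed, known LF accuracies (rather than a Bayesian label model with inferred parameters), the Gaussian Process classifier is replaced by a fixed softmax over a linear function of the features, abstentions are treated as uninformative, and no variational inference is performed. All names (`component_joint`, `mixture_label_posterior`, `gate_weights`, `ABSTAIN`) are hypothetical.

```python
import numpy as np

ABSTAIN = -1  # hypothetical convention: an LF that does not vote outputs -1


def component_joint(votes, accuracies, n_classes):
    """Joint p(y, votes) for one instance under one mixture component.

    Assumes a uniform class prior and conditionally independent LFs, where
    LF j votes for the true class with probability accuracies[j] and spreads
    the remaining mass evenly over the other classes.
    """
    joint = np.full(n_classes, 1.0 / n_classes)
    for vote, acc in zip(votes, accuracies):
        if vote == ABSTAIN:
            continue  # assumption: abstentions carry no label information
        likelihood = np.full(n_classes, (1.0 - acc) / (n_classes - 1))
        likelihood[vote] = acc
        joint *= likelihood
    return joint


def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def mixture_label_posterior(votes, features, component_accuracies,
                            gate_weights, n_classes):
    """Posterior p(y | votes, features) under a feature-gated mixture.

    The mixture coefficients come from a softmax over a linear function of
    the features; this stands in for the Gaussian Process classifier.
    """
    mix = softmax(gate_weights @ features)            # (n_components,)
    joints = np.stack([component_joint(votes, acc, n_classes)
                       for acc in component_accuracies])  # (n_components, n_classes)
    unnormalized = mix @ joints                       # sum_k pi_k(x) * p_k(y, votes)
    return unnormalized / unnormalized.sum()


# Two components that trust different LFs; the features decide which applies.
votes = np.array([1, 0, ABSTAIN])
component_accuracies = np.array([[0.9, 0.6, 0.7],
                                 [0.6, 0.9, 0.7]])
gate_weights = np.array([[5.0, 0.0],
                         [0.0, 5.0]])

posterior_a = mixture_label_posterior(votes, np.array([1.0, 0.0]),
                                      component_accuracies, gate_weights, 2)
posterior_b = mixture_label_posterior(votes, np.array([0.0, 1.0]),
                                      component_accuracies, gate_weights, 2)
```

In this toy setup the same pair of conflicting votes resolves to different labels depending on the features, because the features select which component, and therefore which LF, is trusted more. A label model that ignores the features would have to treat both instances identically.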