In the weakly supervised learning paradigm, labeling functions automatically assign heuristic, often noisy, labels to data samples. In this work, we provide a method for learning from weak labels by separating two types of complementary information associated with the labeling functions: information related to the target label and information specific to a single labeling function. Both types of information are reflected, to different degrees, by all labeled instances. In contrast to previous works that aimed at correcting or removing wrongly labeled instances, we learn a branched deep model that uses all data as-is but splits the labeling function information in the latent space. Specifically, we propose the end-to-end model SepLL, which extends a transformer classifier by introducing a latent space for labeling-function-specific and task-specific information. The learning signal is given only by the labeling function matches; our method requires no pre-processing or label model. Notably, the task prediction is made from the latent layer without any direct task signal. Experiments on Wrench text classification tasks show that our model is competitive with the state of the art and yields a new best average performance.
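The abstract does not spell out the exact architecture or loss, so the following is only a minimal sketch of how such a separation could work, under stated assumptions. It assumes that each labeling function votes for exactly one class, that the task branch and the labeling-function branch each emit a logit vector (here passed in directly instead of being produced by a transformer), and that the two are summed in labeling-function space and trained against the normalized labeling function matches. All function and variable names (`combine_logits`, `lf_match_loss`, `lf_to_class`, and so on) are hypothetical and not taken from the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def combine_logits(task_logits, lf_logits, lf_to_class):
    """Project task logits into labeling-function space and add LF-specific logits.

    lf_to_class[j] is the class index that labeling function j votes for
    (assumption: one class per labeling function).
    """
    return [task_logits[lf_to_class[j]] + lf_logits[j]
            for j in range(len(lf_logits))]


def lf_match_loss(task_logits, lf_logits, lf_to_class, lf_matches):
    """Cross-entropy between the predicted LF distribution and normalized LF matches.

    lf_matches[j] is 1 if labeling function j fired on the sample, else 0.
    """
    total = sum(lf_matches)
    if total == 0:
        return 0.0  # assumption: samples without any match contribute no loss
    target = [z / total for z in lf_matches]
    probs = softmax(combine_logits(task_logits, lf_logits, lf_to_class))
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def predict_class(task_logits):
    """At inference time, only the task branch is read out."""
    return max(range(len(task_logits)), key=lambda c: task_logits[c])


# Toy sample: two classes, three labeling functions (two vote class 0, one class 1).
lf_to_class = [0, 0, 1]
task_logits = [2.0, 0.0]      # task-specific branch output
lf_logits = [0.0, 0.0, 0.0]   # labeling-function-specific branch output
lf_matches = [1, 1, 0]        # the two class-0 functions fired

loss = lf_match_loss(task_logits, lf_logits, lf_to_class, lf_matches)
print(loss, predict_class(task_logits))
```

In this sketch the loss only ever sees labeling function matches, yet the class prediction is read from the task branch alone, which mirrors the claim that the task prediction receives no direct task signal.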