State-of-the-art deep neural networks require large-scale labeled training data that is often expensive to obtain or not available for many tasks. Weak supervision in the form of domain-specific rules has been shown to be useful in such settings to automatically generate weakly labeled training data. However, learning with weak rules is challenging due to their inherent heuristic and noisy nature. An additional challenge is rule coverage and overlap, where prior work on weak supervision only considers instances that are covered by weak rules, thus leaving valuable unlabeled data behind. In this work, we develop a weak supervision framework (ASTRA) that leverages all the available data for a given task. To this end, we leverage task-specific unlabeled data through self-training with a model (student) that considers contextualized representations and predicts pseudo-labels for instances that may not be covered by weak rules. We further develop a rule attention network (teacher) that learns how to aggregate student pseudo-labels with weak rule labels, conditioned on their fidelity and the underlying context of an instance. Finally, we construct a semi-supervised learning objective for end-to-end training with unlabeled data, domain-specific rules, and a small amount of labeled data. Extensive experiments on six benchmark datasets for text classification demonstrate the effectiveness of our approach with significant improvements over state-of-the-art baselines.
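The teacher's aggregation step described above can be sketched in minimal form: a softmax attention over the firing rules' context-conditioned scores weights each rule's label vote, and the result is blended with the student's pseudo-label. This is an illustrative sketch only, not ASTRA's actual implementation; the function and parameter names (`aggregate_labels`, `rule_scores`, `mix`) are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of attention logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_labels(rule_labels, rule_scores, student_probs, mix=0.5):
    """Hedged sketch of a rule-attention teacher (names are illustrative,
    not ASTRA's API).

    rule_labels:   one label distribution per firing rule (one-hot or soft)
    rule_scores:   context-conditioned attention logits, one per rule
    student_probs: the student's pseudo-label distribution for this instance
    mix:           fixed blending weight, a simplification for this sketch
    """
    weights = softmax(rule_scores)
    num_classes = len(student_probs)
    # attention-weighted vote over the weak rules
    agg = [sum(w * lab[c] for w, lab in zip(weights, rule_labels))
           for c in range(num_classes)]
    # blend the rule vote with the student's pseudo-label; the student
    # covers instances on which no rule fires
    return [mix * a + (1 - mix) * s for a, s in zip(agg, student_probs)]
```

For instance, with two rules voting for opposite classes, the rule whose attention score better matches the instance context dominates the aggregated vote, while the student's pseudo-label keeps the output defined even when rule coverage is sparse.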