In weakly-supervised text classification, only label names act as sources of supervision. Predominant approaches utilize a two-phase framework, where test samples are first assigned pseudo-labels and are then used to train a neural text classifier. In most previous work, the pseudo-labeling step depends on obtaining seed words that best capture the relevance of each class label. We present LIME, a framework for weakly-supervised text classification that entirely replaces the brittle seed-word generation process with entailment-based pseudo-classification. We find that combining weakly-supervised classification and textual entailment mitigates shortcomings of both, resulting in a more streamlined and effective classification pipeline. With just an off-the-shelf textual entailment model, LIME outperforms recent baselines in weakly-supervised text classification and achieves state-of-the-art results on 4 benchmarks. We open-source our code at https://github.com/seongminp/LIME.
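To illustrate the core idea of entailment-based pseudo-classification, the sketch below assigns each text the label whose hypothesis is most strongly entailed by the text. This is a minimal illustration, not LIME's exact method: the hypothesis template, the `pseudo_label` function, and the toy word-overlap scorer (standing in for a real off-the-shelf NLI model) are all assumptions for demonstration.

```python
def pseudo_label(text, label_names, entail_score):
    """Assign the label whose hypothesis the text most strongly entails.

    `entail_score(premise, hypothesis)` should return a higher value when
    the premise entails the hypothesis. In practice this would be an
    off-the-shelf NLI model; here we accept any callable.
    """
    # Hypothetical template; the paper's actual template may differ.
    hypotheses = [f"this text is about {label}" for label in label_names]
    scores = [entail_score(text, h) for h in hypotheses]
    return label_names[max(range(len(scores)), key=scores.__getitem__)]


def toy_entail_score(premise, hypothesis):
    """Toy word-overlap stand-in for a real textual entailment model."""
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return len(p & h) / len(h)


print(pseudo_label("the sports team won the game",
                   ["politics", "sports"],
                   toy_entail_score))  # -> sports
```

Because no seed words are needed, the only class-specific input is the label name itself, inserted into the hypothesis template.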