Leveraging unlabelled data through weak or distant supervision is a compelling way to build more effective text classification models. This paper proposes a simple but effective data augmentation method that uses pseudo-labelling to select samples from noisy, distantly supervised datasets. The results show that the proposed method improves the accuracy of biased news detection models.
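The abstract does not spell out the selection criterion, so the following is only a minimal sketch of one plausible reading of pseudo-label-based selection: a sample from the noisy set is kept when a model's confident prediction agrees with its distant label. The function `select_pseudo_labelled`, the `threshold` value, the label names, and the keyword-based `toy_predict_proba` scorer are all hypothetical and are not taken from the paper; in practice the scorer would be a classifier trained on a small clean set.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a sample is (text, distant_label).
Sample = Tuple[str, str]


def select_pseudo_labelled(
    noisy_samples: List[Sample],
    predict_proba: Callable[[str], Dict[str, float]],
    threshold: float = 0.8,  # assumed confidence cut-off, not from the paper
) -> List[Sample]:
    """Keep noisy samples whose distant label matches a confident prediction."""
    selected = []
    for text, distant_label in noisy_samples:
        probs = predict_proba(text)
        pseudo_label = max(probs, key=probs.get)  # most probable class
        confident = probs[pseudo_label] >= threshold
        if confident and pseudo_label == distant_label:
            selected.append((text, distant_label))
    return selected


def toy_predict_proba(text: str) -> Dict[str, float]:
    """Stand-in scorer (assumption): counts a few loaded words."""
    loaded = {"outrageous", "disgraceful", "radical"}
    hits = sum(word.strip(".,!").lower() in loaded for word in text.split())
    p_biased = min(0.5 + 0.2 * hits, 0.95) if hits else 0.1
    return {"biased": p_biased, "neutral": 1.0 - p_biased}


# Distantly labelled examples; the second label is deliberately noisy.
noisy = [
    ("The outrageous, disgraceful bill passed.", "biased"),
    ("The bill passed on Tuesday.", "biased"),
    ("The committee met on Monday.", "neutral"),
]
augmented = select_pseudo_labelled(noisy, toy_predict_proba)
```

Under these assumptions, the selected samples in `augmented` would be appended to the clean training set before retraining the classifier. Raising `threshold` trades augmentation volume for label quality.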