We explore the task of predicting the leading political ideology or bias of news articles. First, we collect and release a large dataset of 34,737 articles that were manually annotated for political ideology (left, center, or right), which is well-balanced across both topics and media. We further use a challenging experimental setup in which the test examples come from media that were not seen during training; this prevents the model from learning to detect the source of the target news article instead of predicting its political ideology. From a modeling perspective, we propose adversarial media adaptation, as well as a specially adapted triplet loss. We further add background information about the source, and we show that it helps improve article-level prediction. Our experimental results show sizable improvements over using state-of-the-art pre-trained Transformers in this challenging setup.
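
To make the experimental setup concrete, the following is a minimal sketch of a media-disjoint split, in which every article from a given source lands entirely in either the training set or the test set. It is an illustration under stated assumptions, not the procedure used to build the released dataset: the function name `media_disjoint_split`, the record keys `"source"` and `"label"`, and the `test_fraction` parameter are all hypothetical.

```python
import random
from collections import defaultdict


def media_disjoint_split(articles, test_fraction=0.2, seed=0):
    """Split articles so that no media source appears in both train and test.

    `articles` is a list of dicts; the "source" key is an assumed field name
    that identifies the medium each article was published in.
    """
    # Group articles by their medium.
    by_source = defaultdict(list)
    for article in articles:
        by_source[article["source"]].append(article)

    # Shuffle the media (not the articles) deterministically.
    sources = sorted(by_source)
    random.Random(seed).shuffle(sources)

    # Hold out a fraction of the media, and at least one medium.
    n_test = max(1, round(len(sources) * test_fraction))
    test_sources = sources[:n_test]
    train_sources = sources[n_test:]

    train = [a for s in train_sources for a in by_source[s]]
    test = [a for s in test_sources for a in by_source[s]]
    return train, test
```

Because the split is made over media rather than over articles, a classifier cannot score well on the test set merely by recognizing a source's writing style. Note that this sketch does not balance labels or topics across the two sets, which a real setup would likely also need to control.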