Computational Assessment of Hyperpartisanship in News Titles

We first adopt a human-guided machine learning framework, applied in an active learning manner, to develop a new dataset for hyperpartisan news title detection. The dataset contains 2,200 manually labeled and 1.8 million machine-labeled titles posted from 2014 to the present by nine representative media organizations across three media bias groups: Left, Central, and Right. The fine-tuned transformer-based language model achieves an overall accuracy of 0.84 and an F1 score of 0.78 on an external validation set. Next, we conduct a computational analysis to quantify the extent and dynamics of partisanship in news titles. While some aspects are as expected, our study reveals new or nuanced differences between the three media groups. We find that, overall, the Right media tends to use proportionally more hyperpartisan titles. Around the 2016 Presidential Election, the proportion of hyperpartisan titles increased in all media bias groups, with the Left media showing the largest relative increase. Using logistic regression models and Shapley values, we identify three major topics that are suggestive of hyperpartisanship in news titles: foreign issues, political systems, and societal issues. Through an analysis of the topic distribution, we find that societal issues gradually receive more attention from all media groups. We further apply a lexicon-based language analysis tool to the titles of each topic and quantify the linguistic distance between each pair of the three media groups. Three distinct patterns emerge. The Left media is linguistically more different from Central and Right in terms of foreign issues. The linguistic distance between the three media groups has become smaller in recent years. In addition, a seasonal pattern in which linguistic difference is associated with elections is observed for societal issues.
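To make the topic-attribution step concrete, here is a minimal sketch of how Shapley values can be read off a logistic regression model. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say which Shapley estimator or features were used, so the sketch assumes binary topic indicators as features and uses the closed-form Shapley value for a linear model with independent features, computed in log-odds space as `w_j * (x_j - mean_j)`. The toy data, the helper names `fit_logistic` and `linear_shapley`, and the hyperparameters are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def linear_shapley(w, X, x):
    """Shapley values in log-odds space for a linear model.

    Assumes independent features, in which case the value for
    feature j is w_j * (x_j - mean of feature j over X).
    """
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(w))]
    return [wj * (xj - mj) for wj, xj, mj in zip(w, x, means)]

# Made-up toy data (NOT from the paper): one row per title,
# columns mark whether the title touches each topic.
TOPICS = ["foreign issues", "political systems", "societal issues"]
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 0, 0],
     [0, 1, 1], [1, 0, 1], [0, 0, 0], [1, 0, 0], [0, 0, 0]]
# 1 = hyperpartisan, 0 = not; labels include noise so the data is not separable.
y = [1, 1, 1, 1, 0, 1, 1, 0, 0, 1]

w, b = fit_logistic(X, y)
title = X[3]  # a title touching foreign issues and political systems
phi = linear_shapley(w, X, title)

for topic, value in zip(TOPICS, phi):
    print(f"{topic}: {value:+.3f}")
```

Each printed value is that topic's contribution to the title's log-odds of being hyperpartisan, relative to the average title in the toy set. The contributions sum to the difference between the title's log-odds and the mean log-odds, which is the additivity property that makes Shapley values convenient for ranking topics.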
