Media bias can significantly influence the formation and development of opinions and sentiments in a population. It is therefore important to study the emergence and development of partisan media and political polarization. However, quantitatively inferring the ideological positions of media outlets is challenging. In this paper, we present a quantitative framework to infer both the political bias and the content quality of media outlets from text, and we illustrate this framework with empirical experiments on real-world data. We apply a bidirectional long short-term memory (LSTM) neural network to a data set of more than 1 million tweets to generate a two-dimensional (ideological bias, content quality) measurement for each tweet. We then infer a "media-bias chart" of (bias, quality) coordinates for the media outlets by integrating the (bias, quality) measurements of each outlet's tweets. For comparison, we also apply several baseline machine-learning methods, such as a naive Bayes classifier and a support vector machine (SVM), to infer the bias and quality values of each tweet. All of these baseline approaches use a bag-of-words representation, which discards word order. We find that the LSTM-network approach performs best among the examined methods. Our results illustrate the importance of incorporating word order into machine-learning methods for text analysis.
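The abstract does not specify how per-tweet measurements are "integrated" into outlet coordinates. A minimal sketch, assuming the simplest rule (an unweighted mean of each outlet's tweet scores), shows how a media-bias chart could be assembled from per-tweet (bias, quality) outputs. The function name `media_bias_chart`, the input format, the outlet names, and the score scale (negative bias = left, positive = right) are all hypothetical illustrations, not the paper's method.

```python
from collections import defaultdict
from statistics import mean


def media_bias_chart(tweet_scores):
    """Aggregate per-tweet (bias, quality) scores into one point per outlet.

    tweet_scores: iterable of (outlet, bias, quality) tuples, e.g. the outputs
    of a per-tweet model. ASSUMPTION: the aggregation rule is a plain,
    unweighted mean; the paper may integrate the measurements differently.
    """
    grouped = defaultdict(list)
    for outlet, bias, quality in tweet_scores:
        grouped[outlet].append((bias, quality))

    chart = {}
    for outlet, pairs in grouped.items():
        biases, qualities = zip(*pairs)  # split into two parallel tuples
        chart[outlet] = (mean(biases), mean(qualities))
    return chart


if __name__ == "__main__":
    # Hypothetical outlets and scores (bias: negative = left, positive = right).
    scores = [
        ("outlet_a", -0.5, 0.75),
        ("outlet_a", -0.25, 0.25),
        ("outlet_b", 0.5, 0.5),
    ]
    print(media_bias_chart(scores))
```

A weighted mean (for example, by tweet engagement) or a median would slot into the same structure by changing only the two `mean` calls.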